All About Performance

and other stuff by Taras Glek

Mozilla Tech on Reddit

I posted earlier about planet mozilla being obsolete. To move forward, we need to experiment with some alternatives. Mozilla contributor @djco set up a reddit feed so we can up-vote technical content and down-vote the rest.

I know a number of good people who stopped reading and posting on planet because of irrelevant, offensive, etc content. Please give the reddit feed a try. Maybe once we get enough people moderating, we’ll encourage more quality content.

I am not suggesting that people stop posting pictures of their cats or dictionary entries on planet. I only want to filter those out so I can have a satisfying technical feed.

Please try out the reddit feed and let me and Dirkjan know if it works for you. We are open to other suggestions too.

Snappy #55: Snappy Evolution

The Snappy name will be retired in favor of Performance: we will be expanding our meta scope beyond desktop Firefox responsiveness.

I think as a result of Snappy, Firefox is in a much happier performance place now. There are big wins remaining, but we have tools and ideas on how to get there. We’ve come a long way from “how to make Firefox feel fast?” discussions in late 2011.

Snappy has been a tricky meta-project because work has to happen across teams. While my Performance team has an exclusive commitment to performance, other teams have to context-switch between feature-work, platform-work, performance, etc. There are also team culture differences on internal vs external communication, planning, etc.

Some of the projects are quite hard and require specialized expertise. This meant that we’d make a bunch of progress on a project (eg breakpad profiler backend) only to realize that we are blocked on someone we didn’t loop into the project right away (eg Ted). Ideally we will minimize chances of surprises like this in the future.

Until recently my solution was to deal with this as a manager. I’d sync up with relevant managers about Snappy needs and try to get some Snappy bugs onto the relevant team’s todo list. This has worked out ok, but there were a few too many instances of unexpected delays due to shifting team priorities, miscommunication and general lag. My conclusion is that at a developer-driven organization like Mozilla, heavy manager involvement on Snappy-type projects is a sign that we are doing it wrong. Developers should be adding performance goals to their team’s agenda.

Last week we came up with a more developer-centric way to do performance work. I’ll still be around making sure things are moving forward, but from now on I’d like to see developers drive planning & coordination on a per-project basis. I’ll introduce individual projects + their respective blogs as they start in the next couple of weeks.

Irving Reid (addon-manager startup performance lead) summed up key mechanics of the new approach in an email. Thanks, Irving.

Irving’s Summary

Coming out of the Snappy work week, we decided to try and make performance projects a little more structured than they have been in the past; see Lawrence’s post for a summary.

While not mentioned in Lawrence’s blog, one of the things Taras (if I recall correctly) suggested was to have a project kick-off meeting to get rough agreement on scope, responsibilities, and how we’re going to track progress. If everyone is happy with settling those questions over email, we can; I think we lose a little by not getting a chance to see and hear each other directly, but it is rather painful to get that scheduled in the current circumstances.

In any case I wasn’t planning on delaying work until after the meeting could happen; it’s more about getting things as clear as we can, as early as we can.

The current plan is for Felipe and I to do the patches, with me handling coordination and progress tracking as well. Everybody else is involved to make sure we’re going in the right direction, and help us over roadblocks.

The progress tracking technique the Snappy team settled on last week was a combination of daily one-line updates using the status bot in the #perf channel, and a blog post every two weeks summarizing overall progress. I’ll do the blog posts; depending on how things go I may do them weekly instead of biweekly.

Snappy #54: Snappy Discussion in Paris

Meeting In Paris

Last week Snappy people met in Paris at IRILL. I like small venues because they encourage conversations. IRILL is one of the best small work venues I’ve attended.

Excellent Parisian food did not hurt :)

Workweek Presentations

I was impressed by the number and quality of presentations this time. I suspect having a presentation-friendly venue helped.

Talks/discussions/notes (please leave a comment if I missed a talk):

Team Activity

For a team activity we walked to the Paris Catacombes.

Snappy #53: Faster: Startup, Image Decoding, Touchpad Input. Smoother Animations

Responsiveness

Joe Drew taught Firefox to decode images on multiple threads. It took a mere 29 patches in bug 716140. This should speed up page-load and improve tab-switch times. This task was considered too hard a year ago when Snappy people were discussing potential improvements.

Masayuki Nakano improved Firefox scrolling responsiveness on modern touchpads in bug 829952. Dealing with scroll-events on Windows is a mess. It’s nice when we make forward progress in this area.

Marco Bonardo fixed a mysterious cause of main thread IO I ran into in bug 830423. I ran into this issue because I compulsively navigate to about:telemetry in Firefox and look in ‘Slow SQL Statements’ and ‘Browser Hang’ sections. I encourage readers of this blog to check out that data whenever Firefox is under-performing.

Startup

bug 810151 + bug 810454 - Aaron Klotz implemented omnijar + cookie readahead.

bug 648407 - Mike Hommey folded libraries for faster startup. If I’m reading bug 852068 correctly, Firefox now loads 7 fewer libraries on startup. My rough rule of thumb is that each (small) file adds ~30ms to spinning-disk startup, so this should net >200ms in startup savings (7 × ~30ms ≈ 210ms).

Cumulative startup improvements are notoriously difficult to predict + measure, but I suspect that the above changes should make for a >=10% speedup in Firefox 22 startup over previous releases. We’ll be watching telemetry data in the coming weeks.

Smoothness

bug 590422 - Avi Halachmi is continuing on his quest to make Firefox animate smoothly. This is another tricky step towards smoother animations in Firefox. Since landing this, Avi has already embarked on the next gecko-level animation smoothness improvement.

Marco Bonardo spotted some potential for contention in recent DOM Local Storage optimizations. Vladan Djeric landed corrections in bug 842852.

Throughput improvements

Ehsan Akhgari reduced allocator contention in bug 733277.

Tim Taubert taught Firefox to warm up newtab connections on hover in bug 790882.

Blogging With Octopress

A couple months ago I switched to Octopress. I now have some experience to share.

Overall, my Octopress + Github + Disqus experience has been much better than my past experience with Blogger, WordPress and LiveJournal. I only wish I had kept my old posts in HTML instead of converting them to markdown. When I was setting up my blog, I did not know that Octopress could render HTML.

RSS Bugs

I learned not to expect much from RSS. In addition to being hard to discover, the default category RSS is buggy. It interprets markdown twice, choking on some exciting Telemetry links in my archives. I had to resort to writing a custom mozilla category feed.

The excerpt feature (<!-- more -->) does not work in RSS. To me that defeats the whole point of excerpts: who even reads blog homepages? I do not have time to fix this.

It’s easy to make a mess

I wanted to get rid of some octopress defaults like external CSS, external fonts, modify some layout. I found I could not restrict myself to only editing files in _directories. I think this means that switching to a new theme will be hard. I was lazy and ended up with my content in the same repo as the octopress source. I’ll have to clean up my act before I can share my customizations.

Saving Time

One of the worst parts about my old wordpress blog was the amount of UI one had to go through to create HTML links. My snappy updates have a lot of bugzilla links. I wrote an octopress extension to do most of the link work for me. Syntax looks like:

{%Bug ####%}, {%bug ####%}

If anyone is interested, you can download it here until I clean up my octopress git repository.
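The tag expansion can be illustrated with a short Python sketch. This is not the Octopress extension itself (that is a Ruby plugin whose source is not shown here). The `expand_bug_tags` name and the Bugzilla URL pattern are assumptions made for illustration.

```python
import re

# Assumed link target: the show_bug.cgi URL on bugzilla.mozilla.org.
BUG_URL = "https://bugzilla.mozilla.org/show_bug.cgi?id={id}"

# Matches {%bug 1234%} or {%Bug 1234%}, tolerating extra spaces inside the tag.
BUG_TAG = re.compile(r"\{%\s*([Bb]ug)\s+(\d+)\s*%\}")


def expand_bug_tags(text):
    """Replace each bug tag with an HTML link, keeping the tag's capitalization."""
    def to_link(match):
        word, bug_id = match.group(1), match.group(2)
        return '<a href="{}">{} {}</a>'.format(BUG_URL.format(id=bug_id), word, bug_id)

    return BUG_TAG.sub(to_link, text)


if __name__ == "__main__":
    print(expand_bug_tags("{%Bug 716140%} taught Firefox multithreaded image decoding."))
```

Keeping the capitalization of the tag lets the same syntax work at the start of a sentence and mid-sentence.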

Overall I like Octopress and I recommend the octopress/github combo to every developer who is looking to set up a blog. It saves a lot of time and as long as one can deal with lack of unit testing and Ruby, it’s great.

Snappy #52

Frontend

Help-Wanted:

Avi Halachmi needs your help comparing scrolling behavior between browsers.

Australis

Mike Conley blogged that Australis performance is now on-par with current theme on low-end hardware.

Startup

Aaron Klotz landed bug 845907. This gives us a consistent way to warm IO caches. This functionality can easily backfire if we end up preloading data that does not get used. Uses of readahead should always be accompanied by telemetry to verify that it performs as expected. Bug 810454 is the first user of the readahead API; it landed with A/B testing telemetry. Omnijar readahead is next, in bug 810151. It results in a ~60% drop in omni.ja startup read time on Windows on Aaron’s machine.
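The general idea of warming a file and then measuring whether it helped can be sketched in Python. This is not the Firefox readahead API (whose interface is not described here). It assumes a POSIX system where `os.posix_fadvise` is available and falls back to a plain read elsewhere. `warm_file` and `timed_read` are hypothetical helper names.

```python
import os
import time


def warm_file(path):
    """Hint the OS to pull a file into its page cache; return the file size in bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if hasattr(os, "posix_fadvise"):
            # Asynchronous hint; a length of 0 means "through the end of the file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        else:
            # Fallback for platforms without posix_fadvise: read the file through once.
            while os.read(fd, 1 << 20):
                pass
        return size
    finally:
        os.close(fd)


def timed_read(path):
    """Read the whole file and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return len(data), time.perf_counter() - start
```

Comparing `timed_read` results with and without a preceding `warm_file` call, across many runs, is the kind of A/B measurement that shows whether the readahead pays for itself.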

Snappy #51: Smoothing Tab Animations

Lack of Updates

I skipped a Snappy update two weeks ago (did anyone notice?) due to not having any completed work to report. Snappy has not stagnated: we have big projects in flight. See this week’s notes for some details.

Tab Smoothness

I usually do not cover in-flight work in Snappy updates and expect individual developers to blog about stuff they are working on. However, Avi Halachmi has delayed blogging to focus on quickly advancing Firefox performance, so an exception had to be made.

Avi has been investigating tab smoothness since December. His approach relies on detailed instrumentation + sending captured data via Telemetry. This culminated in some exciting bug activity this week:

  • bug 828097 According to Telemetry, Firefox tab animations are quite smooth (due to recent improvements like bug 731974) iff one has the newtab thumbnail feature disabled (via button in top right of the page).
  • bug 843853 was filed to fix above performance hit ASAP.
  • bug 838758 20-25% tab animation speedup on Direct2D-accelerated systems.
  • bug 842967, bug 590422 improve animation scheduling.

Due to web-like Firefox UI architecture most of these improvements will enable smoother website perf.

Avi, Matthew Noorenberghe, Mike Conley are working on optimizing our next UI refresh: Australis. Australis is shaping up to be the most perf-tuned theme update we’ve done. See bug 837885 for how performance is being tracked.

As Avi’s manager I found it trying to see weeks of perf-reporting work with no fixes to accompany it. I’m happy to see this investigation investment pay off and serve as an example of the importance of methodically studying performance before proceeding to optimization.

Router-Assisted Automatic Wake-on-LAN + Suspend

The problem

My new computer uses 10-46W of power while on, 1W while suspended. It replaced my softbricked (ArchLinux can burn in hell) ARM computer where power consumption ranged from 4-7W.

The new computer is a backup target for my eyefi (via iiid), Android phone (via rsync/ssh), laptop (via rsync).

For fun I decided to try to use suspend-to-RAM + Wake-on-LAN to minimize power usage to get this beefy x86 to use less power than the old ARM NAS (which could not unsuspend successfully).

By default WOL is a pain. The eye-fi card uploads photos when the SLR camera is on. The Android phone is configured to rsync stuff when it is plugged in while connected to home wifi. It’s not convenient to navigate some UI to send a WOL packet (or push a power button) every time I need to perform a backup. Manually deciding when to suspend the machine is also no fun.

Automatic WOL

In an ideal world WOL would support waking up on IP traffic directed at the machine in question. Typically WOL either wakes up in response to WOL magic packets or ANY ethernet traffic (ugh!). Update: Turns out my NIC supports directed wakeups, but the e1000e driver on Linux does not implement them.

Luckily, ARP “who has x.x.x.x” queries are broadcast to the whole network segment. So my Tomato router can help.

This works as follows:

Android-phone: ARP: Who has 192.168.1.149
router: *sends WOL packet to NAS*
Server Unsuspend Script: 192.168.1.149 is at A:B:C:D:E:F

I wrote an arpwol program for this, which I run via

arpwake.mips br0 192.168.1.149 4C:72:B9:42:EA:97

It listens for ARP requests looking for 192.168.1.149 and fires off a WOL packet.
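The two pieces of that logic, spotting an ARP request for a given IP and building the wake-up packet, can be sketched in Python. This is not the arpwake.c source. The function names are hypothetical, and sending the magic packet as a UDP broadcast to port 9 is a common convention assumed here, not necessarily what the router-side program does.

```python
import socket
import struct


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def arp_request_target(frame):
    """Return the target IP of an ARP 'who has' request, or None for any other frame."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte IPv4-over-Ethernet ARP body
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0806:  # not an ARP frame
        return None
    (opcode,) = struct.unpack("!H", frame[20:22])
    if opcode != 1:  # 1 = request, 2 = reply
        return None
    return socket.inet_ntoa(frame[38:42])  # target protocol address field


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (assumed transport, see lead-in)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

A listener would feed each captured Ethernet frame to `arp_request_target` and call `send_wol` whenever the result equals the sleeping server's IP.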

It takes 5 seconds for an ssh roundtrip to a sleeping server.

Auto-suspend

Once the server is awake, it is annoying to have to put it to sleep manually. I wrote a script to monitor /var/log/auth.log and auto-suspend when active ssh-session count reaches 0.
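The session-counting idea can be sketched as follows. This is not autosleep.py. The log line format is an assumption (the pam_unix "session opened" / "session closed" messages that sshd typically writes to auth.log), and `active_ssh_sessions` is a hypothetical name.

```python
def active_ssh_sessions(log_lines):
    """Count ssh sessions that were opened but not yet closed, based on auth.log lines."""
    count = 0
    for line in log_lines:
        if "sshd" not in line:
            continue  # ignore sudo, cron and other services
        if "session opened for user" in line:
            count += 1
        elif "session closed for user" in line:
            count = max(count - 1, 0)  # never go negative on a truncated log
    return count


if __name__ == "__main__":
    sample = [
        "Mar  1 12:00:00 nas sshd[100]: pam_unix(sshd:session): session opened for user taras by (uid=0)",
        "Mar  1 12:05:00 nas sshd[100]: pam_unix(sshd:session): session closed for user taras",
    ]
    # A monitoring loop would suspend the machine once this drops back to 0.
    print(active_ssh_sessions(sample))
```

In a real monitor the count would be updated incrementally as new lines are appended to the log, and the suspend command would only be issued after at least one session has come and gone.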

Conclusion

It is great to have an x86 box idle at 1W while unused. Suspend to disk introduces extra latency without saving much more power (due to keeping WOL on).

Please check out the code on github. WOL magic happens in arpwake.c and auto-suspend in autosleep.py.

10W Intel I5 Desktop

I recently built a new computer for HTPC, NAS, etc duties. Last couple of times this ended up in disappointment due to noise, hardware quirks, power consumption.

Most important components are Intel DQ77KB motherboard, Cooler Master TX3 cooler, i5 3330S ebay CPU. The motherboard takes a DC power jack and has onboard voltage regulation. It can be powered by a cheap DELL laptop power supply. No need for a big noisy ATX power supply. The CPU cooler is obnoxiously large and quiet. I can’t hear the computer even while compiling Mozilla.

This computer idles at as low as 10W, max observed power usage is 46W. Suspend-to-RAM uses around 1W. Thanks to the fancy Intel motherboard + CPU + efficient laptop PSU, at full power the computer uses less power than most systems do at idle.

It’s exciting that mighty desktop CPUs now use less power at idle than Atom CPUs from just 2 years ago. Too bad that Intel is the only retail vendor that optimizes their motherboards for power consumption. Especially too bad since they are quitting the motherboard market.

Plugin Hang UI on Aurora

Aaron wrote a great post on the new plugin killer UI and the Windows magic involved in debugging it.

We need help testing the new functionality, please see the link above for details.

Unfortunately, we are still waiting on bug 814095 to get his blog syndicated to planet.