All About Performance

and other stuff by Taras Glek

Blogging With Octopress

A couple months ago I switched to Octopress. I now have some experience to share.

Overall, my Octopress + Github + Disqus experience has been much better than the Blogger, WordPress and LiveJournal setups I used before it. I only wish I had kept my old posts in HTML instead of converting them to markdown. When I was setting up my blog, I did not know that Octopress could render HTML.

RSS Bugs

I learned not to expect much from RSS. In addition to being hard to discover, the default category RSS is buggy. It interprets markdown twice, choking on some exciting Telemetry links in my archives. I had to resort to writing a custom mozilla category feed.

The excerpt feature (<!-- more -->) does not work in RSS. To me that defeats the whole point of excerpts. Who even reads blog homepages? I do not have time to fix this.

It’s easy to make a mess

I wanted to get rid of some Octopress defaults such as external CSS and external fonts, and to modify some of the layout. I found I could not restrict myself to only editing files in _directories. I think this means that switching to a new theme will be hard. I was lazy and ended up with my content in the same repo as the Octopress source. I’ll have to clean up my act before I can share my customizations.

Saving Time

One of the worst parts about my old wordpress blog was the amount of UI one had to go through to create HTML links. My snappy updates have a lot of bugzilla links. I wrote an octopress extension to do most of the link work for me. Syntax looks like:

{%Bug ####%}, {%bug ####%}

If anyone is interested, you can download it here until I clean up my octopress git repository.
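
For readers who just want to see the idea, here is a minimal sketch of the tag expansion. It is not the Octopress plugin itself (that is a Ruby extension); it is a standalone Python illustration. The function name, the regular expression and the choice of linking to Bugzilla's show_bug.cgi page are assumptions made for the example.

```python
import re

# Assumption: links point at Mozilla's Bugzilla "show_bug.cgi" page.
BUGZILLA_URL = "https://bugzilla.mozilla.org/show_bug.cgi?id={id}"

# Matches {%Bug 1234%} or {%bug 1234%}, allowing optional spaces inside the braces.
BUG_TAG = re.compile(r"\{%\s*([Bb]ug)\s+(\d+)\s*%\}")


def expand_bug_tags(text: str) -> str:
    """Replace each bug tag with an HTML link, keeping the tag's capitalization."""

    def to_link(match: re.Match) -> str:
        word, bug_id = match.group(1), match.group(2)
        url = BUGZILLA_URL.format(id=bug_id)
        return f'<a href="{url}">{word} {bug_id}</a>'

    return BUG_TAG.sub(to_link, text)
```

Keeping the capitalization of the tag lets the same syntax work at the start of a sentence ("Bug") and in the middle of one ("bug").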

Overall I like Octopress and I recommend the Octopress/Github combo to every developer who is looking to set up a blog. It saves a lot of time and as long as one can deal with lack of unit testing and Ruby, it’s great.

Snappy #52

Frontend

Help-Wanted:

Avi Halachmi needs your help comparing scrolling behavior between browsers.

Australis

Mike Conley blogged that Australis performance is now on par with the current theme on low-end hardware.

Startup

Aaron Klotz landed bug 845907. This gives us a consistent way to warm IO caches. This functionality can easily backfire if we end up preloading data that does not get used. Uses of readahead should always be accompanied by telemetry to verify that it performs as expected. Bug 810454 is the first user of the readahead API; it landed with A/B testing telemetry. Omnijar readahead is next, in bug 810151. It results in a ~60% drop in omni.ja startup read time on Windows on Aaron’s machine.

Snappy #51: Smoothing Tab Animations

Lack of Updates

I skipped a Snappy update two weeks ago (did anyone notice?) due to not having any completed work to report. Snappy has not stagnated. We have big projects in flight; see this week’s notes for some details.

Tab Smoothness

I usually do not cover in-flight work in Snappy updates and expect individual developers to blog about stuff they are working on. However, Avi Halachmi has delayed blogging to focus on quickly advancing Firefox performance, so an exception had to be made.

Avi has been investigating tab smoothness since December. His approach relies on detailed instrumentation + sending captured data via Telemetry. This culminated in some exciting bug activity this week:

  • bug 828097 According to Telemetry, Firefox tab animations are quite smooth (due to recent improvements like bug 731974) iff one has the newtab thumbnail feature disabled (via button in top right of the page).
  • bug 843853 was filed to fix above performance hit ASAP.
  • bug 838758 20-25% tab animation speedup on Direct2D-accelerated systems.
  • bug 842967, bug 590422 improve animation scheduling.

Due to the web-like architecture of the Firefox UI, most of these improvements will also enable smoother website perf.

Avi, Matthew Noorenberghe and Mike Conley are working on optimizing our next UI refresh: Australis. Australis is shaping up to be the most perf-tuned theme update we’ve done. See bug 837885 for how performance is being tracked.

As Avi’s manager I found it trying to see weeks of perf-reporting work with no fixes to accompany it. I’m happy to see this investment in investigation pay off and serve as an example of the importance of methodically studying performance before proceeding to optimization.

Router-Assisted Automatic Wake-on-LAN + Suspend

The problem

My new computer uses 10-46W of power while on, 1W while suspended. It replaced my softbricked (ArchLinux can burn in hell) ARM computer, where power consumption ranged from 4-7W.

The new computer is a backup target for my eyefi (via iiid), Android phone (via rsync/ssh), laptop (via rsync).

For fun I decided to try suspend-to-RAM + Wake-on-LAN to get this beefy x86 to use less power than the old ARM NAS (which could not unsuspend successfully).

By default WOL is a pain. The eye-fi card uploads photos when the SLR camera is on. The Android phone is configured to rsync stuff when it is plugged in while connected to home wifi. It’s not convenient to navigate some UI to send a WOL packet (or push a power button) every time I need to perform a backup. Manually deciding when to suspend the machine is also no fun.

Automatic WOL

In an ideal world WOL would support waking up on IP traffic directed at the machine in question. Typically WOL either wakes up in response to WOL magic packets or ANY ethernet traffic (ugh!). Update: Turns out my NIC supports directed wakeups, but the e1000e driver on Linux does not implement them.

Luckily, ARP "who has x.x.x.x" queries are broadcast to the whole network segment, so my Tomato router sees them and can help.

This works as follows:

Android-phone: ARP: Who has 192.168.1.149
router: *sends WOL packet to NAS*
Server Unsuspend Script: 192.168.1.149 is at A:B:C:D:E:F

I wrote an arpwol program for this. I run it on the router via

arpwake.mips br0 192.168.1.149 4C:72:B9:42:EA:97

It listens for ARP requests looking for 192.168.1.149 and fires off a WOL packet.
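
The two building blocks involved, recognizing an ARP request for a given IP and building a WOL magic packet, can be sketched in a few lines of Python. This is not the arpwake.c code, only an illustration of the technique. The function names are made up for the example, sending the magic packet as a UDP broadcast to port 9 is an assumption (a common convention, not something the original program is known to do), and the listening loop is Linux-only and needs root.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def build_magic_packet(mac: str) -> bytes:
    """WOL magic packet: 6 bytes of 0xFF followed by the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def arp_request_target(frame: bytes):
    """Return the IP an ARP request asks about, or None for any other frame."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte IPv4 ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    (operation,) = struct.unpack("!H", frame[20:22])
    if operation != ARP_REQUEST:
        return None
    return socket.inet_ntoa(frame[38:42])  # target protocol address field


def wake_on_arp(interface: str, ip: str, mac: str) -> None:
    """Linux-only loop (needs root): send a magic packet when someone ARPs for ip."""
    sniffer = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                            socket.htons(ETHERTYPE_ARP))
    sniffer.bind((interface, 0))
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    packet = build_magic_packet(mac)
    while True:
        frame = sniffer.recv(65535)
        if arp_request_target(frame) == ip:
            # Assumption: UDP broadcast to port 9 reaches the sleeping NIC.
            sender.sendto(packet, ("255.255.255.255", 9))
```

With the values from the command above, the loop would be started as wake_on_arp("br0", "192.168.1.149", "4C:72:B9:42:EA:97"). The client's ARP request goes unanswered while the server sleeps, so the client retries and gets its answer once the server is awake.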

It takes 5 seconds for an ssh roundtrip to a sleeping server.

Auto-suspend

Once the server is awake, it is annoying to have to put it to sleep manually. I wrote a script to monitor /var/log/auth.log and auto-suspend when active ssh-session count reaches 0.
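
The session counting can be sketched roughly as follows. This is not autosleep.py, just a minimal Python illustration of the idea. It assumes sshd logs PAM session events in the usual pam_unix wording ("session opened for user ..." and "session closed for user ..."), which may differ between distributions. The function name is hypothetical.

```python
import re

# Assumption: auth.log lines look like
#   "... sshd[1234]: pam_unix(sshd:session): session opened for user taras by (uid=0)"
#   "... sshd[1234]: pam_unix(sshd:session): session closed for user taras"
SESSION_EVENT = re.compile(
    r"sshd\[\d+\]: pam_unix\(sshd:session\): session (opened|closed) for user"
)


def count_active_sessions(log_lines) -> int:
    """Count ssh sessions that were opened but not yet closed in the given lines."""
    active = 0
    for line in log_lines:
        match = SESSION_EVENT.search(line)
        if match is None:
            continue  # not an sshd session event (cron, sudo, failed logins, ...)
        if match.group(1) == "opened":
            active += 1
        else:
            # Tolerate a "closed" whose matching "opened" predates the log.
            active = max(active - 1, 0)
    return active
```

A monitoring loop would re-read the log as it grows and suspend the machine once the count drops back to 0 after having been positive. The suspend command itself depends on the distribution, so it is left out here.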

Conclusion

It is great to have an x86 box idle at 1W while unused. Suspend to disk introduces extra latency without saving much more power (due to keeping WOL on).

Please check out the code on github. WOL magic happens in arpwake.c and auto-suspend in autosleep.py.

10W Intel I5 Desktop

I recently built a new computer for HTPC, NAS, etc duties. The last couple of times I tried this, it ended in disappointment due to noise, hardware quirks and power consumption.

The most important components are an Intel DQ77KB motherboard, a Cooler Master TX3 cooler and an i5 3330S CPU from eBay. The motherboard takes a DC power jack and has onboard voltage regulation. It can be powered by a cheap Dell laptop power supply. No need for a big noisy ATX power supply. The CPU cooler is obnoxiously large and quiet. I can’t hear the computer even while compiling Mozilla.

This computer idles at as low as 10W, max observed power usage is 46W. Suspend-to-RAM uses around 1W. Thanks to the fancy Intel motherboard + CPU + efficient laptop PSU, at full power the computer uses less power than most systems do at idle.

It’s exciting that mighty desktop CPUs now use less power at idle than Atom CPUs from just 2 years ago. Too bad that Intel is the only retail vendor that optimizes their motherboards for power consumption. Especially too bad since they are quitting the motherboard market.

Plugin Hang UI on Aurora

Aaron wrote a great post on the new plugin killer UI and the Windows magic involved in debugging it.

We need help testing the new functionality. Please see the link above for details.

Unfortunately, we are still waiting on bug 814095 to get his blog syndicated to planet.

Is Planet Mozilla Obsolete for Technical Content?

Good Old Days

I have been working remotely at Mozilla for over 6 years. I like working remotely, but it poses some challenges. Early on I discovered that if I only show up at the HQ a couple times a year, most people will treat me as a stranger. That got old fast.

The problem is that it takes a lot of time to get everybody up to speed on who you are (defined by what you work on). This means one’s work social circle is limited to people who you have frequent bugzilla/irc interactions with + random people who took the time to get to know a random coworker. One can imagine that introverts are not inclined to waste too much energy meeting new people.

The solution was simple: blog a lot. After a couple years of blogging I just had to say “I’m Taras” and a good proportion of the people would connect my face to (obscure static analysis at first) work they read about on planet. This cut down my introduction overhead significantly. Planet Mozilla had a lot of blogs syndicated to it when I joined. I had a huge audience to introduce my work to.

In addition to creating awareness of my work, blogging about tough problems would occasionally result in helpful comments. People provided tips on static analysis, Windows APIs and even ran scary privileged software I wrote to help me gather data. Due to the disproportionate value of helpful comments (e.g. saving days to weeks of work) I concluded that it’s worth spending a couple hours per blog post. Most blog comments might be garbage, but they are easy to ignore. Before I implemented telemetry, I was able to find performance extremes solely on blog feedback. Unlike privacy-sensitive telemetry data, blog comments came with email addresses and eager volunteers on the other end. I value comments a lot, and it makes me sad when good bloggers disable comments.

To me Planet Mozilla was a great way to keep up with Mozilla technical affairs. We have a lot of smart people working on interesting problems at Mozilla. As a result of past planet experience, I ask every new person who joins the Performance team to get their blog syndicated to planet ASAP. Increasingly that feels like an unproductive suggestion.

Present

I do not have any data on this. However my feeling is that the volume of blog traffic on planet grew from barely-manageable in the early days to too much. Good technical content never constituted more than 10% of the planet posts. However as absolute blog traffic grew, it became harder to spot the good stuff. In addition to a lot of content being non-technical, in the last few years people started discussing their feelings about others and things got ugly.

I’m pretty sure the result is that there are fewer technical people reading planet than before (due to poor signal/noise ratio). Lack of audience means less incentive to blog (that and the fact that some bloggers are part of the audience that gave up on planet).

So what are we to do? Is planet obsolete for good technical content? Is there a new reddit/hackernews/twitter self-moderating solution for dealing with signal problems? Surely setting up a new planet is no longer considered state of the art for this.

I am sad to see a public resource like the planet get too big to remain useful with no clear successor.

ps. Sorry for adding to the non-technical noise.

Snappy #50

Graphics

In some cases Direct2D-accelerated drawing is slower than the non-accelerated path. Jeff Muizelaar fixed a severe gradient ‘hang’ in bug 823147.

Avi Halachmi diagnosed a significant menu performance issue in bug 832641, which was promptly fixed by Matt Woodrow.

Misc Pauses

Vladan Djeric blogged about top main-thread SQL issues contributed by addons. Vladan also produced a chromehang report for last 2 months.

Ehsan Akhgari fixed a chromehang caused by leftover debug code: bug 830765.

Justin Lebar fixed an issue where telemetry memory reporting code was accidentally triggering expensive ‘release memory to OS’ operations: bug 789975.

Shutdown

Sometimes Firefox takes a long time to shut down. We also have a timer that regularly triggers cycle collection. Olli Pettay disabled this timer during shutdown in bug 822849.

Snappy #48: Now With Faster Shutdown

Huge Shutdown Improvement

A couple of weeks’ worth of telemetry data confirmed that Olli Pettay sped up shutdown by an epic >=30%: bug 818739, telemetry link.

Memory Management

Olli and Andrew McCreight continued with reducing CC pauses:

  • bug 820378: Delay CC if we’re in the middle of a GC, to allow async CC prep
  • bug 827471: Remove more wrapped JS from the CC graph
  • bug 705371: Remove pointless JSContexts from the CC graph
  • bug 785493: Reduce size of steady state cycle collector graph by about 80%
  • bug 821371: Include prep work in cycle collector pause time telemetry

Misc

Vladan landed bug 807021. Firefox should now handle DOM Local Storage writes without janking.

Startup

David Teller made search service metadata loading/migration async: bug 760036. David also made session-store loading async: bug 532150.

Aaron Klotz landed a telemetry probe to measure how often the ‘Firefox is running but not responding’ dialog is encountered on attempted startup: bug 815418. This will help us decide on whether (or when) to add functionality to kill unresponsive Firefox instances.