2012 was an exciting year for Snappy. Turning ‘make it go faster’ into a set of measurements and corresponding bugs to fix was hard. We learned a lot.
I’d like to summarize some of the most memorable Snappy accomplishments.
Short version: Firefox is much more responsive now.
Much of the year was spent on tooling for analyzing Firefox performance. The Gecko profiler is my favourite new tool. Telemetry chromehangs, evolution and slowsql are also great for settling arguments.
Many of the bugs below would not have gotten fixed without these tools.
Cleaning up the Memory Collectors
At the beginning of the year it was common to suffer long (200-700ms for me) garbage and cycle collection pauses. I now rarely see pauses over 20ms in these areas. I suspect these were the most laborious improvements of 2012: see the huge number of blocking bugs in bug 641025 and bug 716598.
SQLite is a fine database, but it is much better at storing data robustly than accessing it efficiently. Firefox got nailed by the following footguns:
- overusing main-thread SQL queries
- performing expensive background queries
As if main-thread IO wasn’t bad enough, it turns out SQLite does not like running queries in parallel. Mixing sync and async queries on the same database invites lock contention: a sync query can end up waiting for many minutes at a time (hanging the UI) while “background” maintenance queries complete. There were too many such fixes to list.
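SQLite allows only one writer at a time, and in its default rollback-journal mode an exclusive lock blocks readers as well. The sketch below reproduces the pattern with Python's standard sqlite3 module rather than Firefox's own storage code. A worker thread holds a write transaction open to stand in for “background” maintenance, and a second connection issues an ordinary synchronous INSERT that sits in SQLite's busy handler until the worker commits. The table, file name and half-second delay are all hypothetical.

```python
import os
import sqlite3
import tempfile
import threading
import time


def measure_sync_wait(maintenance_seconds: float = 0.5) -> float:
    """Return how long a synchronous write waited behind a background transaction."""
    path = os.path.join(tempfile.mkdtemp(), "sketch.sqlite")  # hypothetical database
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE visits (url TEXT)")
    setup.commit()
    setup.close()

    locked = threading.Event()

    def background_maintenance() -> None:
        # Stand-in for a long "background" maintenance job holding the write lock.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("BEGIN EXCLUSIVE")
        conn.execute("INSERT INTO visits VALUES ('maintenance')")
        locked.set()
        time.sleep(maintenance_seconds)  # simulated expensive work
        conn.execute("COMMIT")
        conn.close()

    worker = threading.Thread(target=background_maintenance)
    worker.start()
    locked.wait()  # make sure the background job owns the lock first

    # Stand-in for a synchronous main-thread query: SQLite's busy handler keeps
    # retrying (for up to `timeout` seconds) until the background transaction commits.
    ui_conn = sqlite3.connect(path, timeout=10.0, isolation_level=None)
    start = time.monotonic()
    ui_conn.execute("INSERT INTO visits VALUES ('https://example.com/')")
    waited = time.monotonic() - start
    ui_conn.close()
    worker.join()
    return waited


if __name__ == "__main__":
    print(f"sync query waited {measure_sync_wait():.2f}s")
```

The measured wait should come out close to the simulated maintenance time. On the main thread of a browser, that wait is a frozen UI, and real maintenance queries can take far longer than half a second.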
DOM Local Storage Caching
We ran into significant problems with Local Storage. In order to conform to the spec and not perform poorly, browsers are forced to maintain an in-memory cache (this is why Local Storage dominates in synthetic benchmarks: they are measuring memory bandwidth without syscall or other IO overhead).
- It became really popular in 2011-2012 despite a poor spec: synchronous main-thread reads, and vagueness about when or whether data should hit disk.
- The Local Storage cache was causing data to be written out to disk too often: bug 714964
- The Local Storage cache was written out on the main thread. For paranoid amusement, it was then read back in after every write-out: bug 807021
- For some reason to do with how our DOM works, there is a second level of caching, so Local Storage can end up using twice as much memory in RAM as it does on disk. As a result, the Local Storage cache is slated for a complete rewrite in bug 600307.
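One general way to keep such a cache from hitting disk too often is a write-back cache: reads and writes are served from memory, and changes are coalesced into occasional flushes. The Python sketch below illustrates only that idea and is not Gecko's implementation (which stored Local Storage in SQLite). The class name, JSON file format and 5-second flush interval are hypothetical, and time is passed in explicitly so the behaviour is deterministic; a real implementation would drive flushes from a timer and perform them off the main thread.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class CoalescingStorageCache:
    """Hypothetical write-back cache: reads and writes hit memory, disk is updated in batches."""

    def __init__(self, path: str, flush_interval: float = 5.0) -> None:
        self.path = path
        self.flush_interval = flush_interval  # seconds between disk writes (assumed)
        self.data: Dict[str, str] = {}
        self.dirty = False
        self.last_flush = 0.0
        self.disk_writes = 0
        if os.path.exists(path):
            # Load once; later reads are served from memory without touching disk.
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def get_item(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set_item(self, key: str, value: str, now: float) -> None:
        self.data[key] = value
        self.dirty = True
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        # Coalesce: write at most once per flush_interval, and only if something changed.
        if self.dirty and now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now: float) -> None:
        # Write a temp file, then swap it in; nothing is read back afterwards.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)
        self.dirty = False
        self.last_flush = now
        self.disk_writes += 1


if __name__ == "__main__":
    cache = CoalescingStorageCache(os.path.join(tempfile.mkdtemp(), "storage.json"))
    for t in range(20):  # 20 updates, one per simulated second
        cache.set_item("counter", str(t), now=float(t))
    cache.flush(now=20.0)  # e.g. on shutdown
    print(cache.disk_writes)  # 4 disk writes instead of 20
```

A write-through cache would have written the file 20 times here. The trade-off is that up to one flush interval of changes can be lost on a crash, which is exactly the “when/whether data should hit disk” question the spec leaves vague.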
I should note, I made a mistake and attributed too much blame to the Local Storage API. I will blog about the exact extent of Local Storage badness once I have a chance to access the relevant telemetry data.
Surprisingly, the Local Storage caching layer was so bad that the underlying SQLite footguns did not get to play a role in this tragedy.
For years, people would argue during patch review about how strongly patches should strive to use async APIs. Unfortunately, even a little bit of sync IO has the potential to cancel out the most elaborate async efforts.
We had no purely async storage APIs until recently. We now have one such API in OS.File.
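Why a single synchronous call can undo the async work around it can be shown with Python's asyncio, used here purely as an analogy for the browser's main-thread event loop; it is not how OS.File works. In this sketch `time.sleep` stands in for a slow disk operation and the 10 ms `heartbeat` task stands in for UI work; both, along with the 0.2-second duration, are assumptions. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time

SLOW_IO_SECONDS = 0.2  # hypothetical duration of one slow disk operation


def slow_blocking_io() -> str:
    # Stand-in for a synchronous read or write that stalls the calling thread.
    time.sleep(SLOW_IO_SECONDS)
    return "done"


async def ticks_during_io(use_worker_thread: bool) -> int:
    """Count how many times a 10 ms 'UI' heartbeat ran while the IO was in flight."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start
    before = ticks
    if use_worker_thread:
        await asyncio.to_thread(slow_blocking_io)  # event loop keeps running meanwhile
    else:
        slow_blocking_io()  # blocks the event loop: no heartbeat can run
    during = ticks - before
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return during


if __name__ == "__main__":
    print("sync IO:", asyncio.run(ticks_during_io(False)))  # 0
    print("async IO:", asyncio.run(ticks_during_io(True)))
```

In the blocking case the heartbeat never runs while the IO is in flight, no matter how carefully everything else on the loop was written to be async; moving the same call to a worker thread lets the loop keep ticking.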
The following sadness was fixed:
- Renaming directories with lots of files can take minutes on Windows: bad when it happens on startup: bug 701909.
- Firefox had a minor tendency to start loading webpages before the UI is shown: bug 756313, bug 715402.
- Q: What could be worse than loading pages before the UI is shown? A: Executing synchronous proxy code: bug 790370, bug 767159
- Firefox insisted on doing network activity to verify some extension jars on startup: bug 726125
- Tab switching to some popular websites is roughly 10x faster now (too many bugs to list).
- Firefox tended to be unresponsive during large downloads: bug 789932.
- In some situations hardware acceleration would slow down Firefox UI to a crawl: too many bugs to list here.
2012 was a good warm-up. We spent a substantial part of the year on tooling. If everything goes right, that should pay off in the coming year.