All About Performance

and other stuff by Taras Glek

Icegrind - Valgrind Plugin for Optimizing Cold Startup

Most program binaries are laid out with little to no regard for how programs get loaded from disk. This disconnect between the compile-time and runtime behaviour of binaries imposes a significant performance penalty on large applications such as browsers and office suites.

It is incredibly difficult to observe both the cause of binary-induced IO (e.g. calling a random function) and the effect (the program gets suspended during startup while parts of it are being loaded from disk), so this area doesn’t get as much optimization love as it deserves.

My estimate is that around 50% of Firefox startup time is wasted on suboptimal binary layout. My previous post demonstrated the kind of difference a better binary layout can make. Note that reordering executables isn’t the only solution: eliminating dead code should also speed things up (though deleting dead code is hard).

Optimizing Binary Layout

Disclaimer: I just finished my 3rd rewrite of icegrind a few hours ago, be gentle.

Ingredients: Valgrind SVN trunk + icegrind patch, GNU Gold + section-ordering-file patch, a way to describe contents of binaries.

Step 1a: Produce a build

Since I am interested in reorganizing program binaries, I build Mozilla with “-ffunction-sections -fdata-sections” in CFLAGS/CXXFLAGS.

I also prelink the binaries in dist/bin so that they better correspond to how they will be used:

    prelink $LD_LIBRARY_PATH/firefox-bin $LD_LIBRARY_PATH/*.so

Step 1b: Produce a description of interesting files

I use my elflog utility to produce a .sections description of the files I’m interested in. Elflog looks at the symbol table and tries to infer section names (produced by -ffunction-sections -fdata-sections) from symbol names and locations (see also the --print-map option for ld).

    elflog --contents libxul.so > libxul.so.sections

elflog currently emits non-existent .comment.* sections because it gets confused by 0-length sections such as .bss. Note that one can also build tools to describe other kinds of files, such as jar or sqlite files. The only limitation is that Icegrind currently tracks only mmap()-caused disk IO; it would be trivial to extend it to deal with open/seek/read-style disk IO.
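To make the symbol-to-section inference from Step 1b concrete, here is a minimal Python sketch. It is not elflog's code or output format: the function name `infer_sections`, the nm-style input lines, and the sample symbols are all hypothetical. The only thing it relies on is GCC's convention of naming per-symbol sections `.text.<symbol>`, `.data.<symbol>`, `.bss.<symbol>` and `.rodata.<symbol>` when building with -ffunction-sections -fdata-sections.

```python
# Hypothetical sketch: guess per-symbol section names from nm-style lines.
# Assumes GCC's -ffunction-sections/-fdata-sections naming (.text.<sym>, ...);
# this is not elflog's actual implementation or file format.

# nm symbol-type letter (lowercased) -> prefix of the per-symbol section
PREFIX_BY_TYPE = {
    "t": ".text",
    "d": ".data",
    "b": ".bss",
    "r": ".rodata",
}


def infer_sections(nm_lines):
    """Return (address, section_name) pairs sorted by address."""
    result = []
    for line in nm_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols carry no address; skip them
        addr_text, sym_type, name = parts
        prefix = PREFIX_BY_TYPE.get(sym_type.lower())
        if prefix is None:
            continue  # weak, absolute, etc. are ignored in this sketch
        result.append((int(addr_text, 16), f"{prefix}.{name}"))
    return sorted(result)


# Made-up sample input in the "address type name" layout that nm prints.
sample = [
    "0000000000001040 T _ZN3Foo3barEv",
    "0000000000004010 D gCounter",
    "                 U malloc",
    "0000000000004020 b sBuffer",
]
sections = infer_sections(sample)
for addr, name in sections:
    print(f"{addr:#x} {name}")
```

A real tool also needs symbol sizes and has to cope with several symbols sharing one section, which this sketch leaves out.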

Step 2: Produce a log with icegrind!

Apply my icegrind patch, then build and install valgrind. Run Firefox:

    valgrind --tool=icegrind firefox-bin -profile /tmp/ff -no-remote

This will produce a .log file for every mmap()ed file that has a .sections description. The log lists sections chronologically, in the order they were accessed.
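The idea behind such a log can be sketched in a few lines of Python. This is only an illustration under assumptions, not icegrind's implementation: the function `first_access_order`, the `(start, size, name)` layout tuples, and the offsets in the trace are hypothetical, and I am assuming each section is recorded once, at its first access.

```python
from bisect import bisect_right


def first_access_order(sections, accesses):
    """List section names in the order they are first touched.

    sections: iterable of (start_offset, size, name) tuples (hypothetical layout)
    accesses: iterable of file offsets in the order they were accessed
    """
    ordered = sorted(sections)
    starts = [start for start, _size, _name in ordered]
    seen = set()
    log = []
    for offset in accesses:
        i = bisect_right(starts, offset) - 1  # last section starting <= offset
        if i < 0:
            continue  # offset lies before the first section
        start, size, name = ordered[i]
        if offset >= start + size:
            continue  # offset falls in a gap between sections
        if name not in seen:
            seen.add(name)
            log.append(name)
    return log


# Made-up layout and access trace, purely for illustration.
layout = [
    (0x1000, 0x40, ".text.main"),
    (0x1040, 0x20, ".text.helper"),
    (0x2000, 0x10, ".data.gFlag"),
]
trace = [0x1044, 0x1004, 0x1050, 0x2008, 0x1800, 0x1010]
access_log = first_access_order(layout, trace)
print("\n".join(access_log))
```

Sorting the section table once and using a binary search keeps each lookup cheap even when a binary has a very large number of per-function sections.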

Step 3: Tell gold to link using the above log

Build and install binutils (I use a CVS checkout from a month ago) with the section ordering patch, specifying --enable-gold. To reorder the binary, I just add -Wl,--section-ordering-file,libxul.so.log to my linker command line. Note that there are still some teething issues with this patch: it exhibits N^2 behavior (i.e. it takes 10 minutes to link libxul.so with it) and occasionally swaps the order of .rela.plt and .rela.dyn, which makes prelink upset. But unlike my earlier attempt with linker scripts, it does not affect the binary size.
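As a rough picture of what the ordering file is meant to achieve, the following Python sketch places the sections named in the log first, in log order, and leaves everything else in its original link order behind them. That placement policy is my assumption for illustration, not a description of what the gold patch does internally, and `reorder` and the section names are hypothetical.

```python
def reorder(link_order, access_log):
    """Put logged sections first (in log order), then the rest unchanged.

    Assumed policy for illustration only; the patched linker may differ.
    """
    rank = {}
    for position, name in enumerate(access_log):
        rank.setdefault(name, position)  # keep the first occurrence only
    hot = sorted((s for s in link_order if s in rank), key=rank.__getitem__)
    cold = [s for s in link_order if s not in rank]
    return hot + cold


# Made-up default link order and startup log.
default_order = [".text.a", ".text.b", ".text.c", ".text.d"]
startup_log = [".text.c", ".text.a", ".text.not_in_binary"]
new_order = reorder(default_order, startup_log)
print(new_order)
```

Using a dictionary for the rank lookup keeps this close to linear in the number of sections, in contrast to the quadratic behavior noted above.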

Step 4: Enjoy!

Now strip, install, prelink your binaries and enjoy faster startup.

Plans

I would like to see the gold patch fixed up and landed. Once that is done, I’d like to turn this on for our Linux and mobile Linux builds.

I am hoping that some sort of sensible ordering of binaries will become commonplace in the future.
