# Firefox's Optimized Zip Format: Reading Zip Files Really Quickly

I recently saw an article about a new format that was faster than zip.

This was quite surprising, as to my mind zip is one of the most flexible and low-overhead formats I’ve encountered.

Some googling showed me that over the past 11 years, people have noticed that Firefox uses optimized zip files. This inspired me to document the thinking behind the optimized zip format I implemented in Firefox back in pre-pandemic 2010. I had a lot of fun writing this code and was surprised to realize that I never blogged about it.

### Zip format

The following diagram is borrowed from a CodeProject article. Wikipedia and the ZIP specification are also helpful.

Zip files seem to be designed for cheap appends. For every file inside a zip, there is a Local Header followed by the (optionally compressed) file contents.

These are followed by a Central Directory, which acts as an index of the zip contents.
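To make this layout concrete, here is a minimal Python sketch that walks an archive from front to back and labels each record. It assumes a simple archive with no Zip64 records and no data descriptors, so the compressed size stored in each Local Header is accurate. The `struct` formats cover the fixed-size parts of the three records in the ZIP specification. `record_layout` is a hypothetical helper name, not part of any library.

```python
import struct

# Fixed-size parts of the three zip records (all little-endian).
LOCAL = struct.Struct("<IHHHHHIIIHH")          # 30 bytes, then name + extra + file data
CENTRAL = struct.Struct("<IHHHHHHIIIHHHHHII")  # 46 bytes, then name + extra + comment
EOCD = struct.Struct("<IHHHHIIH")              # 22 bytes, then an optional comment


def record_layout(data: bytes) -> list[tuple[int, str, str]]:
    """Walk a simple zip front to back; return (offset, record kind, entry name)."""
    out, pos = [], 0
    # Every file: a Local Header, then its (optionally compressed) contents.
    while data[pos:pos + 4] == b"PK\x03\x04":
        f = LOCAL.unpack_from(data, pos)
        comp_size, name_len, extra_len = f[7], f[9], f[10]
        name = data[pos + 30:pos + 30 + name_len].decode("utf-8")
        out.append((pos, "local", name))
        pos += 30 + name_len + extra_len + comp_size
    # Then the Central Directory: one header per file, acting as the index.
    while data[pos:pos + 4] == b"PK\x01\x02":
        f = CENTRAL.unpack_from(data, pos)
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8")
        out.append((pos, "central", name))
        pos += 46 + name_len + extra_len + comment_len
    # Finally, the End of Central Directory record.
    if data[pos:pos + 4] == b"PK\x05\x06":
        out.append((pos, "eocd", ""))
    return out
```

For an ordinary two-file archive, this yields two `local` records, then two `central` records, then one `eocd`. That ordering is what makes appends cheap: a new entry can be written where the old Central Directory began, followed by a rewritten directory.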

To extract a file from a zip file, one must:

1. Scan backwards through the zip file for the End of Central Directory marker.
2. Read the offset to the beginning of the Central Directory.
3. Find the relevant Local Header offset in the Central Directory index.
4. Read + optionally decompress the stored file.
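The four steps above can be sketched in Python as follows. This is a minimal illustration, not Firefox's reader. It assumes the whole archive is already in memory as `bytes`, handles only stored (method 0) and deflated (method 8) entries, decodes names as UTF-8, and ignores Zip64. `read_entry` is a hypothetical name.

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"
CDH_SIG = b"PK\x01\x02"
LFH_SIG = b"PK\x03\x04"


def read_entry(data: bytes, name: str) -> bytes:
    # 1. Scan backwards for the End of Central Directory signature. The EOCD is
    #    22 bytes plus an optional comment of at most 65535 bytes.
    eocd = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 0xFFFF))
    if eocd < 0:
        raise ValueError("no End of Central Directory record")
    (_, _, _, _, total, _, cd_offset, _) = struct.unpack_from("<IHHHHIIH", data, eocd)

    # 2. Jump to the Central Directory using the offset stored in the EOCD.
    pos = cd_offset
    for _ in range(total):
        if data[pos:pos + 4] != CDH_SIG:
            raise ValueError("bad Central Directory header")
        f = struct.unpack_from("<IHHHHHHIIIHHHHHII", data, pos)
        method, comp_size = f[4], f[8]
        name_len, extra_len, comment_len, local_offset = f[10], f[11], f[12], f[16]
        entry_name = data[pos + 46:pos + 46 + name_len].decode("utf-8")
        pos += 46 + name_len + extra_len + comment_len

        # 3. The matching directory entry gives us the Local Header offset.
        if entry_name != name:
            continue

        # 4. Skip the Local Header (its own name/extra lengths) and read the data.
        if data[local_offset:local_offset + 4] != LFH_SIG:
            raise ValueError("bad Local Header")
        l_name_len, l_extra_len = struct.unpack_from("<HH", data, local_offset + 26)
        start = local_offset + 30 + l_name_len + l_extra_len
        raw = data[start:start + comp_size]
        if method == 0:  # stored
            return raw
        if method == 8:  # deflate: a raw stream without a zlib header
            return zlib.decompress(raw, -15)
        raise ValueError(f"unsupported compression method {method}")
    raise KeyError(name)
```

Note that the reader touches three separate regions of the file: the tail (EOCD), then the Central Directory, then the entry itself. On a cold disk, each of those can become a seek.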

### Writing Optimized Zip Files

In order to optimize file IO on Firefox startup, I wanted to make use of OS readahead[1].
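As a rough illustration of what "making use of readahead" means, the sketch below asks the OS to prefetch the first `length` bytes of a file and then reads them in one sequential pass. It uses Python's `os.posix_fadvise` with `POSIX_FADV_WILLNEED`, which is only available on some Unix systems (e.g. Linux). This mechanism was chosen for illustration and is not necessarily what Firefox used. `read_with_readahead` is a hypothetical helper.

```python
import os


def read_with_readahead(path: str, length: int) -> bytes:
    """Read up to `length` bytes from the start of `path`, hinting a prefetch first."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Ask the kernel to start reading [0, length) in the background.
        # Skipped on platforms where posix_fadvise does not exist.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, length, os.POSIX_FADV_WILLNEED)
        # One sequential read over a contiguous range instead of scattered seeks.
        chunks, remaining = [], length
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:  # end of file reached early
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

A single hint plus one sequential read only pays off if everything needed at startup sits in one contiguous range. The layout described below is designed to produce exactly that.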

The following creative interpretation of the zip spec results in optimized zip files:

1. Since we already do PGO[2] for Firefox builds, I added a ZipArchiveLogger to the Firefox profiling stage to log which zip entries are accessed.

2. Then, during the build phase, I added optimizejars.py[3] to move the Central Directory to the beginning of the file.

3. Additionally, optimizejars.py lays out zip entries in the order specified by the ZipArchiveLogger log.

4. Finally, it writes the length of the Central Directory plus the logged entries from step 3 into the first 4 bytes of the file.
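Step 1 amounts to recording the order in which entries are first read. ZipArchiveLogger itself is Firefox code. The Python class below is only a hypothetical stand-in: it subclasses the standard `zipfile.ZipFile` and appends each entry name to `access_log` the first time the entry is opened for reading. In CPython, `ZipFile.read` goes through `open`, so reads are logged too.

```python
import zipfile


class LoggingZipFile(zipfile.ZipFile):
    """Hypothetical ZipArchiveLogger stand-in: logs first-access order of entries."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.access_log: list[str] = []  # entry names in first-access order
        self._seen: set[str] = set()

    def open(self, name, mode="r", pwd=None, **kwargs):
        entry = name.filename if isinstance(name, zipfile.ZipInfo) else name
        # Only reads matter for startup ordering; repeat accesses are ignored.
        if mode == "r" and entry not in self._seen:
            self._seen.add(entry)
            self.access_log.append(entry)
        return super().open(name, mode, pwd, **kwargs)
```

After a profiling run, `access_log` is the "hot" list that the build step uses to order entries.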

The zip file now looks like: `| 4 bytes | Central Directory | Hot Files | Cold Files | End Of Central Directory Marker |`.

Thus we have a sequential-read-friendly zip file that can still be read by zip tools that follow the spec.
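The build steps can be sketched as a writer that emits this layout. It is a simplification of what optimizejars.py does, under several assumptions:

- It builds the archive from a dict of file contents rather than rewriting an existing jar.
- It deflates every entry and omits extra fields and comments.
- It stores the 4-byte prefix as a little-endian `uint32` holding the Central Directory length plus the hot entries' length.

The prefix encoding and the name `write_optimized_zip` are assumptions. The EOCD still records the Central Directory's real offset (4), which a spec-following reader uses to find it.

```python
import struct
import zlib


def write_optimized_zip(files: dict[str, bytes], hot: list[str]) -> bytes:
    """Sketch of | 4 bytes | Central Directory | Hot Files | Cold Files | EOCD |."""
    # Hot entries first, in logged order (duplicates dropped); cold entries
    # follow in their original order.
    hot_order = list(dict.fromkeys(n for n in hot if n in files))
    hot_set = set(hot_order)
    order = hot_order + [n for n in files if n not in hot_set]

    # Deflate each entry (raw stream, wbits=-15) so every size is known up front.
    entries = []
    for name in order:
        data = files[name]
        comp = zlib.compressobj(9, zlib.DEFLATED, -15)
        packed = comp.compress(data) + comp.flush()
        entries.append((name, name.encode("utf-8"), data, packed))

    # With no extra fields or comments, the Central Directory's size depends only
    # on the names, so entry offsets are known before the directory is written.
    cd_size = sum(46 + len(raw_name) for _, raw_name, _, _ in entries)
    offset = 4 + cd_size  # first Local Header follows the prefix and the directory

    central, bodies, hot_bytes = [], [], 0
    for name, raw_name, data, packed in entries:
        crc = zlib.crc32(data)
        # Local Header: version 2.0, no flags, method 8 (deflate), date 1980-01-01.
        local = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 8, 0, 0x21, crc,
                            len(packed), len(data), len(raw_name), 0) + raw_name + packed
        # Central Directory header pointing back at this Local Header.
        central.append(struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, 0, 8,
                                   0, 0x21, crc, len(packed), len(data),
                                   len(raw_name), 0, 0, 0, 0, 0, offset) + raw_name)
        bodies.append(local)
        if name in hot_set:
            hot_bytes += len(local)
        offset += len(local)

    # Assumed prefix encoding: little-endian uint32 = directory length + hot bytes.
    prefix = struct.pack("<I", cd_size + hot_bytes)
    # EOCD: single disk, entry counts, and the real Central Directory offset (4).
    eocd = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, len(entries), len(entries),
                       cd_size, 4, 0)
    return prefix + b"".join(central) + b"".join(bodies) + eocd
```

Because the Central Directory's size here depends only on entry names, every Local Header offset can be computed before the directory is emitted. A single pass over the compressed data is therefore enough. A reader can then prefetch `4 + prefix` bytes to cover the directory and all hot files in one sequential read.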

1. Speculatively check whether the Central Directory signature appears 4 bytes into the file.