In my previous post I argued that a modern phone is only as fast as its slowest component: the ability of its NAND to handle 4k writes. I decided to compare two Android flagships at opposite ends of the random-write-4k benchmark spectrum: Moto Z vs Google Pixel.

I wrote a little fio benchmark driver to fill all available device storage with random 4k writes and print perf stats along the way. The idea is to run the benchmark on the /data partition, then fill all available space by writing to /storage/emulated/0, then do another round of testing on /data.
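A driver along these lines can be sketched in a few lines of Python. This is a rough illustration, not the script behind the charts: the job name, the 256m size, the psync engine and the fsync-after-every-write setting are all assumptions picked for the example.

```python
import json
import subprocess


def fio_randwrite_cmd(directory, size="256m", name="randwrite4k"):
    """Build a fio command line for one random 4k write job in `directory`."""
    return [
        "fio",
        f"--name={name}",            # job name (hypothetical)
        f"--directory={directory}",  # where fio creates its test file
        "--rw=randwrite",            # random writes
        "--bs=4k",                   # 4 KiB blocks
        f"--size={size}",            # amount of data per round (assumed)
        "--ioengine=psync",          # plain pwrite() calls
        "--fsync=1",                 # fsync after every write (assumed)
        "--output-format=json",      # machine-readable report
    ]


def run_round(directory):
    """Run one fio round and return the parsed JSON report."""
    result = subprocess.run(
        fio_randwrite_cmd(directory),
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `run_round("/data")` before and after filling /storage/emulated/0 would give the two rounds described above, assuming a fio binary is available on the device.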

The chart above has p50 (50% of IOs complete in under X), p90 and p99 numbers for both devices. The Moto Z median is around 0.5ms; the Pixel is roughly 7x that at 3.3ms. The difference widens at p90.
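For readers unfamiliar with the notation, here is one way to compute such percentiles from raw latency samples, using the nearest-rank method. The latencies below are made up for illustration and are not measurements from either phone.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up write latencies in milliseconds, for illustration only.
latencies_ms = [0.4, 0.5, 0.5, 0.6, 0.7, 0.9, 1.2, 2.0, 4.5, 12.0]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}
print(summary)
```

Note how a handful of slow writes barely moves p50 but dominates p90 and p99, which is why the tail numbers matter for perceived lag.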

On mobile phones 16.67ms is a magic number. That’s the amount of time one has to update the screen at a buttery-smooth 60FPS. Optimistically, one can roughly translate each data-persistence operation on Android into at least 2 sequential random writes (best-case WAL SQLite mode). So if an app is saving a single piece of data, expect 6.6ms to be eaten up by IO on the Pixel, and when your device is busy, expect that number to rise quickly.
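The arithmetic behind that estimate, spelled out as a small sketch. It uses the median latencies quoted above and the 2-writes-per-save assumption; the function name is just for illustration.

```python
FRAME_BUDGET_MS = 1000 / 60   # ~16.67 ms per frame at 60FPS
WRITES_PER_SAVE = 2           # best-case WAL estimate from the text


def io_share_of_frame(median_write_ms, writes=WRITES_PER_SAVE):
    """Return (ms spent in IO, fraction of one frame budget)."""
    io_ms = median_write_ms * writes
    return io_ms, io_ms / FRAME_BUDGET_MS


pixel_ms, pixel_share = io_share_of_frame(3.3)   # median from the chart
moto_ms, moto_share = io_share_of_frame(0.5)
print(f"Pixel: {pixel_ms:.1f} ms ({pixel_share:.0%} of a frame)")
print(f"Moto Z: {moto_ms:.1f} ms ({moto_share:.0%} of a frame)")
```

At the median, a single save costs roughly 40% of a frame on the Pixel versus about 6% on the Moto Z, before any tail latency is considered.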

Note that this is best-case performance for these devices; I expect performance to degrade as they age. Pixel already performs relatively poorly in this test, so expect it to drop frames or stutter as it ages.

How Motorola Smoked Google by ~10x at Storage Perf

I spent a few days poking around the filesystems while developing my benchmark experiment. Motorola (a division of Lenovo) has bravely gone above and beyond stock Android to reduce storage lag. They got the Moto-Z to perform close to high-end laptop SSDs.

How did Motorola do this? The answers were hiding in the /proc/mounts file.

  • /storage/emulated/0: Google added a weird permission model for the common storage pool on Android. In a fit of either laziness or rushing to meet some PM deadlines for features no users asked for, they wrote a passthrough fuse filesystem to enforce cross-app file sharing. This means that on the Pixel every user IO gets a round-trip back into user-space before hitting the NAND. Fuse burns more CPU and slows down IO by up to 30%. I love fuse for things like sshfs, but this is a terrible application of it. Motorola thought a little harder and replaced the nasty fuse hack with esdfs (a fork of wrapfs).
  • /data: Pixel uses the traditional ext4 Linux filesystem. Moto-Z opted for f2fs. f2fs is a new filesystem developed by Samsung. It’s amazing, read the paper & watch the preso. They drove development of the filesystem specifically by Twitter/FB/etc workloads captured from the phone. It does many neat things, but the thing it does best is avoid fsync write-amplification. f2fs flags fsyncs via block metadata instead of doing a full checkpoint. This means fsync requires 50% fewer write operations than on ext4 (interestingly, competing filesystems like BTRFS have even higher fsync write amplification than ext4). I think the tradeoff is slightly slower recovery times. f2fs nets Moto-Z a 2x speed-up and a 2x increase in NAND lifespan. Expect Moto-Z to age much better than Pixel.
  • nobarrier: Moto-Z has a very interesting mount option soup for mounting f2fs: rw,seclabel,nosuid,nodev,noatime,nodiratime,background_gc=on,discard,user_xattr,inline_xattr,acl,inline_data,nobarrier,extent_cache,active_logs=6. Just for kicks I took a USB hard-drive, formatted it with f2fs and applied the same mount options. Suddenly the hard drive was 2x faster than the Pixel, WTF?

The key option is nobarrier. It stops the filesystem from issuing cache-flush barriers to the device, which effectively makes fsync() a no-op and explains most of the difference in performance. See the XFS FAQ for the best description of the nobarrier feature. Moto-Z is either awesome and implemented a RAM-cache solution for cellphones, or they are betting on the excellent crash-recovery abilities of f2fs, or they are really brave on behalf of users. Even if they didn’t implement a battery-backed RAM cache for their NAND, as long as f2fs isn’t overly horrible at recovering from crashes this is probably still the right choice. As a user, I’m much happier to have a long-lasting phone that might forget a couple of seconds of data than a device that has to be trashed after a year of use.
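If you want to check how your own device mounts /data, the /proc/mounts format is easy to pick apart. A minimal sketch follows; the device name in the sample line is a placeholder, while the option string is the Moto-Z one quoted above.

```python
def parse_mounts_line(line):
    """Split one /proc/mounts line into its fields plus a dict of mount options."""
    device, mountpoint, fstype, options = line.split()[:4]
    opts = {}
    for item in options.split(","):
        key, _, value = item.partition("=")
        opts[key] = value or True   # flags without a value map to True
    return {"device": device, "mountpoint": mountpoint,
            "fstype": fstype, "options": opts}


# Device name is a placeholder; options are the ones quoted in the post.
sample = (
    "/dev/block/placeholder /data f2fs "
    "rw,seclabel,nosuid,nodev,noatime,nodiratime,background_gc=on,discard,"
    "user_xattr,inline_xattr,acl,inline_data,nobarrier,extent_cache,"
    "active_logs=6 0 0"
)
entry = parse_mounts_line(sample)
print(entry["fstype"], "nobarrier" in entry["options"])
```

On a real device the same function can be applied to each line of /proc/mounts, which is readable without root on the phones I tried.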

If anyone has root on a Pixel and a Moto-Z, it would be interesting to see whether the underlying block devices perform differently. I suspect they are very similar and that Motorola differentiates entirely in software.

Conclusion

Android OEMs like Motorola/Samsung (f2fs authors) are improving Android performance. Moto Z and a few other recent Androids have drastically reduced storage lag. Next time you are shopping, try to avoid buying devices that will slow down to the point of being unusable as NAND wears out (a la Nexus 7, Nexus 6). I doubt anyone would spot the difference between a brand new Pixel and Moto-Z. However, after a year of use, the difference should be stark.

Phone reviewers should be more vigilant and shame poorly-implemented devices. I won’t be recommending the Pixel to any family members.

I’m not recommending that people buy the Moto Z either. Wi-Fi/cell reception seems worse on the Z than on the Pixel. The camera is worse too.

Updates

  1. I’m confusing UI state transitions with UI animations. The Android animation framework does not run on the main thread. Disregard the 16.67ms section.

  2. In a follow-up Twitter discussion, Android engineers made a solid case that this is likely a hack. If Motorola had made barriers a no-op in hw, nobarrier wouldn’t be needed in sw (eg this email). It’s unclear how nobarrier was deemed safe. One could theorize that Motorola spent time QAing failure scenarios.

  3. I’m still hoping that an Android vendor will implement a battery-backed RAM cache to solve the write-4k bottleneck. Moto-Z can be considered a risky prototype of what storage performance should be like. It will be interesting to see if my prediction of Pixel aging worse than Z comes true. I doubt write-4k is a bottleneck in any Android workload on the Moto-Z.



TLDR: You can predict the degree of unresponsiveness of a phone via random-write-4k benchmarks. I wish review websites would fill phones to 80-90% prior to running the benchmark, especially on smaller-capacity phones where users are more likely to run out of space.

SQLite vs Phone NAND

I’ve long held a theory that Android lag is almost directly determined by slowness induced by SQLite transactions. This weekend, while researching phones for a family member, I found some supporting evidence.

I was employed to analyze Firefox performance at Mozilla. Most of the time I focused on IO performance as my niche. This was relatively easy because desktop OSes (especially Windows with XPerf, and Linux being open source) are very open to developers. Unfortunately, as a hobbyist I have fewer chances to figure out why my phones are slow. None of my phones have root, let alone an unlocked bootloader (eg no ability to recompile the kernel with IO tracing functionality).

In the past I verified that all of my phones that got super-laggy were exhibiting single-digit-per-second write-random-4k benchmarks. However, until now I couldn’t point at SQLite as the main driver of IO on Android.

To trace IOs on Android one has to recompile the kernel or at least have root to run something like fsmon to observe high-level IO. I was able to run fsmon on my rooted Android TV box and observe that most of the IO occurred in SQLite databases.

For some reason Android does not default to the WAL journaling mode for SQLite, which would make it use half as many IOPS.
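For reference, switching a database to WAL is a one-line pragma. The sketch below uses Python's built-in sqlite3 module on a desktop rather than Android's own SQLite wrapper, and the table and file names are made up for the example.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open a SQLite database and switch it to WAL journaling.

    Returns the connection and the journal mode SQLite reports back.
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


tmpdir = tempfile.mkdtemp()
conn, mode = open_wal(os.path.join(tmpdir, "demo.db"))   # hypothetical file name
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
with conn:  # one transaction, committed when the block exits
    conn.execute("INSERT INTO kv VALUES (?, ?)", ("greeting", "hello"))
print(mode)
```

The journal mode is persistent for a database file, so it only needs to be set once per database rather than on every open.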

Nested Journalling Magnifies Cost of SQLite IO

In addition to fsmon stats, I found a great paper on how SQLite accounts for 90% of Android IO and how it amplifies every write transaction by ~4x by the time it hits the underlying storage (eg 1 commit ~= 4 fsyncs). It also shows how 100 bytes of SQL data translates into 64KB of block writes.
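To put those figures in perspective, here is the back-of-the-envelope arithmetic as a sketch. The 100-byte, 64KB and 4-fsync numbers come from the paper as summarized above; the fsync latency passed in at the end is a hypothetical value, not a measurement.

```python
PAYLOAD_BYTES = 100             # SQL data per commit
BLOCK_WRITE_BYTES = 64 * 1024   # block writes reaching storage
FSYNCS_PER_COMMIT = 4           # ~4 fsyncs per commit

byte_amplification = BLOCK_WRITE_BYTES / PAYLOAD_BYTES


def max_commits_per_second(fsync_latency_ms, fsyncs=FSYNCS_PER_COMMIT):
    """Upper bound on back-to-back commits/s if every fsync takes fsync_latency_ms."""
    return 1000 / (fsync_latency_ms * fsyncs)


print(f"{byte_amplification:.0f}x bytes written per byte of payload")
print(f"{max_commits_per_second(2.5):.0f} commits/s at a hypothetical 2.5 ms per fsync")
```

In other words, each byte of application data costs several hundred bytes of NAND writes, and the fsync count caps how many serial commits fit into a second.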

The basic premise of the paper is that SQLite journaling is amplified by the ext4 filesystem journal, resulting in extreme badness. One is tempted to assume that it is further amplified by the GC on the eMMC NAND controller :)

I actually think the paper is overly optimistic in focusing on length of time taken by a single SQLite transaction. In reality one is likely to wait on more than one transaction due to having to update multiple databases or poorly written code (common problem with ORMs).

Combine the above data with the fact that phone NAND is the only component that gets consistently slower as your phone ages. Memory cells wear out, and the NAND garbage collector slows down as the phone fills up to 80-90% of storage capacity. Note that one can briefly regain better system performance by doing a full reset.

Bad Android IO Patterns

SQLite is the default way of persisting structured data on Android. Android documentation seems to default to showing how to do SQLite IO on the main thread (explanation). This means that Android apps are often waiting on reads and writes to NAND instead of responding to user input.

Even if most of the IO happens on a background thread, the mechanics of IO dispatch and low queue depths in consumer-grade environments mean that when a large off-main-thread/background IO is queued in front of a small IO on the main thread, the small IO will take a long time to complete. If one is lucky and only runs apps without main-thread IO on Android, there will still be the problem of waiting for long IOs.
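A toy model makes the queueing effect concrete. It assumes a single FIFO queue at queue depth 1 and a flat throughput of about 2MB/s; real storage stacks are more complicated, and the 1MB background write is a hypothetical size chosen for the example.

```python
def completion_times(requests, bytes_per_ms):
    """Serve (name, size_bytes) requests in FIFO order, one at a time.

    Returns {name: completion time in ms since the first request started}.
    """
    clock = 0.0
    done = {}
    for name, size in requests:
        clock += size / bytes_per_ms
        done[name] = clock
    return done


RATE = 2000  # bytes per ms, roughly 2MB/s (assumed flat throughput)
alone = completion_times([("main_thread", 4096)], RATE)
queued_behind = completion_times(
    [("background", 1_000_000), ("main_thread", 4096)], RATE
)
print(alone["main_thread"], queued_behind["main_thread"])
```

In this model the same 4k write goes from about 2ms to about half a second purely because of what was queued ahead of it, even though the main thread itself issued only a tiny IO.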

Conclusion

A core principle of performance engineering is that a system is only as fast as the slowest bottleneck. In this particular case the bottleneck is hit very frequently, so seemingly users don’t get to benefit from fancy CPUs much.

Interestingly, unlike with CPU perf, there is no correlation between random writes and the price of the phone. Random 4k writes on modern flagship hw are very slow compared to any other metric. iPhone 7 struggles to do over 2MB/s. Google Pixel struggles to get above 2MB/s too.

This means that irrespective of the graphics cores and CPU cores, your phone is going to suck as much as its random write perf… This sort of barely-acceptable performance will quickly turn into “My phone is too laggy, I need to upgrade” as NAND perf deteriorates.

Instead of burying random-write-4k performance (or not doing that test at all), reviews should expose it front-and-center. Ideally they would also fill up the phone to 80% to match a realistic use case.

There is at least one phone vendor who gets it. Motorola G4 is 7x better than the flagships. Surprisingly, my $60 ZTE ZMax Pro phone is also 2-3x better than the flagships.

If you know people who run hardware review websites, please ask them to focus on random-write-4k performance as a predictor of jank/lag/frustration.
