Reading NFS at >=25GB/s using FIO + libnfs

My current employer does a lot of really cool systems work that’s covered by NDAs. I recently did some work to integrate a cool open-source tool into our workflow, and I felt it deserved a blog post.

NFS Testing Requires Parallelism.

I work for Pure Storage. One of the products we make is a scale-out NFS¹ (and S3-compatible) server called FlashBlade.

I was asked to test FlashBlade² performance scaling. I needed to generate NFS read workloads of 15-300 gigabytes/second. Given that the NICs in our lab max out at 100Gbit/s (~10GB/s), this required a lab with fast servers and multiple NICs per server. I wanted to make maximum use of the hardware to minimize lab space and cost. To make use of multiple NICs, I needed multiple NFS connections per NIC. Linux does not make this easy³.

The project came down to:

What is an easy way to drive and measure a high-throughput NFS workload involving multiple servers and multiple connections per server?

This didn’t exist. It was time to write something.

FIO

Fio is a gem of a benchmark utility. It has a plugin architecture capable of testing anything from phones to Hadoop clusters. It records decent statistics and has a nice configuration language for defining workloads. Crucially, it can also run in a client/server configuration to drive clustered workloads.

Incredibly, Jens Axboe has been maintaining fio since at least 2005. Changes⁴ in storage technology made other utilities obsolete in the meantime. Maybe fio has peaked too, as it looks like his most recent io_uring IOPS records are not being tested with fio⁵.

Two years ago, when I first evaluated fio, it worked great for testing NFS via the usual Linux filesystem drivers. You just had to mount the relevant filesystems³.

libnfs + FIO => performant

libnfs is a lovely userspace implementation of the NFS client protocol. I opted to integrate it into fio as an NFS plugin.

Now one can use fio with ioengine=nfs to drive NFS workloads from userspace without worrying about kernel NFS-client bottlenecks or workarounds.

A simple write + read workload with 10 parallel connections is in the fio source code.
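For a feel of what such a job looks like, here is a minimal sketch in Python that renders a fio job file for the nfs engine. It is not a copy of the example in the fio tree. The server URL, the file prefix `fiotest`, the block size, and the helper name `render_nfs_job` are all placeholders of mine. It also assumes fio was built with libnfs support and that each fio job opens its own NFS connection, which is how "10 parallel connections" maps to `numjobs=10` here.

```python
def render_nfs_job(nfs_url: str, connections: int = 10,
                   block_size: str = "512k", size: str = "100m",
                   runtime_s: int = 10) -> str:
    """Render a fio job file: a sequential write pass, then a random read pass."""
    lines = [
        "[global]",
        "ioengine=nfs",              # userspace NFS client (libnfs), no kernel mount
        f"nfs_url={nfs_url}",        # libnfs-style URL, e.g. nfs://server/export
        f"bs={block_size}",
        f"size={size}",
        f"numjobs={connections}",    # assumption: one NFS connection per job
        # No $jobname in the file names, so the read jobs reuse the files
        # that the write jobs created.
        "filename_format=fiotest.$jobnum.$filenum",
        "time_based=1",
        f"runtime={runtime_s}s",
        "group_reporting",
        "",
        "[write]",
        "rw=write",
        "stonewall",                 # finish all writers before anything else starts
        "",
        "[read]",
        "rw=randread",
        "stonewall",                 # start readers only after the writers are done
    ]
    return "\n".join(lines) + "\n"
```

Writing the output of `render_nfs_job("nfs://server.example/export")` to a file such as `nfs-sketch.fio` and running `fio nfs-sketch.fio` should be enough to try it against a test export.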

Results: We can test Pure Storage FlashBlade with reasonably small labs

We now use fio with a client/server workload similar to the example mentioned above. Fio can drive around 25 gigabytes/second⁶ of NFS read throughput per 1U AMD EPYC3 server. Thus we could reach 300 gigabytes/second with 12 servers.
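The sizing arithmetic can be written down as a quick sanity check. This is a back-of-the-envelope sketch that assumes throughput scales linearly with servers and NICs, which real hardware only approximates. The function names are mine, and the default figures are the ones quoted in this post (about 25GB/s per server, about 10GB/s per 100Gbit NIC).

```python
import math


def servers_needed(target_gb_s: float, per_server_gb_s: float = 25.0) -> int:
    """Load-generating servers required to reach a target read throughput."""
    return math.ceil(target_gb_s / per_server_gb_s)


def nics_needed(per_server_gb_s: float = 25.0, per_nic_gb_s: float = 10.0) -> int:
    """NICs one server needs to sustain its share of the load."""
    return math.ceil(per_server_gb_s / per_nic_gb_s)


if __name__ == "__main__":
    print(servers_needed(300))  # top of the 15-300 GB/s range
    print(nics_needed())        # NICs per server at ~25 GB/s
```

Under these assumptions the top of the range needs 12 servers, and each server needs at least three 100Gbit NICs to carry its 25GB/s.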

It took about a month to write/debug the fio/libnfs integration and another week to get it landed in fio. The maintainers were super-friendly, helpful and responsive. It was my best experience contributing to an open-source project.

Next steps

Selecting and learning to tune hardware took many more months of work. If there is interest in tuning Linux to read from multiple 100G NICs, I can do a follow-up post on that.

Disclaimer: Fio with the NFS plugin is fantastic for achieving hero bandwidth numbers, but it needs more work before it’s useful for any sort of failure-testing.


¹ https://en.wikipedia.org/wiki/Network_File_System
² It’s a fun product to dogfood. You can mount the same filesystem on 100s-1000s of machines and not worry about slow or limited shared storage.
³ The Linux kernel NFSv3 client has an “optimization” that multiplexes mounts to the same server over a single NFS connection. This means that to establish multiple connections one must do something terrible, like requiring the NFS server to have multiple IPs or doing mounts from separate IP + filesystem namespaces. If there is interest I can blog about doing that too.
⁴ Solid-state drives brought dedup, compression, and massive increases in IOPS. These required significant re-engineering of existing benchmarks.
⁵ https://twitter.com/axboe/status/1452689372395053062
⁶ This required extensive NUMA-aware NIC selection and tuning. Intel and older AMD CPUs were slower.
