My current employer does a lot of really cool systems work that’s covered by NDAs. I recently did some work to integrate a cool open-source tool into our workflow, and I felt it deserved a blog post.
NFS Testing Requires Parallelism.
I was asked to test FlashBlade [2] performance scaling. I needed to generate NFS read workloads of 15-300 gigabytes per second. Given that NICs in our lab max out at 100 Gbit/s (~10 GB/s), this required a lab with fast servers and multiple NICs per server. I wanted to make maximum use of the hardware to minimize lab space and cost. To make use of multiple NICs I needed multiple NFS connections per NIC. Linux does not make this easy [3].
The project came down to:
What is an easy way to drive and measure a high-throughput NFS workload involving multiple servers and multiple connections per server?
This didn’t exist, so it was time to write something.
Fio is a gem of a benchmark utility. It has a plugin architecture capable of testing anything from phones to Hadoop clusters. It records decent statistics and has a nice configuration language for defining workloads. Crucially, it can also work in a client/server configuration to drive clustered workloads.
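To make the client/server mode a little more concrete, here is a minimal sketch of how a controller command line could be assembled. It assumes every load generator is already running `fio --server`, and that each `--client=` option is followed by the job file for that host. The host names and job file name are hypothetical.

```python
from typing import List


def fio_client_argv(hosts: List[str], job_file: str) -> List[str]:
    """Build a fio command line that runs one job file on several remote hosts.

    Each host is assumed to be running `fio --server` already.
    """
    if not hosts:
        raise ValueError("need at least one load-generator host")
    argv = ["fio"]
    for host in hosts:
        # fio pairs each --client option with the job file(s) that follow it.
        argv += [f"--client={host}", job_file]
    return argv


if __name__ == "__main__":
    # Hypothetical host names and job file; substitute your own.
    print(" ".join(fio_client_argv(["loadgen01", "loadgen02"], "nfs-read.fio")))
```

The controller then collects the statistics reported by each host, which is what makes this mode convenient for clustered runs.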
Incredibly, Jens Axboe has been maintaining fio since at least 2005. Changes [4] in storage technology made other utilities obsolete in the meantime. Maybe fio has peaked too, as it looks like his most recent io_uring IOPS records are not being tested with fio [5].
Two years ago, when I first evaluated fio, it worked great for testing NFS via the usual Linux filesystem drivers; you just had to mount the relevant filesystems [3].
libnfs is a lovely userspace implementation of the NFS client protocol. I opted to integrate it into fio as an NFS plugin.
Now one can use fio's ioengine=nfs to drive NFS workloads from userspace without worrying about kernel NFS-client bottlenecks or workarounds.
A simple write + read workload with 10 parallel connections is in the fio source code.
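For readers who want a feel for what such a job looks like, below is a rough sketch that renders a write-then-read job file as text. It is not the example shipped with fio. The server URL, block size, file size, and file naming pattern are illustrative assumptions, and it assumes that each fio job opens its own libnfs connection, so `numjobs=10` yields 10 parallel connections.

```python
def render_nfs_job(nfs_url: str, connections: int = 10, size: str = "100m") -> str:
    """Render a fio job file that writes files over NFS and then reads them back."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    lines = [
        "[global]",
        "ioengine=nfs",                  # userspace NFS client via libnfs
        f"nfs_url={nfs_url}",            # libnfs-style URL: nfs://server/export
        "bs=512k",                       # assumed block size, tune for your setup
        f"size={size}",                  # assumed per-job file size
        f"numjobs={connections}",        # one job (and connection) per clone
        "filename_format=fio-nfs.$jobnum.$filenum",  # hypothetical naming pattern
        "group_reporting",
        "",
        "[write]",
        "rw=write",
        "",
        "[read]",
        "rw=read",
        "stonewall",                     # start reads only after the writes finish
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; replace with a real NFS server.
    print(render_nfs_job("nfs://192.0.2.10/export"))
```

Saving the output to a file and passing it to fio should exercise the plugin, provided fio was built with libnfs support.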
Results: We can test PureStorage FlashBlade with reasonably small labs
We now use fio with a client/server workload similar to the example above. Fio can drive around 25 gigabytes per second [6] of NFS read throughput per 1U AMD EPYC3 server. Thus we could reach 300 gigabytes per second with 12 servers.
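The sizing arithmetic is simple enough to sketch. The figures below are the ones quoted in this post (about 25 GB/s per server, about 10 GB/s per 100 Gbit NIC). The NICs-per-server number is only what those two figures imply, not a measured result.

```python
import math


def units_needed(target_gb_s: float, per_unit_gb_s: float) -> int:
    """Smallest number of units (servers or NICs) that reaches the target rate."""
    if per_unit_gb_s <= 0:
        raise ValueError("per-unit throughput must be positive")
    return math.ceil(target_gb_s / per_unit_gb_s)


if __name__ == "__main__":
    per_server = 25.0  # GB/s of NFS reads per server, from the post
    per_nic = 10.0     # GB/s, rough usable rate of one 100 Gbit NIC, from the post
    print("servers for 300 GB/s:", units_needed(300.0, per_server))
    print("servers for 15 GB/s:", units_needed(15.0, per_server))
    print("NICs implied per server:", units_needed(per_server, per_nic))
```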
It took about a month to write/debug the fio/libnfs integration and another week to get it landed in fio. The maintainers were super-friendly, helpful and responsive. It was my best experience contributing to an open-source project.
Selecting and learning to tune hardware took many more months of work. If there is further interest in tuning Linux to read from multiple 100G NICs, I can do a follow-up post on that.
Disclaimer: Fio with nfs plugin is fantastic for achieving hero bandwidth numbers, but it needs more work before it’s useful for any sort of failure-testing.
[2] It’s a fun product to dogfood. We can mount the same filesystem on hundreds to thousands of machines and not worry about slow or limited shared storage.
[3] Linux kernel nfs3 has an “optimization” that multiplexes mounts to the same server over a single NFS connection. This means that to establish multiple connections one must do something terrible, like requiring the NFS server to have multiple IPs or doing the mounts from their own IP + filesystem namespaces. If there is interest I can blog about doing that too.
[4] Solid-state drives brought dedup, compression, and massive increases in IOPS. These required significant re-engineering of existing benchmarks.
[6] This required extensive NUMA-aware NIC selection and tuning. Intel and older AMD CPUs were slower.