Curious Case of Maintaining Sufficient Free Space with ZFS

Sometimes trivial problems turn out to be more interesting than expected.

TLDR: ZFS free-space reporting is a lagging indicator.

Background

I use a Proxmox VM server backed by a ZFS array of hard drives for various personal infrastructure. I also have security cameras that upload motion-triggered videos to my server (via FTP!).

Problem Description

I would like to use 90% of my available space for the most recent security videos.

Recipe:

1. Create a dedicated ZFS dataset
2. Set up a ZFS quota
3. Run a cron job that frees space faster than video uploads consume it.

I set out to find a decent disk-freeing solution and eventually settled on free-disk, which is written in Python. (Most of the other utilities were written in bash, and I didn’t want to risk having to grasp and modify write-only code.)

I modified free-disk to support deleting files with a particular extension (e.g. .mp4) and deployed it. My changes to free-disk are here. The author merged some of them, and I hope the rest get upstreamed too.

I was surprised to see that the script freed roughly 10x more space than I expected. So I added some debug logging and, after a few days of experiments, ended up with the following:

15:17:02-0700:DEBUG:Required free: 9.31GiB. 3.4GiB to free. --track-bytes-deleted=True
Jul 09 15:17:37 quad free-disk[70706]: 2022-07-09T15:17:37-0700:INFO:Removed 40 file(s) with modification date <= 2022-06-18T18:32:18.734254Z. Deleted 3.46GiB. Filesystem freed 2.22GiB.
Jul 09 15:17:37 quad free-disk[74496]: Filesystem         1M-blocks     Used Available Use% Mounted on
Jul 09 15:17:37 quad free-disk[74496]: rpool/data/tinycam  1572864M 1564537M     8328M 100% /export/tinycam
Jul 09 15:17:39 quad free-disk[74499]: Filesystem         1M-blocks     Used Available Use% Mounted on
Jul 09 15:17:39 quad free-disk[74499]: rpool/data/tinycam  1572864M 1564537M     8328M 100% /export/tinycam
Jul 09 15:17:41 quad free-disk[74502]: Filesystem         1M-blocks     Used Available Use% Mounted on
Jul 09 15:17:41 quad free-disk[74502]: rpool/data/tinycam  1572864M 1564537M     8328M 100% /export/tinycam
Jul 09 15:17:43 quad free-disk[74616]: Filesystem         1M-blocks     Used Available Use% Mounted on
Jul 09 15:17:43 quad free-disk[74616]: rpool/data/tinycam  1572864M 1563300M     9565M 100% /export/tinycam


To get this data I added alternate logic to free-disk that frees space based on bytes deleted instead of filesystem free space (e.g. the number reported by df). I also added a few sleep 2; df -h /export/tinycam commands to track progress.
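The bytes-deleted approach can be sketched roughly as follows. This is not the actual free-disk code: the function name, its parameters, and the oldest-first ordering by modification time are assumptions made for illustration.

```python
from pathlib import Path


def free_by_bytes_deleted(directory, bytes_to_free, suffix=".mp4"):
    """Delete the oldest files ending in `suffix` until at least
    `bytes_to_free` bytes (apparent size, as reported by stat) are gone.

    Hypothetical helper; returns (files_removed, bytes_deleted).
    """
    candidates = [p for p in Path(directory).rglob("*" + suffix) if p.is_file()]
    candidates.sort(key=lambda p: p.stat().st_mtime)  # oldest first

    files_removed = 0
    bytes_deleted = 0
    for path in candidates:
        if bytes_deleted >= bytes_to_free:
            break
        size = path.stat().st_size  # apparent size, not space on disk
        path.unlink()
        bytes_deleted += size
        files_removed += 1
    return files_removed, bytes_deleted
```

The loop never consults the filesystem's free-space counter, so it is immune to the reporting lag. The cost is that it trusts the apparent file size, which is why "Deleted" and "Filesystem freed" can disagree in the log above.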

It seems that after I delete files, it takes roughly 5-10 seconds for the free-space information to settle. Problem solved: “obviously one should only free space by tracking bytes deleted” or “add sleep 10 before checking free space”. Not so fast :)
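A slightly more careful version of "sleep 10" is to poll until the number stops moving. The sketch below is a heuristic, not a guarantee. The function name and all defaults are assumptions, with min_wait taken from the 5-10 second observation above. The minimum wait matters because, in the log above, df reported the same value three times in a row before it finally changed, so a stability check alone would have returned too early.

```python
import shutil
import time


def wait_for_free_space_to_settle(path, interval=2.0, min_wait=10.0,
                                  stable_reads=3, timeout=60.0,
                                  probe=None, sleep=time.sleep):
    """Poll free space until at least `min_wait` seconds have passed AND the
    last `stable_reads` samples are identical, or until `timeout` expires.

    `probe` and `sleep` are injectable so the logic can be tested without a
    real filesystem. Returns the last free-byte count observed.
    """
    if probe is None:
        probe = lambda: shutil.disk_usage(path).free
    last = probe()
    unchanged = 1  # consecutive identical samples seen so far
    waited = 0.0
    while waited < timeout:
        if waited >= min_wait and unchanged >= stable_reads:
            break
        sleep(interval)
        waited += interval
        current = probe()
        unchanged = unchanged + 1 if current == last else 1
        last = current
    return last
```

On a dataset that cameras are actively writing to, the reading may never hold still, in which case this simply returns the latest sample once the timeout expires.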

Complexity of Modern Free-Space Tracking

ZFS is a filesystem that supports deduplication & compression. It also supports snapshots. This means that the file being deleted could:

1. Be uncompressed and have no duplicates.
2. Be compressed, so the space freed is less than the size shown by stat.
3. Be referenced by a snapshot, so no space is actually freed when the file is deleted.
4. Be deduplicated against another file, so the shared blocks stay allocated.
5. Be some combination of the above.

So for ZFS to know how much space a file deletion has reclaimed, it needs to check all of these conditions, and it should do so asynchronously from the deletion itself so as not to slow down filesystem operations.
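The compression case is easy to see from userspace by comparing a file's apparent size with the blocks actually allocated to it. A minimal sketch, assuming a POSIX system where st_blocks is counted in 512-byte units (the helper name is made up):

```python
import os


def apparent_and_allocated_bytes(path):
    """Return (apparent_size, allocated_size) for a file.

    apparent_size is what `stat` and `ls -l` show; allocated_size is derived
    from st_blocks, which POSIX systems report in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

On a compressed dataset the second number is typically the smaller one. It still says nothing about snapshots or deduplication, and on ZFS the block count for a freshly written file may itself lag, so this is a better estimate rather than an answer.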

Luckily ZFS documents this:

ZFS is a transactional file system. Most file system modifications are bundled into transaction groups and committed to disk asynchronously. Until these modifications are committed to disk, they are termed pending changes. The amount of space used, available, and referenced by a file or file system does not consider pending changes. Pending changes are generally accounted for within a few seconds. Even committing a change to disk by using fsync(3C) or O_SYNC does not necessarily guarantee that the space usage information is updated immediately.

Conclusion: There is no universal way to ensure space is freed on ZFS. One needs to carefully consider how ZFS is being used in order to understand how to automatically maintain free disk space.
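One way to "carefully consider how ZFS is being used" is to ask ZFS itself where the space went. The sketch below shells out to zfs get in scripted mode (-H for tab-separated output without headers, -p for exact byte counts). The helper names and the choice of properties are mine, and it assumes the zfs CLI is on PATH and the caller is allowed to read the dataset's properties.

```python
import subprocess

# Standard ZFS dataset properties; the selection is an assumption.
PROPS = ("used", "available", "referenced", "usedbysnapshots")


def parse_zfs_get(output):
    """Parse `zfs get -H -p -o property,value ...` output into {property: bytes}."""
    values = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        prop, value = line.split("\t", 1)
        values[prop] = int(value)
    return values


def dataset_space(dataset):
    """Run `zfs get` for one dataset and return its space properties in bytes."""
    out = subprocess.run(
        ["zfs", "get", "-H", "-p", "-o", "property,value", ",".join(PROPS), dataset],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_zfs_get(out)
```

Calling dataset_space("rpool/data/tinycam") and finding usedbysnapshots large relative to used would suggest that deleted files are still pinned by a snapshot. These properties are subject to the same pending-changes delay quoted above, so they lag as well.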

I get burned by (3) once or twice a year, until I eventually remember that I have a snapshot that has diverged too much. This lagging indication of free space was new and fun, so I figured it was worth a blog post.