HTGWA: Create a ZFS RAIDZ1 zpool on a Raspberry Pi

This is a simple guide, part of a series I'll call 'How-To Guide Without Ads'. In it, I'm going to document how I set up a ZFS zpool in RAIDZ1 in Linux on a Raspberry Pi.

Prerequisites

ZFS does not enjoy USB drives, though it can work on them. I wouldn't really recommend ZFS for the Pi 4 Model B or other Pi models that can't use native SATA, NVMe, or SAS drives.

For my own testing, I am using a Raspberry Pi Compute Module 4, and there are a variety of PCI Express storage controller cards and carrier boards with integrated storage controllers that make ZFS much happier.

I have also only tested ZFS on 64-bit Raspberry Pi OS, on Compute Modules with 4 or 8 GB of RAM. No guarantees under other configurations.

Installing ZFS

Since ZFS is not bundled with other Debian 'free' software (because of licensing issues), you need to install the kernel headers (so DKMS can compile the ZFS kernel module), then install two ZFS packages:

$ sudo apt install raspberrypi-kernel-headers
$ sudo apt install zfs-dkms zfsutils-linux

Verify ZFS is loaded

$ dmesg | grep ZFS
[ 5393.504988] ZFS: Loaded module v2.0.2-1~bpo10+1, ZFS pool version 5000, ZFS filesystem version 5

You should see something like the above. If not, it might not have loaded correctly.
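If the `dmesg` line has already scrolled away, or your kernel restricts `dmesg` to root, you can also check sysfs: the `/sys/module/zfs` directory only exists while the module is loaded. This is a minimal sketch that assumes the module exposes its version string at `/sys/module/zfs/version`; `zfs_module_version` and the `ZFS_SYSFS` override are hypothetical names I added so the check can be tested.

```shell
#!/bin/sh
# Check whether the zfs kernel module is loaded by looking in sysfs.
# /sys/module/zfs only exists while the module is loaded; the version file
# is assumed to hold the module's version string.
# ZFS_SYSFS is a hypothetical override, used only for testing.
zfs_module_version() {
  sysfs="${ZFS_SYSFS:-/sys/module/zfs}"
  if [ -r "$sysfs/version" ]; then
    cat "$sysfs/version"
  else
    echo "zfs module not loaded (try: sudo modprobe zfs)" >&2
    return 1
  fi
}

if ver=$(zfs_module_version); then
  echo "ZFS module loaded: $ver"
fi
```

If it reports the module isn't loaded, `sudo modprobe zfs` is worth trying before digging into the DKMS build.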

Prepare the disks

You should have at least three drives set up and ready to go. And make sure you don't care about anything on them. They're gonna get erased.

List all the devices on your system:

$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  7.3T  0 disk 
└─sda1        8:1    0  7.3T  0 part /mnt/mydrive
sdb           8:16   0  7.3T  0 disk 
sdc           8:32   0  7.3T  0 disk 
sdd           8:48   0  7.3T  0 disk 
sde           8:64   0  7.3T  0 disk 
nvme0n1     259:0    0  7.3T  0 disk 
└─nvme0n1p1 259:1    0  7.3T  0 part /

I want to put sda through sde into the RAIDZ1 volume. I noticed sda already has a partition and a mount. We should make sure all the drives that will be part of the array are partition-free:

$ sudo umount /dev/sda?; sudo wipefs --all --force /dev/sda?; sudo wipefs --all --force /dev/sda
$ sudo umount /dev/sdb?; sudo wipefs --all --force /dev/sdb?; sudo wipefs --all --force /dev/sdb
...

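Do that for each of the drives. If you didn't realize it yet, this wipes everything: the partition tables and filesystem signatures are gone, so the old data is no longer accessible. It doesn't zero the data, though, so technically it could still be recovered at this point!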
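Rather than typing the three commands above for every drive, you can generate them in a loop. This sketch only prints the commands, so you can double-check the drive list before anything gets erased. The list `sda sdb sdc sdd sde` matches the example system here and will be different on yours, and `print_wipe_commands` is just a made-up name.

```shell
#!/bin/sh
# Print (do NOT run) the unmount/wipe commands for every disk that will
# join the array. The disk list below is an assumption: edit it for your system.
print_wipe_commands() {
  for disk in "$@"; do
    # Same three steps as above: unmount any partitions, wipe the partition
    # signatures, then wipe the whole-disk signatures and partition table.
    echo "sudo umount /dev/${disk}?; sudo wipefs --all --force /dev/${disk}?; sudo wipefs --all --force /dev/${disk}"
  done
}

print_wipe_commands sda sdb sdc sdd sde
```

If you save it as, say, `wipe-cmds.sh`, run `sh wipe-cmds.sh` to review the output, and only once it looks right run `sh wipe-cmds.sh | sh` to actually execute it.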

Check to make sure nothing's mounted (and make sure you have removed any of the drives you'll use in the array from /etc/fstab if you had persistent mounts for them in there!):

$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  7.3T  0 disk 
sdb           8:16   0  7.3T  0 disk 
sdc           8:32   0  7.3T  0 disk 
sdd           8:48   0  7.3T  0 disk 
sde           8:64   0  7.3T  0 disk 
nvme0n1     259:0    0  7.3T  0 disk 
└─nvme0n1p1 259:1    0  7.3T  0 part /

Looking good, time to start building the array!

Create a RAIDZ1 zpool

The following command will create a zpool with all the block devices listed:

$ sudo zpool create zfspool raidz1 sda sdb sdc sdd sde -f

For production use, you should really read up on the benefits and drawbacks of different RAID levels in ZFS, and how to structure zpools and vdevs. The specific structure you should use depends on how many and what type of drives you have, as well as your performance and redundancy needs.

Verify the pool is set up correctly:

$ zfs list
NAME      USED  AVAIL     REFER  MOUNTPOINT
zfspool   143K  28.1T     35.1K  /zfspool

$ zpool status -v zfspool
  pool: zfspool
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    zfspool     ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0

errors: No known data errors

And make sure it was mounted so Linux can see it:

$ df -h
...
zfspool          29T  128K   29T   1% /zfspool
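Those sizes follow from how RAIDZ1 works: roughly one drive's worth of space goes to parity, so usable space is about (number of drives - 1) × drive size. Here's that arithmetic as a tiny helper (`raidz1_usable` is a hypothetical name). It's only a rough estimate, since it ignores metadata, padding, and space ZFS keeps in reserve.

```shell
#!/bin/sh
# Rough usable capacity of a single RAIDZ1 vdev: one drive's worth of space
# goes to parity, so usable ~= (drives - 1) * drive size.
# Ignores metadata, padding and reserved space, so real numbers come in lower.
# Arguments: number of drives, size of each drive (any unit, e.g. TiB).
raidz1_usable() {
  awk -v n="$1" -v size="$2" 'BEGIN { printf "%.1f\n", (n - 1) * size }'
}

raidz1_usable 5 7.3   # five 7.3T drives, as in the lsblk output above
```

For the five 7.3T drives in this example that comes to about 29.2T, in line with the 29T `df -h` shows; the 28.1T AVAIL from `zfs list` is a bit lower because of the overhead the estimate leaves out.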

Destroy a pool

If you no longer like swimming in the waters of ZFS, you can destroy the pool you created with:

$ sudo zpool destroy zfspool

Note: This will wipe out the pool and lead to data loss. Make sure you're deleting the right pool and don't have any data inside that you care about.

Comments

Why does ZFS not enjoy USB drives?

What pitfalls should I expect if I run an 8-bay USB 3.0 enclosure?
Slowdown?
Data integrity?

This tutorial is far from complete; one shouldn't mount and use the complete pool as-is.

Most importantly, except for a couple of niche cases where you have lots (really a lot) of ECC RAM, you should always disable ZFS deduplication. So after creating the pool, disable it:
> sudo zfs set dedup=off zfspool
Then enable LZ4 compression, which is extremely lightweight and will increase your throughput:
> sudo zfs set compression=lz4 zfspool

One should create datasets as mountpoints, like so; datasets will inherit the above settings. Think of a dataset as a kind of partition, but one that spans disks.
> sudo zfs create zfspool/home
> sudo zfs create zfspool/data
etc.
Each dataset can use the whole pool, and datasets can be mounted, snapshotted and sent.
A dataset can also be created as a block device (a zvol), which can be exported, for example via iSCSI, or formatted with another filesystem; in the background it will still be ZFS (for example for swap, either as a file or a "partition"). The -V option takes the volume size (e.g. 10G):
> sudo zfs create -V <size> zfspool/blockdevice
...
https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/

Deduplication isn't enabled by default, unless it's inherited from a parent dataset. It's a safe assumption that it isn't turned on and doesn't need to be mentioned in the guide.

That aside, even if you have gobs of memory (ECC or not), using deduplication in ZFS is a fool's errand for 99.99% of all cases. Every TB written requires ~1GB of RAM, permanently, even if you delete the dataset/zvol it was enabled on. The only way to reset it is to delete the zpool entirely.

Hopefully that'll be improved some day, but until then ZFS+dedupe is pretty much a no-no.
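To put the dedup table's memory cost in perspective, here's a back-of-the-envelope calculation. It assumes the commonly quoted rule of thumb of roughly 320 bytes of RAM per dedup-table entry and one entry per unique block, so the result depends heavily on the average block size; `ddt_ram_gib` is a hypothetical helper, and real usage will vary with your data.

```shell
#!/bin/sh
# Back-of-the-envelope RAM estimate for the ZFS dedup table (DDT).
# Assumptions (rules of thumb, not exact): one DDT entry per unique block,
# roughly 320 bytes of RAM per entry.
# Arguments: unique data in TiB, average block size in KiB.
ddt_ram_gib() {
  awk -v tib="$1" -v bs_kib="$2" 'BEGIN {
    blocks = tib * 1024 * 1024 * 1024 / bs_kib             # TiB -> KiB, then / block size
    printf "%.2f\n", blocks * 320 / (1024 * 1024 * 1024)   # bytes -> GiB
  }'
}

ddt_ram_gib 1 128   # 1 TiB of unique data in 128 KiB blocks
```

With 1 TiB of unique data in 128 KiB blocks this works out to about 2.5 GiB of RAM, and halving the block size to 64 KiB doubles it to about 5 GiB, so depending on block size the real cost can land well away from the ~1GB-per-TB figure.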

Depending on your use for ZFS on the Pi, I bet you could work around the need for a physical drive interface. You should be able to target a file as your drive, which pulls ZFS out of the USB loop. This could be valuable if you were running a RaSCSI setup, for instance, and wanted automatic snapshots while working with vintage computers. That's very useful when you consider how easy it is to damage or corrupt some of those old systems: if you made an ill-fated move, you could recover the snapshot the system was automatically taking.
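One way to try the file-as-drive idea is to hand `zpool create` regular files instead of block devices. ZFS accepts files as vdevs when they're given by absolute path, though the zpool documentation discourages that outside of experiments. This sketch only creates sparse backing files and prints the `zpool create` command for you to review; the `/var/tmp/zfs-files` directory, the `disk*.img` names, and the `filepool` pool name are all hypothetical.

```shell
#!/bin/sh
# Sketch: back a throwaway RAIDZ1 pool with sparse files instead of disks.
# POOL_DIR, the disk*.img names and the "filepool" name are hypothetical.
# ZFS accepts regular files as vdevs when given by absolute path.
POOL_DIR="${POOL_DIR:-/var/tmp/zfs-files}"

make_backing_files() {
  count="$1"; size="$2"
  mkdir -p "$POOL_DIR"
  i=1
  while [ "$i" -le "$count" ]; do
    # Sparse file: takes no real space until ZFS writes to it.
    truncate -s "$size" "$POOL_DIR/disk$i.img"
    echo "$POOL_DIR/disk$i.img"
    i=$((i + 1))
  done
}

# Create three 1 GiB backing files, then print (not run) the create command.
set -- $(make_backing_files 3 1G)
echo "sudo zpool create filepool raidz1 $*"
```

Once such a pool exists, `sudo zfs snapshot filepool@before-change` takes a snapshot and `sudo zfs rollback filepool@before-change` returns to the most recent one, which is the kind of safety net the RaSCSI example would want.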

As of right now, with the latest kernel 5.15 on Pi OS, there's an issue installing the version of ZFS in the main repo—so you have to use bullseye-backports instead. Hopefully that gets fixed soon!

I can confirm that at this date, 11/6/2022, there’s a problem installing ZFS on the latest Raspberry Pi OS. What’s interesting is that about a month ago, your install steps worked! Now, they don’t.

I tried the most promising-looking backports method that you reference on the Raspberry Pi forums (the one detailed by santiagobiali on March 30, 2022), and it didn't work. It's very disappointing that such an important part of Raspberry Pi OS is broken. I don't know if the 5.15 kernel update broke it, or what. Probably some astute developers for Raspberry Pi OS are hard at work resolving this issue as I write this, but for now, I wouldn't recommend trying to use ZFS on bullseye-based Raspberry Pi OS. Thanks for your excellent tutorial!

Like other commenters, for some reason I was not able to make ZFS work on the current version of Raspbian OS (as of 2022-11-21), but I had success following these steps using Ubuntu 22.04.1 LTS. Thank you for the writeup!

This is the wrong way to build a zpool.
"sudo zpool create zfspool raidz1 sda sdb sdc sdd sde -f"
You should never build MD RAID devices or ZFS arrays that way; always use fixed identifiers. The process looks more like this:
$ ls -lh /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root 9 Dec 15 12:57 ata-SanDisk_SSD_PLUS_240GB_## -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 15 12:57 wwn-0x5000c500##d66f8 -> ../../sdb
lrwxrwxrwx 1 root root 9 Dec 15 12:57 wwn-0x5000c5007##9f8 -> ../../sdd
lrwxrwxrwx 1 root root 9 Dec 15 12:57 wwn-0x5000c50###80e37 -> ../../sdc
lrwxrwxrwx 1 root root 9 Dec 15 12:57 wwn-0x5000c500a4##73 -> ../../sde

Then build the pool and define the drives using the info from above in your layout.
$ sudo zpool create pi-pool raidz1 wwn-0x5000c####66f8 wwn-0x5000c500###e9f8 wwn-0x5000c500a##37 wwn-0x5000c500###773
This prevents problems importing the pool if the OS changes the drives' names, for example when the drives are moved around. In this example I have used the World Wide Names (WWNs), but serial numbers and UUIDs should also work.
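Matching each `wwn-` link to the right drive by eye in the listing above is error-prone, so here's a small helper that prints the `/dev/disk/by-id` path whose symlink points at a given kernel name. It's a sketch: it only looks at `wwn-` links (drives without a WWN would need another pattern, such as `ata-*`), and `byid_paths` and the `BYID_DIR` override are hypothetical names.

```shell
#!/bin/sh
# Print the /dev/disk/by-id/wwn-* path for each kernel disk name (e.g. sdb).
# Only wwn- links are considered; drives without a WWN need another pattern.
# BYID_DIR is a hypothetical override, used only for testing.
byid_paths() {
  dir="${BYID_DIR:-/dev/disk/by-id}"
  for disk in "$@"; do
    for link in "$dir"/wwn-*; do
      [ -L "$link" ] || continue   # glob matched nothing, or not a symlink
      # Compare the link target's last path component with the disk name;
      # partition links (-> ../../sdb1) therefore don't match "sdb".
      if [ "$(basename "$(readlink "$link")")" = "$disk" ]; then
        echo "$link"
      fi
    done
  done
}

byid_paths sdb sdc sdd sde
```

The printed paths can then go to `zpool create` in place of the short `sdb sdc sdd sde` names.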

Counter-suggestion: build the pool with sda sdb sdc sdd, then export it and re-import it using /dev/disk/by-id, like so:

sudo zpool create pool raidz sda sdb sdc sdd
sudo zpool export pool
sudo zpool import -d /dev/disk/by-id/ pool

Let the tooling do the conversion from sda to wwn. That's mostly what ZFS, mdadm, Btrfs and LVM already do when they come online anyway: "Hey, which of these storage devices have my fingerprints on them, and can I form an array out of those objects?"