Raspberry Pi holds its own against low-cost ARM NAS

Earlier this year, I pitted the $549 ASUSTOR Lockerstor 4 NAS against a homebrew $350 Raspberry Pi CM4 NAS, and came to the (rather obvious) conclusion that the Lockerstor was better in almost every regard.

Jeff Geerling holding Raspberry Pi Radxa Taco NAS board and ASUSTOR Drivestor 4 Pro

Well, ASUSTOR introduced a new lower-cost NAS, the $329 Drivestor 4 Pro (model AS3304T—pictured above), and sent me one to review against the Raspberry Pi, since it makes for a better matchup—both have 4-core ARM CPUs and a more limited PCI Express Gen 2 bus at their heart.

Around the same time, Radxa also sent me their new Taco—a less-than-$100 Raspberry Pi Compute Module 4 carrier board with 5x SATA ports, 1 Gbps and 2.5 Gbps Ethernet, an M.2 NVMe slot, and an M.2 A+E key slot. (The Taco will soon be available as part of a kit with a CM4 and case for around $200.)

The specs are evenly matched, at least on paper:

Radxa Taco Raspberry Pi NAS vs ASUSTOR Drivestor 4 Pro NAS spec comparison

But specs are one thing; measurable performance is another.

Disk performance

I benchmarked the raw disk access performance with fio and iozone to get a general idea of how fast the SATA drives would perform in RAID 5.

I chose RAID 5 because it taxes all the subsystems that are traditionally weak points on lower-powered ARM boards: the SoC's CPU for parity calculations when writing, the PCIe bus for throughput, and the SATA controller.
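For reference, building a RAID 5 array like this on Linux is a one-liner with mdadm (a sketch; device names are examples):

```bash
# Create a 4-drive RAID 5 array from the SATA disks
# (device names are examples; adjust for your setup)
sudo mdadm --create /dev/md0 --level=5 --raid-devices=4 \
  /dev/sda /dev/sdb /dev/sdc /dev/sdd
# Format the array and mount it
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid5
sudo mount /dev/md0 /mnt/raid5
```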

Note: All the details of my test methodology and benchmarks I ran are documented in this GitHub issue.
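As one example, a sequential read test with fio looks something like this (parameters here are illustrative, not the exact invocations from the GitHub issue):

```bash
# Sequential read test against the RAID array (illustrative
# parameters; see the GitHub issue for the exact invocations)
fio --name=seq-read --filename=/mnt/raid5/fio-test.bin --size=4g \
  --rw=read --bs=1M --ioengine=libaio --direct=1 \
  --iodepth=8 --runtime=60 --time_based --group_reporting
```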

Here's how raw disk performance looks:

Disk benchmarks on ASUSTOR Drivestor 4 Pro vs Raspberry Pi Taco

The Pi is faster for reads, but the ASUSTOR somehow wipes the floor with it on writes. Seeing that the Pi board also had space for an NVMe drive, I set up bcache in Pi OS in writeback mode to give the Pi a boost. And that definitely helped:

bcache Disk benchmarks on ASUSTOR Drivestor 4 Pro vs Raspberry Pi Taco
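For those curious, the bcache setup looked roughly like this (a sketch; device names are examples):

```bash
# Attach the NVMe drive as a writeback cache in front of the array
# (device names are examples)
sudo apt install bcache-tools
sudo make-bcache -B /dev/md0        # backing device: the RAID 5 array
sudo make-bcache -C /dev/nvme0n1    # caching device: the NVMe SSD
# Attach the cache set (UUID comes from `bcache-super-show /dev/nvme0n1`;
# <cache-set-uuid> is a placeholder)
echo <cache-set-uuid> | sudo tee /sys/block/bcache0/bcache/attach
# bcache defaults to writethrough; switch to writeback for faster writes
echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode
```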

But NASes have to expose raw storage to a network—and that's an area where lower-end NASes often fall short, especially when they try to saturate more than a 1 Gbps network connection!

Network and Samba performance

Indeed, both the Taco and the Drivestor seemed to struggle to saturate a 2.5 Gbps network connection in their default configuration—both using the Realtek driver built into the Linux kernel:

RTL8125B NIC performance on 2.5G network
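For raw throughput tests like this, iperf3 is the standard tool; a typical run (the IP address is an example):

```bash
# On the NAS side, run an iperf3 server
iperf3 -s
# From another machine on the 2.5 Gbps network (address is an
# example), run a 30-second throughput test
iperf3 -c 192.168.1.10 -t 30
```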

But I noticed if I switched the Pi over to Realtek's own driver, I could fully saturate the connection:

RTL8125B NIC performance on 2.5G network - realtek driver

I wrote up a blog post about the driver issue; it seems Realtek's driver has some optimizations that offload a lot of the network packet processing from the CPU, or at least reduce the number of interrupts very significantly.
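Roughly, the swap involves blacklisting the in-kernel r8169 driver and building Realtek's out-of-tree r8125 driver (a sketch; the full steps are in the blog post linked above):

```bash
# Keep the in-kernel r8169 driver from claiming the RTL8125B
echo "blacklist r8169" | sudo tee /etc/modprobe.d/blacklist-r8169.conf
# Build and install Realtek's r8125 driver (tarball downloaded from
# Realtek's site; it ships with an autorun.sh install script)
tar xjf r8125-*.tar.bz2 && cd r8125-*/
sudo ./autorun.sh
sudo reboot
```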

But that network throughput doesn't automatically translate to network file copy performance, as demonstrated in my next benchmark:

Samba file copy on ASUSTOR Drivestor 4 Pro vs Raspberry Pi Taco NAS

The benchmark above was a large file copy—the best-case scenario. And both the Pi and the Drivestor were very consistent across multiple tests. When you tax a low-power SoC with RAID 5 and set it up as the traffic cop between SATA and a 2.5G Ethernet port, it's obvious the performance will be more limited than with more expensive Intel/AMD options.
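If you want to reproduce a test like this, mount the share from a client and time a large copy (address, share name, and paths are examples):

```bash
# Mount the NAS's SMB share on a client machine
# (address, share name, and credentials are examples)
sudo mkdir -p /mnt/nas
sudo mount -t cifs //192.168.1.10/shared /mnt/nas -o username=pi
# Time a large sequential file copy to the share
time cp ~/large-test-file.bin /mnt/nas/
```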

Overclocking could help, but honestly, if you want to see supercharged network file copy performance, opt for a more expensive, better-endowed NAS.

I know you might be curious how bcache (SSD caching) speeds things up on the Pi—and the answer is not much:

Samba file copy with bcache on ASUSTOR Drivestor 4 Pro vs Raspberry Pi Taco NAS

I was surprised, but I think the reason the numbers are low is that the Pi's BCM2711 chip is hitting internal queuing limits with the amount of traffic being routed through it. This chip is just not meant for heavy IO, and tests like this really show it.

It's still fast and reliable, though—and in many cases (especially for smaller copies that fit in RAM), the speeds are much better. Using RAID 1 or RAID 10 would also help greatly with write performance.

Conclusions

I go into more depth and explanation in my latest video comparing the ASUSTOR and the Taco, but I'll share the conclusion from that video here:

Based on performance alone, the Raspberry Pi is a worthy alternative to a traditional low-end NAS, like the Drivestor 4 Pro—provided you're okay with getting your hands a little dirty.

Radxa Taco CM4 Raspberry Pi NAS board with three hard drives

You can either go fully custom and configure RAID and Samba or NFS by hand, or rely on a tool like openmediavault to get the job done. But in either case, expect to spend more time on anything more advanced, like SSD caching, ZFS, or btrfs.
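Going the hand-configured route mostly means maintaining a few files yourself; a minimal Samba share definition, for example (share name and path are examples):

```
# Minimal share definition in /etc/samba/smb.conf
# (share name and path are examples)
[shared]
    path = /mnt/raid5
    read only = no
    browseable = yes
```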

One of the main reasons people opt for pre-built NASes like those from ASUSTOR is the turnkey NAS software that comes with them—operating systems like ADM (ASUSTOR Data Master) are optimized for end users and don't assume you have deeper knowledge of Linux's storage configuration.

But I do love seeing the Taco and the Drivestor 4 Pro both built around Realtek and Broadcom ARM SoCs (and the Taco could be used with CM4-compatible Rockchip boards, too). Seeing multiple solid ARM NAS products come to market this year shows how far the ARM ecosystem has come for low- to mid-range general computing!

You can buy the Drivestor 4 Pro from Amazon, and Radxa's Taco will be available at some point in early 2022.

Check out my video on these two NAS builds for a deeper dive.

Comments

Great stuff. How do you manage IRQs across cores? One limitation in most Pi kernel builds is that PCIe interrupts are locked to one core. Additionally, irqbalance is not very good on a Pi, and manual tuning could be better (balance network queues, separate network from disk, etc.).

irqbalance doesn't seem to do anything on the Pi, unfortunately. And while monitoring the interrupts, I did see everything tied to one core, but surprisingly, it didn't hit 99% while monitoring in atop. Not sure why.

In Jeff's scenario everything (network included) is behind the single PCIe lane so what would be the benefits of fiddling around with IRQ affinity settings?

In case all PCIe IRQs end up on cpu0, and application processes like smbd do too, it might be an idea to use cgroups or q&d hacks like this https://github.com/OpenMediaVault-Plugin-Developers/installScript/blob/… to move the daemons away from cpu0.
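For example (the IRQ number and CPU mask here are just illustrations):

```bash
# See which CPU the PCIe/Ethernet interrupts land on
grep -iE 'pcie|eth' /proc/interrupts
# Pin a given IRQ to CPUs 1-3 (hex mask 0xe); IRQ 38 is an example
echo e | sudo tee /proc/irq/38/smp_affinity
# Move the Samba daemons off cpu0 as well
for pid in $(pidof smbd); do sudo taskset -cp 1-3 "$pid"; done
```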

Aside from that, everything storage and network is not just behind the single PCIe Gen2 lane but also behind the same PCIe switch, which can be a bottleneck in itself with concurrent network/storage accesses. Given you have roughly 400 MB/s of usable bandwidth on the bus (5 GT/s with 8b/10b encoding is a 20% overhead, leaving 4 Gbps or 500 MB/s, minus further PCIe protocol overhead), the assumption that two PCIe devices will happily share the bandwidth while parallel accesses happen all the time is rather naive.

Hey Jeff, this and some of your other videos make me think your perspective on the GNUBee2 would be interesting: https://www.crowdsupply.com/gnubee/personal-cloud-2

I got one and haven't set it up yet, probably because it's not plug and play and I'm a bit lazy, but, like the Pi, it's capable of being an open-source-friendly alternative for NAS needs (though unlike the Pi, I'm hoping it's a bit better suited?).

The idea behind both GnuBees was nice, but IMO they chose the wrong platform and were a bit too enthusiastic about 'FOSS being the only thing that matters'. The only performance numbers I found for the MT7621 in a NAS use case were from the 'WiTi board', and they were horribly low (the MT7621 is a really slow dual-core MIPS CPU whose network acceleration blocks are of no use here).

Nice article. As you have the Radxa CM3 and SOQuartz at hand, would you expect them to perform any better in this scenario?
I'd love to see them benchmarked as a NAS, but I know you've had some problems getting them started.

Those two RK3566 thingies come as drop-in replacements for the CM4, which means that on usual CM4 carrier boards some I/O is wasted, since the RK3566 has a bit more I/O than the BCM2711 on the CM4.

Radxa's CM3 has a third row of connectors, so an appropriate carrier board (one that makes RK3566's additional native SATA port available, so the single PCIe lane can be used exclusively for an RTL8125B for 2.5GbE or maybe even 5GbE) paired with a somewhat decent SATA SSD will for sure allow saturating 2.5GbE with NAS protocols like SMB or NFS.

The Taco with everything storage and network behind one PCIe switch in Gen2 mode is more of an example what's possible to connect than a reasonable NAS setup.

BTW: RK3566's bigger sibling RK3568 (which is not a drop-in replacement for the former) is the way better choice for general-purpose ARM thingies that should deal with storage and network, due to more I/O. While they are compatible software-wise, both share the same problem: they're new ARM SoCs, so expect the upstreaming work to be done once the hardware is obsolete ;)

They're new ARM SoCs, so expect the upstreaming work to be done once the hardware is obsolete ;)

Hehe, so true it hurts. The hardware has so much potential, there's just no organization like Raspberry Pi around it to either maintain a solid fork or push things through in a more timely fashion :(

RPi Trading Ltd. isn't that fast either even with that many people on their payroll. They usually ship already obsolete hardware so where's the point? ;)

The four Cortex-A72 cores on the BCM2711 weren't the problem, but Broadcom's switch from VC4 to VC6 required talented people at 'RPi HQ' to waste their time with drivers for a bizarre OS by today's standards... the majority of RPi users still have no clue that Linux is not the primary OS on these things.

Anyway: wrt RK35xx there's some hope. Tom (Cubie / hipboy / the Radxa guy you're in contact with?) mentioned recently that the BSP (board support package) for RK3588 would be based on kernel 5.10. If Rockchip manages to update the BSP for RK3566/3568 at the same time, at least there's some hope for something usable not too far from the mainline kernel within (half) a year. But it's a BSP kernel!

As always: a BSP kernel is something you should never trust, since employees of some random employer wrote/hacked the code, and it never went through the usual Linux kernel QA steps (this also applies to RPi Trading Ltd.'s kernel – but at least their employees got the memo years ago that their work went mission-critical, once they realized their main business is not 'toys for tinkerers' any more but industrial usage).

Radxa's CM3 has a third row of connectors, so an appropriate carrier board (one that makes RK3566's additional native SATA port available, so the single PCIe lane can be used exclusively for an RTL8125B for 2.5GbE or maybe even 5GbE) paired with a somewhat decent SATA SSD will for sure allow saturating 2.5GbE with NAS protocols like SMB or NFS.

Based on the publicly available information (https://dl.radxa.com/cm3/docs/Radxa_CM3_datasheet_brief_v1.1.pdf), the SATA ports are multiplexed with USB3 and PCIe, so most likely you can't have them run in parallel.

Yes, both SATA lanes are multiplexed: one with PCIe and the other with USB3, so it's perfectly fine to have a SATA/PCIe combo (no USB3 then, of course).

I don't know if there are 5GbE chips that use a single PCIe lane, but if something like that exists, you can get 350-400 MB/s (not Mb/s) NAS performance. For me personally, one of the most interesting questions is whether Rockchip's SATA implementation can cope with SATA port multipliers and also supports FIS (Frame Information Structure)-based switching :)

The performance of the Pi under heavy IO is probably limited by the Pi's poor memory bandwidth. While it's got a 32-bit LPDDR4 interface, it suffers throughput issues, probably due to the VideoCore owning the memory or an iffy L2 cache design. I've seen folks on the internet test memory throughput at around 4-5 GB/s vs. the 12+ that should be available.
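Easy to check yourself with something like tinymembench:

```bash
# Build and run tinymembench to measure memory bandwidth and latency
git clone https://github.com/ssvb/tinymembench.git
cd tinymembench && make
./tinymembench
```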

Hey Jeff,

I'm a big fan, and I wanted to share my interpretation of a cheap Raspberry Pi NAS. It's on eldieturner.com.

Ha, I remember seeing that on Reddit, loved the ingenuity and reuse of an existing case. I wish one of these boards were designed to fit in an OEM NAS box with hot-swap trays!

What a waste of money. I can get something on eBay for far cheaper, with faster speeds, than that pile of garbage.

If you can't reach even 1 Gbps of throughput via SMB, maybe you could try disabling Realtek's 2.5GbE NIC and using the CM4's integrated Broadcom Ethernet?

It'll free the PCIe lane for the SATA controller and leave you with less switching on the PCIe bus.

Hi Jeff,
have you considered doing some benchmarks where the RAID array is encrypted with cryptsetup? I'd be curious to see how much of an impact that would have.
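For example, something like this (device names are examples):

```bash
# Layer LUKS encryption on top of the md array (device names are
# examples), then run the same benchmarks against the mapped device
sudo cryptsetup luksFormat /dev/md0
sudo cryptsetup open /dev/md0 cryptraid
sudo mkfs.ext4 /dev/mapper/cryptraid
# cryptsetup's built-in benchmark gives a rough idea of the
# CPU-bound cipher throughput to expect
cryptsetup benchmark
```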

Do you know if the Radxa Taco is available to buy/where it could be bought? I've found nothing online about it.

I'd love to get one of these boards alone. No case. No CM4. I have two spares sitting on my bench waiting for a solution like this, and I really don't want to have to buy the kit and pry out the CM4 it comes with just to use the one I want.

As I can't seem to find a carrier board that has both USB 3 ports and an NVMe slot, this is exactly what I was looking for. I hope they release it alone.

Hi Jeff,
good to see a much more balanced "apples-to-apples comparison" approach this time.
The Realtek RTD1296 quad-core ARM Cortex-A53 is used in the Banana Pi Router Board BPi-W2 too; maybe that would be the ideal candidate for a future comparison?

Nevertheless, to "level the field" even more, I see some improvements for the RPi HW setup as detailed below.
To make the hardware even more comparable I would suggest to:
1) use the RPi IO Board with a readily available cheap SATA controller card in the PCIe slot
2) provide benchmark figures for RAID1 using a cheap SATA hardware RAID controller on the RPi too (PCIe x1 2-port cards sell in the 25 to 50€ range, 4-port cards in the 45-80€ range. The downside is that these cheap HW RAID controllers currently need to be configured on a Windows computer; at least I didn't find one with a Linux CLI yet.)
3) when RAID5 is used, a SATA hardware RAID controller providing acceleration for parity calculation should be used on the RPi too (i.e. the DawiControl DC-624e RAID R2 with Parity-Boost for maximum RAID 5 performance. Unsure if the Marvell 88SE9230 chipset is providing the Parity-Boost though, as it's not mentioned in the datasheet. In case the Marvell chipset provides Parity-Boost, other brands' boards could be used too.)

Keep up the excellent work!
Best Regards
Michael

Kinda fascinating that in the SBC/hobbyist world, stuff is now considered 'great' that was popular in the professional world one or even two decades ago and has been abandoned there for reasons :)

RAID5 in 2022? Really? With huge drives, rebuild times are just a mess, one failed drive leaves no redundancy at all, and the additional stress while rebuilding the array increases the risk of another drive failing (and then your whole RAID5 array is simply gone).

Spinning rust is so cheap today that at least double parity should always be considered, or even better, the more modern concepts we've had for a decade (like e.g. https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/)
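With ZFS, for example, a pool of striped mirrors takes one command (device names are examples):

```bash
# A pool of two mirror vdevs (striped mirrors, RAID 10-style);
# device names are examples
sudo zpool create tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd
```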

RAID is not backup. RAID is all and only about availability (business continuity). Hardware RAID results in no availability at all if the RAID controller dies, so you need at least another one at home that's tested and has exactly the same firmware, since otherwise strange things happen. The firmware of these things is usually bug-ridden and crappy, and you usually realize this only when it's already too late.

Personally, I would use HW RAID only if I had a systems house at hand (employing multiple professionals) who always have some spare controllers lying around and who deal with at least one RAID failure at a customer per week. Only then do they know what they're doing and what can go wrong. And that's not 'immediate drive failures' but drives slowly dying, idiotic firmware/controller behaviour, and stuff like that.

But I've had these things fail way too often on me (on customers, to be more precise) to even consider going that route again. We've had so much better concepts in the meantime that it's just a waste of time to deal with this anachronistic stuff any more :)

Any inside information on when we could expect the release of the Taco bundle (case, compute module, etc.)? I'm currently putting my new 20TB RPi ZFS server on hold, awaiting the Taco. Any other suggestions besides the Taco?