After I saw Pineboards' 4K Pi 5 external GPU gaming demo at Maker Faire Hanover, I decided it was time to set up my GPU test rig and see how the Pi OS amdgpu Linux kernel patch is going.
I tested it out on a livestream over the weekend, but I thought I'd document the current state of the patch, how to apply it, and what else is left to do to get full external GPU support on the Raspberry Pi.
I also have a full video up with more demonstrations of the GPU in use; you can watch it below:
Hardware setup for an external PCI Express GPU
There are a few different routes you can go to physically plug a graphics card into a Pi 5.
My preferred setup is this JMT External Graphics Card stand that uses Oculink with an M.2 to Oculink adapter (included). To use it, you also need an Oculink cable, and those together run $80.
On top of that (or more specifically, on top of the Pi), you need a HAT that converts the PCIe FFC connection on the Pi 5 to an M.2 slot, and my choice is the Pineboards HatDrive! Bottom, though there are tons of other options. That adds on another $20 or so.
The other option is to skip the external GPU stand entirely and mount it right on top of the Pi 5. You can do that with the uPCIty Lite, which is $30, and has an open-ended x4 PCIe slot.
That takes care of the PCIe signaling—but you also need to provide adequate power.
The Pi's PCIe FFC only supports up to 5W of power output. Regardless of the HAT you choose, you'll need to provide adequate power to the slot (up to 75W), and usually also to the card you insert into it (via PCIe ATX power connectors—requirements vary by card).
For that, I'm using this LIAN LI 750W SFX PSU, which has adequate power and cabling to supply power to the PCIe riser (or the uPCIty's 4-pin 12V CPU power input), as well as to the graphics card's supplemental PCIe power jack.
If you choose uPCIty Lite, or some other method that doesn't have a 24-pin ATX power input like the graphics card stand I'm using, you'll also need a way to force your ATX power supply to turn on, like this ATX 24-pin Power Switch—or a jumper placed across the appropriate pins on the connector.
Choosing a card and getting PCIe Gen 3
With the PCI Express slot ready to go, you need to choose a card to go into it. After a few years of testing various cards, our little group has settled on Polaris generation AMD graphics cards.
Why? Because they're new enough to use the open source amdgpu driver in the Linux kernel, and old enough that the drivers and card details are pretty well known.
We had some success with older cards using the radeon driver, but that driver is older and the hardware is a bit outdated for any practical use with a Pi.
Nvidia hardware is right out, since outside of community nouveau drivers, Nvidia provides little in the way of open source code for the parts of their drivers we need to fix any quirks with the card on the Pi's PCI Express bus.
GitHub user Coreforge and I (and Pineboards now, too) all chose the RX 460 4 GB as the model to test with, because it's new enough to be useful, old enough to be cheap, and uses PCI Express Gen 3, which is perfect for the Pi 5's bus.
Speaking of which, to force Gen 3 speed on the Pi 5's PCI Express bus, you need to edit /boot/firmware/config.txt and add the following line at the bottom:
dtparam=pciex1_gen=3
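If you'd rather script that edit than open an editor, a small helper can add the line only when it's missing, so running it twice doesn't duplicate the setting. This is just a sketch: ensure_line is a made-up helper name, not something that ships with Pi OS.

```shell
#!/usr/bin/env bash
# ensure_line is a hypothetical helper (not a Pi OS tool): it appends LINE to
# FILE only if FILE does not already contain exactly that line.
ensure_line() {
    local line="$1" file="$2"
    # -x matches the whole line, -F treats the pattern as a literal string
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the file is non-empty and its last byte is not a newline, add one
    # first so the new setting is not glued onto the previous line.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage, from a root shell (config.txt is owned by root):
#   ensure_line 'dtparam=pciex1_gen=3' /boot/firmware/config.txt
```

The setting only takes effect after a reboot.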
The Pi 5's external PCI Express bus only provides 1 lane (x1), for 8 GT/s (a boost from the 5 GT/s you get with the default PCIe Gen 2 speed).
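For a rough sense of what that means in bytes: Gen 2 uses 8b/10b line encoding and Gen 3 uses 128b/130b, so the usable rate is the raw transfer rate scaled by the encoding efficiency. The helper below (lane_mb_per_s is a made-up name) does that arithmetic. The results are theoretical per-lane ceilings before packet overhead, not measured throughput.

```shell
#!/usr/bin/env bash
# lane_mb_per_s GT_PER_S PAYLOAD_BITS TOTAL_BITS
# Prints the theoretical per-lane bandwidth in MB/s, rounded down.
# 1 GT/s = 1000 megatransfers/s, one bit per transfer per lane, 8 bits per byte.
lane_mb_per_s() {
    local gts="$1" payload="$2" total="$3"
    echo $(( gts * 1000 * payload / total / 8 ))
}

echo "Gen 2 x1: $(lane_mb_per_s 5 8 10) MB/s"      # 8b/10b encoding
echo "Gen 3 x1: $(lane_mb_per_s 8 128 130) MB/s"   # 128b/130b encoding
```

That works out to 500 MB/s for Gen 2 and 984 MB/s for Gen 3, so the config change nearly doubles the ceiling on the single lane.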
Applying the Linux kernel patch
With the hardware connected and the Gen 3 speed configured, you could boot the Pi and identify the card using lspci, but Raspberry Pi OS won't be able to use the card, because the amdgpu driver isn't included in Pi OS by default.
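While you're looking at lspci, it's worth confirming the link actually trained at Gen 3. lspci -vv prints a LnkSta line for each PCIe device with the negotiated speed and width, and the sketch below pulls the speed out of that output. link_speeds is a made-up helper name, and the sample line in the comment is illustrative rather than captured output.

```shell
#!/usr/bin/env bash
# link_speeds reads `lspci -vv` output on stdin and prints the negotiated
# speed from every LnkSta line (one per PCIe device that reports one).
# Example of the kind of line it matches:
#   LnkSta: Speed 8GT/s, Width x1
link_speeds() {
    sed -nE 's/.*LnkSta:[[:space:]]*Speed ([0-9.]+GT\/s).*/\1/p'
}

# Usage on the Pi (root is needed to read the PCIe capability registers):
#   sudo lspci -vv | link_speeds
```

If the Gen 3 setting took effect you'd expect to see 8GT/s; 5GT/s would mean the bus is still at the default Gen 2 speed.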
Therefore, it's time to recompile the Linux kernel!
Follow Raspberry Pi's guide: Build the Linux kernel.
After the git clone step, you'll need to download and apply the patchset we've been working on to enable Polaris-generation cards on Pi 5. Assuming you're in the linux checkout directory (cd linux), run these commands:
wget -O amdgpu-pi5.patch https://github.com/geerlingguy/linux/pull/8.patch
git apply -v amdgpu-pi5.patch
You should see it apply successfully. If not, either the patch is outdated for the latest 6.6.y Pi OS branch, or you may have checked out a different kernel release. This particular patch was made against the 6.6.y Linux kernel.
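If you want to find out whether the patch fits your checkout before it touches any files, git apply has a --check mode that does a dry run. The wrapper below is a sketch of that idea; try_patch is a made-up name, and git branch --show-current needs a reasonably recent git (2.22 or later).

```shell
#!/usr/bin/env bash
# try_patch is a hypothetical wrapper: dry-run the patch first, and only
# modify the working tree if every hunk would apply.
try_patch() {
    local patch="$1"
    if git apply --check "$patch"; then
        git apply -v "$patch"
    else
        echo "Patch does not apply; current branch is: $(git branch --show-current)" >&2
        return 1
    fi
}

# Usage, from inside the linux checkout:
#   try_patch amdgpu-pi5.patch
```

A failure here leaves the tree untouched, which makes it easier to switch branches and try again.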
Before you start recompiling the Linux kernel (following the rest of the instructions in the Pi kernel guide), you should also build and preload Coreforge's optimized memcpy library:
wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c
gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so
sudo nano /etc/ld.so.preload
# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so
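Because every dynamically linked program reads /etc/ld.so.preload at startup, a bad entry there affects the whole system. One cautious approach (an addition here, not part of Coreforge's instructions) is to try the library with LD_PRELOAD on a single command, ideally before saving the file. preload_smoke_test is a made-up helper name, and it relies on glibc's loader printing an error to stderr when it can't preload a library.

```shell
#!/usr/bin/env bash
# preload_smoke_test LIB
# Hypothetical helper: runs one harmless command with LIB preloaded and
# succeeds only if the library is readable and the run printed no errors.
preload_smoke_test() {
    local lib="$1" err
    if [ ! -r "$lib" ]; then
        echo "not readable: $lib" >&2
        return 1
    fi
    # 2>&1 sends stderr into the capture; >/dev/null then discards stdout.
    # glibc reports a library it cannot preload on stderr and carries on,
    # so any stderr output is treated as a failure here.
    err=$(LD_PRELOAD="$lib" ls / 2>&1 >/dev/null)
    [ -z "$err" ]
}

# Usage:
#   preload_smoke_test /usr/local/lib/memcpy.so && echo "preload looks OK"
```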
To make sure the amdgpu driver is enabled when you recompile the Linux kernel, run make menuconfig (you'll also need to apt install libncurses-dev), and navigate through the menus to select AMD GPU.
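After saving and exiting the menu, you can double-check that the choice landed in the .config file. The AMD GPU menu entry corresponds to the CONFIG_DRM_AMDGPU symbol; the helper below (kconfig_state is a made-up name) just reads the file and reports the symbol's state.

```shell
#!/usr/bin/env bash
# kconfig_state SYMBOL [CONFIG_FILE]
# Prints y (built in), m (module), or n (not set) for a kernel config symbol.
# Hypothetical helper; it only reads the file and changes nothing.
kconfig_state() {
    local sym="$1" cfg="${2:-.config}" line
    # Anchoring on "CONFIG_<sym>=" avoids matching longer symbol names
    # such as DRM_AMDGPU_SI when asked about DRM_AMDGPU.
    if line=$(grep -E "^CONFIG_${sym}=" "$cfg"); then
        echo "${line#*=}"
    else
        echo "n"
    fi
}

# Usage, from the linux checkout after make menuconfig:
#   kconfig_state DRM_AMDGPU      # expect y or m
```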
Then, follow the instructions to compile the kernel (make -j6 Image.gz modules dtbs), and install it, moving all the parts into place with the sudo cp commands.
The last thing you need to do is install the AMD firmware:
sudo apt install firmware-amd-graphics
Now, reboot, and the Pi 5 should be able to output video through the HDMI port, DisplayPort, or whatever other port the external graphics card has.
If not, debug the connection using a UART connection to the Pi, the Pi's onboard micro HDMI connection, or SSH. Use dmesg to see kernel messages (usually there's a pretty obvious error you can start searching for).
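The kernel log is long, so filtering it down to GPU-related lines is a reasonable first step. The pattern below is only a starting point (gpu_messages is a made-up helper name), and you may need to widen it depending on the error.

```shell
#!/usr/bin/env bash
# gpu_messages reads kernel log text on stdin and keeps lines that mention the
# amdgpu driver, the DRM subsystem, or firmware loading (case-insensitive).
# The pattern is a starting point, not an exhaustive list.
gpu_messages() {
    grep -iE 'amdgpu|\[drm\]|firmware'
}

# Usage on the Pi:
#   sudo dmesg | gpu_messages
```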
4K Gaming on Pi 5 (for real)
Now comes the fun part. The Pi 5 supposedly supports 4K display output. But if you use it at 60 Hz, even normal UI elements will feel a bit laggy.
With the RX 460, I get smooth 60 Hz output at 4K resolution. And if you install a game like SuperTuxKart (sudo apt install supertuxkart), you'll be able to play with all graphics settings maxed out, at 4K.
It gives me 15-20 fps at that resolution, but if I drop the graphics options down a slight bit, I can get 60+ fps all day. The Pi 5's internal VideoCore GPU isn't playable with maxed out graphics settings even at 1080p!
I also installed Doom 3 with Pi-Apps, and got a solid 60 fps at 4K. It seems like the engine locks the game at 60 fps; I could get a lot more than that if I were able to unlock it, but it's been a long time since I hacked around with the old Id games' console.
Again, the Pi's internal GPU struggles to give a playable experience even on lower graphics settings at 1080p.
I couldn't get Steam installed using Box86/Box64 yet, but would like to try that with Doom Eternal and some other games I know play okay on other Arm64 platforms like my Ampere workstations (incidentally, with Nvidia GPUs like the 4070 Ti and 4090... which have better Arm64 drivers for more fully-compliant PCI Express buses).
Outside of games, I ran glmark2-es2, and on the Pi's internal V3D graphics, I got a score of about 1800. On the external AMD RX 460, I got 2383.
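To put those two scores side by side, here is the relative difference as a whole-number percentage (percent_gain is a made-up helper name, and the inputs are the two scores quoted above).

```shell
#!/usr/bin/env bash
# percent_gain BASELINE NEW
# Prints the whole-number percentage by which NEW exceeds BASELINE.
percent_gain() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

percent_gain 1800 2383   # internal V3D score vs external RX 460 score
```

That prints 32, so the RX 460 scores roughly a third higher on this particular benchmark.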
In a bit of a surprise, nvtop actually works out of the box (sudo apt install nvtop) and provides a much better overview of GPU utilization than radeontop. It even includes temperature and fan speed info, in addition to the basics like clock speeds and feature utilization.
Other GPU uses
One downside to the Polaris generation AMD graphics cards is that ROCm support was dropped years ago, so using the RX 460 for compute is a bit tricky.
With only 4 GB of VRAM and a few-generations-outdated GPU efficiency, it's not that compelling for things like LLMs or model training anyways. But one could pursue smaller models or other compute uses, as an academic exercise.
The much more enticing use is for transcoding.
Linux has decent support for GPU-accelerated video encode/decode, using the VA-API (Video Acceleration API).
And the RX 460 should support up to 10-bit H.264 encode/decode (at least if I'm reading specs correctly), up to 4K... but this is something I haven't gotten working on my setup, just yet. Checking with vainfo gives an error:
pi@pi5-pcie:~ $ DISPLAY=:0 vainfo
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/aarch64-linux-gnu/dri/radeonsi_drv_video.so
libva info: va_openDriver() returns -1
vaInitialize failed with error code -1 (unknown libva error),exit
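One thing that may help narrow this down (a debugging idea, not a known fix): with both the Pi's internal GPU and the AMD card present, there can be more than one render node under /dev/dri, and it helps to know which one belongs to amdgpu. The sketch below reads the standard sysfs layout to list each render node with its kernel driver. list_render_nodes is a made-up helper name, and the node number in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# list_render_nodes [SYSFS_DRM_DIR]
# Hypothetical helper: prints "<device node> <kernel driver>" for each DRM
# render node, by following the device/driver symlink in sysfs.
list_render_nodes() {
    local root="${1:-/sys/class/drm}" node driver
    for node in "$root"/renderD*; do
        [ -e "$node" ] || continue   # the glob matched nothing
        driver=$(basename "$(readlink -f "$node/device/driver")")
        printf '%s %s\n' "/dev/dri/${node##*/}" "$driver"
    done
}

# Usage on the Pi:
#   list_render_nodes
# Recent libva-utils builds accept a DRM device directly, which takes the X
# display out of the picture (replace the node with the amdgpu one):
#   vainfo --display drm --device /dev/dri/renderD129
```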
So far transcoding support hasn't been Coreforge's focus, as much as memory alignment fixes. And on his machine, vainfo is showing support for H.264, HEVC, VC1, and MPEG2 transcoding! At this point I'm trying to figure out what's different between our systems so I can get it working.
If you could grab a cheap old graphics card, drop it on a Pi 5, and transcode video with it, it makes for an even more compelling low-power, quiet Arm NAS running a completely open source stack like Jellyfin + SMB + OpenZFS.
What's left?
I've been running this configuration for a couple days, and it's perfectly stable. There are still memory alignment bugs that some applications run into, and the driver is not completely compatible yet.
The Chromium browser interface seems to freeze sometimes when running through the external graphics card, and I'm not quite sure what's causing that. The settings menu pops up, and the title bar updates (e.g. if you type in "Jeff Geerling" in the location bar and hit enter), but nothing else in the window updates.
I installed Firefox (sudo apt install firefox), and it didn't have any issues, so my best guess is Chromium is trying to use GPU acceleration for its UI by default, and there's a driver bug it's hitting in that state.
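A quick way to test that guess is to start Chromium with GPU acceleration turned off via its --disable-gpu switch and see whether the freeze goes away. The binary name differs between distributions and releases, so this sketch checks for both common names; find_chromium is a made-up helper.

```shell
#!/usr/bin/env bash
# find_chromium prints the first Chromium binary name found on PATH.
# Hypothetical helper; the two names are the common ones, not a complete list.
find_chromium() {
    local name
    for name in chromium-browser chromium; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Usage, from a terminal on the desktop session:
#   browser=$(find_chromium) && "$browser" --disable-gpu
```

If the window updates normally with that switch, it points toward the accelerated rendering path rather than Chromium itself.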
Outside of that, it would be nice to get the amdgpu driver in the kernel working with all generations of AMD GPUs, so one could use newer cards, or experiment with modern ROCm on a Pi 5.
The PCI Express Gen 3 x1 bus speed is a limiting factor, but there are plenty of use cases where that's enough bandwidth.
Besides, it's just fun to push hardware to its limits. I've certainly learned a lot about PCIe, arm64, the Linux kernel, and AMD's drivers already!
Comments
Wow, we did it! I’ve always thought it would be cool to have a Pi with a discrete GPU, but I never thought it was going to happen. On the topic, if we can run other ARM chips with dGPUs like you mentioned, why did we struggle to get it running on the Pi 5? (Other than x1 PCIe lol)
Great blog post, and awesome work from the open source community once again o7
Each Arm CPU I've seen seems to have one or two PCIe quirks — some are more egregious than others. And companies like Ampere work directly with manufacturers like Nvidia to validate their cards/drivers on their platform.
Raspberry Pi doesn't do that, and AMD and Nvidia (at least) don't care too much about supporting the Raspberry Pi (not that I blame them lol), so it's up to the community to fix the quirks in the driver for this particular platform.
It just so happens the patches we are working on also help on some other ARM SoCs, as well as even for RISC-V SoCs (which I'll also be testing soon!).
My understanding is ROCm support for Polaris GPUs, though not supported, works on x86 Linux with the caveat that some hand-coded assembler needs removed from the rocBLAS library. Note, however, the CPU apparently needs PCIe3 atomics
https://github.com/ROCm/ROCm/blob/docs/6.2.2/docs/conceptual/More-about…
which then begs the question whether the Pi 5 supports this.
Is there some specific reason for going with rx460 rather than something like rx590?
can't wait for an official Raspberry Pi GPU hat
It's not that much more powerful than what's already in the Pi, but it gives you full sized HDMI
/s
Have you seen what the Asahi Linux people just put out using FEX-Emu and MicroVMs? https://fedoramagazine.org/gaming-on-fedora-asahi-remix/ seems like you might be able to re-use this on the Raspberry Pi to help with Steam.
Yes! I will have to take a look. Still haven't tried FEX-Emu but I definitely think it's time... would also let me keep my 16k page size kernel and not have to switch to 4k.
Do any of the slots or external extension support a modern enough Nvidia for Cuda?
As Jeff says in the video, the hardware isn't the limiting factor. Nvidia's driver situation is less open source than AMD's and the binary blobs in Nvidia's driver stack don't support the sort of manipulation being done here.