External graphics cards work on the Raspberry Pi

AMD Radeon HD 7450 Graphics card with Raspberry Pi Compute Module 4

In October 2020, after Raspberry Pi introduced the Compute Module 4, I started out on a journey to get an external graphics card working on the Pi.

At the time, it'd been over a decade since the last time I'd built a PC, and I had a lot to learn about PCI Express, the state of graphics card drivers in Linux, and PCI Express support on various ARM SoCs.

After failing to get the Nvidia GT710 or AMD 5450 running, I started testing the GTX 750 Ti, RX 550, and SM750, all with wildly different architectures and driver support. After failing to get those cards working, I tested a newer GTX 1080, and even splurged on an AMD Radeon RX 6700 XT on the Pi—which also didn't work.

Along the way, dozens of people (from AMD engineers, to ARM enthusiasts, to fellow hobbyists like me) helped explore the dark, dusty corners of the BCM2711—Broadcom's ARM SoC that powers the Raspberry Pi Compute Module 4.

What we found is the BCM2711's PCIe root complex is fundamentally broken, at least when it comes to some memory operations on 64-bit Linux. Some speculated the brokenness couldn't be worked around, but as Winston Churchill once said:

Success is stumbling from failure to failure with no loss of enthusiasm.

This issue in particular, with over 490 comments as of this writing, documents dozens of failures in one central location, to the point where they could be categorized and worked around in a set of patches to the open source radeon driver.

Video for this Blog post

I've also made the video embedded below, to help illustrate the journey, and to show more about how the graphics cards are—and aren't—working on the Pi:

How to get an AMD GPU working

memcpy function patch in Linux

Before you get started, you'll need to have on hand:

  1. Raspberry Pi Compute Module 4 (hard to find currently, check rpilocator for stock).
  2. Raspberry Pi Compute Module 4 IO Board (or another IO board with a PCI Express slot).
  3. PCIe x1 to x16 riser/adapter (if the IO board you have only has a x1 slot).
  4. An AMD Radeon graphics card in the 5000/6000/7000 line (We've confirmed at least the 5450, 6450, and 7470 work).

Prepare the OS

The current working patch is based off the previous 5.10.y Linux fork Raspberry Pi maintained, so you need to flash a copy of Raspberry Pi OS from earlier this year (not the latest). I downloaded 2022-01-28-raspios-bullseye-arm64-full.zip from here and expanded it, then used Raspberry Pi Imager to flash it to a microSD card.

I put that card in my Raspberry Pi, and installed AMD's firmware with sudo apt install -y firmware-amd-graphics.

Then I went to my main workstation and cross-compiled the Raspberry Pi kernel. The exact environment and process I follow is thoroughly documented here: Raspberry Pi Linux Cross-compilation Environment. That setup should work on any Mac or Linux workstation. I haven't tested it on Windows.

Why cross-compile? Well, a fresh compilation takes between 6 and 10 minutes on my main workstation. On a Compute Module 4, the process takes almost an hour.

Before compiling Linux, you need to make sure the branch that's checked out is this branch, from Coreforge's Pi OS Linux fork. Alternatively, you can clone the raspberrypi/linux source at rpi-5.10.y, and apply Coreforge's branch as a patch file.

I've been working on a 5.15.y update to that branch, but that version isn't quite working yet, since some of the overridden AMD driver flags we modified were removed between Linux 5.10 and 5.15.

To make things easier on yourself, blacklist the radeon driver before copying the cross-compiled kernel to the Pi. Create a file named /etc/modprobe.d/blacklist-radeon.conf, with the contents:

blacklist radeon
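If you'd rather not open an editor, the file can be written from the shell. This is a minimal sketch: the `write_radeon_blacklist` helper and its optional directory argument are names I made up for illustration, so the function can be tried against a scratch directory first.

```shell
# Hypothetical helper (not part of any tool): write the blacklist file into a
# modprobe.d-style directory. Defaults to /etc/modprobe.d, which needs root.
write_radeon_blacklist() {
  conf_dir="${1:-/etc/modprobe.d}"
  printf 'blacklist radeon\n' > "${conf_dir}/blacklist-radeon.conf"
}
```

On the Pi itself, writing under /etc needs root, so a one-liner such as `echo 'blacklist radeon' | sudo tee /etc/modprobe.d/blacklist-radeon.conf` does the same job.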

Then copy the cross-compiled kernel to the Pi. We're almost done, but to make Xorg and other compositors like Weston run, you also need to override the memcpy library:

# Download Coreforge's modified memcpy library.
wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/1b72d428b2fe1cba459d5ae7f73663483743ff55/memcpy_unaligned.c

# Compile the library and move it into place.
gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so

# Create an `ld.so.preload` file to instruct Linux to use our version of `memcpy`.
sudo nano /etc/ld.so.preload

# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so
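The editor step can also be scripted. The sketch below assumes a hypothetical `add_preload_entry` helper (my own name, for illustration) that appends the library path only if it is not already listed, so running it twice does not duplicate the line.

```shell
# Hypothetical helper: add a library path to an ld.so.preload-style file,
# skipping the write if that exact line is already present.
add_preload_entry() {
  lib_path="$1"
  preload_file="${2:-/etc/ld.so.preload}"
  touch "$preload_file"
  # -x matches the whole line, -F treats the path as a literal string.
  if ! grep -qxF -- "$lib_path" "$preload_file"; then
    printf '%s\n' "$lib_path" >> "$preload_file"
  fi
}
```

On the Pi this has to run as root, since /etc/ld.so.preload is not writable by normal users. The file applies to every dynamically linked program, so it is worth confirming the library exists with `ls -l /usr/local/lib/memcpy.so` before adding the entry.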

Load the driver

Now, reboot the Raspberry Pi. After it reboots, you can open up a terminal session and run dmesg --follow to see what's going on (you don't have to, though).

To load the radeon driver, run:

sudo modprobe radeon
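To confirm the module actually loaded, you can look for it in /proc/modules. This is a rough sketch; `module_loaded` is a hypothetical helper name, and the optional file argument exists only so the check can be tried against a sample listing.

```shell
# Hypothetical helper: succeed if a module name appears in a /proc/modules-style
# listing, where the module name is the first whitespace-separated field.
module_loaded() {
  module="$1"
  modules_file="${2:-/proc/modules}"
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Report the result; errors (e.g. no /proc/modules) count as "not loaded".
if module_loaded radeon 2>/dev/null; then
  echo "radeon is loaded"
else
  echo "radeon is not loaded"
fi
```

`lsmod | grep radeon` and `dmesg | grep -i radeon` are common manual checks for the same thing.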

After 10 or 20 seconds, if you have a monitor plugged into the Radeon card, it should come up as the driver loads. Typically I set Pi OS to boot to console (CLI) instead of to the graphical system, since that is more stable.

For better stability running Xorg (startx to launch) or Weston (weston-launch to launch), you should also add the following options to your /boot/cmdline.txt (in the same line as the other options) and reboot:

radeon.uvd=0 pci=noaer,nomsi radeon.msi=0 radeon.pcie_gen2=0 pcie_aspm=off radeon.aspm=0 radeon.runpm=0 radeon.dpm=0
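Because cmdline.txt has to stay on a single line, appending by hand is easy to get wrong. The sketch below shows one way to script it; `append_cmdline_opts` is a hypothetical helper of my own, and it assumes the option string contains no `|`, `&`, or backslash characters (true for the options above).

```shell
# Hypothetical helper: append kernel options to the end of the first line of a
# cmdline.txt-style file, keeping everything on one line.
# Assumes the options contain no '|', '&', or backslash characters.
append_cmdline_opts() {
  opts="$1"
  cmdline_file="${2:-/boot/cmdline.txt}"
  # Do nothing if the exact option string is already present.
  if grep -qF -- "$opts" "$cmdline_file"; then
    return 0
  fi
  tmp_file="$(mktemp)"
  # Strip trailing whitespace from line 1, then add a space and the options.
  sed "1s|[[:space:]]*\$| ${opts}|" "$cmdline_file" > "$tmp_file"
  # Write back through the original file so its ownership and mode are kept.
  cat "$tmp_file" > "$cmdline_file"
  rm -f "$tmp_file"
}
```

On the Pi this needs root, and keeping a backup copy of /boot/cmdline.txt first is a sensible precaution, since a malformed file can stop the system from booting.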

The patches and cmdline options are still actively being investigated. Follow this issue for further progress: Test GPU (VisionTek Radeon 5450 1GB).

We've also gotten the SiliconMotion SM750 graphics working with this patch, but as it's an older fb driver, it only reliably works for a text console, as there are issues getting Xorg running with it. The driver in the Linux kernel isn't really maintained, and wasn't the highest-quality to begin with.

What works

Pi OS Xorg desktop janky glitches with neofetch stats

In summary: the DisplayPort, VGA, HDMI, and DVI ports all work, as do the command line (console), Xorg, and Weston (the reference implementation of a Wayland compositor), along with some 3D benchmarks and applications that use OpenGL.

But Xorg in particular shows a lot of 'glitches' in its output (see above), most noticeably when interacting with different screen elements.

Weston running more smoothly on Radeon on Raspberry Pi

Weston (pictured above) didn't have the same glitchy behavior, but ran a bit sluggish and would often lock up after a while (necessitating a soft reboot).

GLMark2 DRM jellyfish example

glmark2-drm (see how I installed GLMark2 on the Pi) and glxgears usually ran all the way through, but sometimes would lock up in the middle of a run.

The driver is far from optimal in its current state. There's a lot of debug code, and the memory copy implementations err on the side of caution, slowing down some operations significantly: GLMark2 gives a score of about 50, and glxgears was rendering at 25-35 fps, slower than the VC4 GPU built into the Pi!

What doesn't work

A lot of things still don't work, since each specific feature of a given line of cards would need more work combing through code, finding memory issues.

As one example, H.264 acceleration is currently disabled, so using the card as an ffmpeg video transcoding accelerator isn't going to work. Also, assuming we could get Nvidia's drivers working (or more realistically, nouveau, since Nvidia doesn't open source their driver), things like CUDA cores would still be inaccessible.

And even after more work, it's unlikely you'd be able to do something like play a AAA game on a Pi with an external GPU. Many things are working against this possibility right now:

  1. The x1 Gen 2.0 lane doesn't provide a ton of bandwidth.
  2. Most (all?) AAA games are compiled for x86 platforms, not for ARM/ARM64. Box86/Box64 would struggle, especially with a hacked-together graphics driver.
  3. Trying to sort through layers of incompatibilities with (a) ARM64, (b) Linux, and (c) a modified driver for an older unsupported GPU is a pretty crazy task.
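To put a rough number on the first point: PCIe Gen 2 signals at 5 GT/s per lane and uses 8b/10b encoding, so only 8 of every 10 bits on the wire carry data. The sketch below works through that arithmetic. It gives the theoretical ceiling per direction before protocol overhead, not a measured figure for the Pi, and the function name is just for illustration.

```shell
# Theoretical PCIe Gen 2 throughput per direction, in MB/s, for a lane count.
# 5000 megatransfers/s per lane, 8b/10b encoding (8 data bits per 10 sent),
# then divide by 8 to convert megabits to megabytes.
pcie_gen2_mb_per_s() {
  lanes="$1"
  echo $(( 5000 * 8 / 10 / 8 * lanes ))
}

pcie_gen2_mb_per_s 1    # the CM4's single lane
pcie_gen2_mb_per_s 16   # a full x16 Gen 2 slot, for comparison
```

That works out to 500 MB/s for the Pi's single lane against 8000 MB/s for a x16 Gen 2 slot, a sixteenfold gap before any of the driver workarounds are counted.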

I won't say never, but it's highly unlikely a Compute Module 4 will do anything outside of highly specialized external-GPU-related tasks—and that's assuming we haven't hit a dead end already.

What's next?

First, I'm sure many people reading this post had the following thoughts pop in their heads:

  • What about running Windows on Raspberry? After all, Windows is much better for gaming!
  • You should try powering the card differently; the CM4 IO Board can only supply up to 25W!
  • What about Rockchip or Allwinner SoCs? Or Apple's M1?

Well, in all three of the above cases, these avenues have been discussed and explored quite a bit, and let me assure you they are usually dead ends:

Windows: Windows on Raspberry runs Windows for ARM, which does not support ARM GPU drivers in any way I've seen so far. Windows GPU drivers are also more opaque than Linux drivers, meaning it would be harder for a random guy like me to debug them. And finally, it's unlikely Microsoft will patch around hardware PCIe bugs on the BCM2711, since they don't even support Windows on Raspberry Pi hardware!

Power: The issues we're running into are not power-related, and I have also tested all the graphics cards in powered risers with beefy 650W and 750W PSUs.

Other SoCs: There are some ARM platforms that do support external GPUs, like some of SolidRun's boards. But most ARM SoCs, especially ones targeted at mobile/embedded use, have a broken PCIe root complex (just like the Broadcom BCM2711) and run into very similar (if not identical) issues with things like graphics cards.

pgwipeout has been exploring some Rockchip SoCs' PCIe bus, and had this to say:

We already know BRCM doesn't care about complying with the spec, and their implementation is severely broken.

It seems the rk35xx series also doesn't comply, but not nearly as bad.

I'm trying to narrow down exactly how bad, so that I can document it as we finalize the PCIe support for mainline.

So far it seems like we're hitting similar dead ends with MMIO and PCIe memory management on all the SoCs on lower-cost SBCs—at least of the current generation.

My hope is that future chips from Broadcom et al. will have a better implementation that could at least work with the latest generation of PCIe devices, and not require workarounds like avoiding writeq on 64-bit OSes.

As it stands, some devices (like newer LSI HBAs) can work around the issues with minimal performance penalties, while other devices (like graphics cards and Google's Coral TPU) seem to be crippled.

If you want to follow along on our journey (work continues! Coreforge is experimenting with Minecraft on the GPU currently...), please follow these issues on GitHub:

Comments

I know just enough to know how much work probably went into this, and I'm amazed you got this to work. Thank you for always inspiring me to not accept the limits of what I think is possible (within reason).

Just wow! After carefully reading through the entire article I have become convinced that the author (whom I greatly admire and respect) is a 9th Dan techno-masochist. I mean this in the nicest way possible.

Hi Jeff,
Thank you, just thought you should know I appreciate your efforts.
Should you ever seek a proof reader, editor, and second set of eyes, I am here for you. Remote is doable.
All good.

why can't we just re-configure the kernel instead of re-compiling?