How many AMD RX 7900 XTXs are defective?

I'm working on a project—a very dumb project, mind you—and I was trying to acquire the two current-gen flagship GPUs: an Nvidia RTX 4090, and an AMD Radeon RX 7900 XTX.

By some weird stroke of luck (it has been difficult to find either in stock), I was able to get one of each this week.

AMD RX 7900 XTX Reference Sapphire edition versus Nvidia RTX 4090 Gigabyte boxes

(lol at the size difference...)

Price gouging aside, Nvidia still holds the GPU performance crown this generation, as the 4090 blows past every competing card so far. But AMD's 7900 XTX was poised to be the best value in terms of price, performance, and efficiency (at least compared to any Nvidia offering).

Unfortunately, many have experienced unexpected overheating on the 7900 XTX reference design, and der8auer even went so far as to rip open a vapor chamber to get to the bottom of the issue.

AMD originally stated that 110°C junction temps were normal, but seemed to back off that statement as more evidence (like der8auer's videos) came to light demonstrating inadequate cooling only in a particular orientation.

When I got my 7900 XTX in the mail, I decided to run some tests. Light gaming on my 1080p monitor didn't seem to hit it too hard, but the card was getting pretty loud as the fans ramped up.

MSI Kombustor GPU stress test 110 degrees C AMD RX 7900 XTX

I then ran MSI Kombustor's FurMark donut test, which pushes the GPU to 100% load. After less than 7 minutes, my card's 'junction temperature' hit 110°C, and the GPU clocks and power draw dropped about 5-7% as the poor fans struggled to keep up, ramping up to a loud 2900 rpm:

MSI Kombustor GPU stress test 110 degrees C AMD RX 7900 XTX - closeup of stats

This testing was done with the card installed in my PC in the horizontal orientation—that is, I have a tower PC, and the motherboard is vertical, with the 7900 XTX installed so the fans are parallel to the ground.

It seems that for a bit of time, the cooler can keep up, but after 5-7 minutes, there's just not enough cooling capacity in this orientation, as the vapor chamber literally runs out of steam. (Or, well... water, I guess.)

So I shut down my computer, laid it on its side (so the graphics card was in the vertical orientation), and ran the tests again. This time, the fans were a quieter 1700 rpm, the max junction temperature was 92°C, and there was no throttling or power reduction for over 15 minutes:

MSI Kombustor GPU stress test 92 degrees C AMD RX 7900 XTX - PC horizontal, card vertical

To be thorough, I shut down the computer and re-tested in the vertical orientation, then again in horizontal, and the results were identical. With my PC in its typical upright configuration, I could only get about 5 minutes of the 7900 XTX's full performance before it started throttling (and very loudly, at that!). With my PC on its side, I could get full performance all day (with about half the noise).

Something was definitely wrong with the vapor chamber.
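If you want to check your own card for the same pattern, here is a minimal Python sketch that scans logged stress-test readings. It looks for the behavior described above: junction temperature at 110°C while core clocks fall a few percent below their peak. It assumes you have already exported time, junction temperature, and core clock readings from a monitoring tool into a list. The `Sample` type, the `first_throttle` function, its default thresholds, and the sample values are all hypothetical and only for illustration. They are not readings from my card.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    seconds: float     # time since the stress test started
    junction_c: float  # GPU junction temperature in degrees C
    clock_mhz: float   # GPU core clock in MHz


def first_throttle(
    samples: Iterable[Sample],
    temp_limit_c: float = 110.0,   # hypothetical threshold, matching the 110°C seen above
    drop_fraction: float = 0.05,   # hypothetical: treat a 5% clock drop as throttling
) -> Optional[float]:
    """Return the time of the first sample where the junction temperature is at
    or above temp_limit_c AND the core clock has fallen at least drop_fraction
    below the highest clock seen so far. Return None if that never happens."""
    peak_mhz = 0.0
    for s in samples:
        peak_mhz = max(peak_mhz, s.clock_mhz)
        if s.junction_c >= temp_limit_c and s.clock_mhz <= peak_mhz * (1 - drop_fraction):
            return s.seconds
    return None


# Hypothetical readings, loosely shaped like the two runs described above.
card_horizontal = [Sample(0, 60, 2500), Sample(300, 105, 2500), Sample(420, 110, 2350)]
card_vertical = [Sample(0, 60, 2500), Sample(600, 90, 2500), Sample(900, 92, 2500)]

print(first_throttle(card_horizontal))  # 420
print(first_throttle(card_vertical))    # None
```

Because the check requires both conditions together, normal boost-clock fluctuation or a brief temperature spike does not count as throttling on its own.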

AMD said "customers experiencing this unexpected limitation should contact AMD support", but if you go to the support page and call the US support phone number, it directs you to the warranty claims page on the website. That page walks you through a wizard, which determines that if you didn't buy the card from AMD.com itself, you have to contact the partner manufacturer (even though in this case it's the reference design, just packaged by Sapphire).

So I contacted Sapphire support, and got the following:

> This is a known issue for this card, you may contact your retailer for return and purchase a different version of the RX7900 series, Sapphire offers Pulse and Nitro series which do not exhibits this issue.

I asked if there was any way they could service the card or replace it, or even offer a paid upgrade to one of their own partner boards (like the aforementioned Pulse or Nitro). They responded:

> we do not handle that type of request here, only support and warranty service. Please continue to check retailers site for availability.

So off to Newegg the return goes!

Jeff holding RX 7900 XTX Sapphire reference board box

I guess I'll be sticking with team green this round.

After seeing AMD's representative state:

> We believe the issue is related to the thermal solution used in AMD’s reference design and is occurring in a limited number of cards sold.

...why are they not pulling cards off the shelves (and instructing partners to do so), if they know it is 'limited' in scope? If they truly know that, they should be able to narrow down the affected batch to a set of serial numbers that customers could check.

But if they don't actually know the scope of the problem, or if it's a design flaw affecting all reference models, they should consider pulling all stock until they're fixed.

And maybe don't pass the buck to Sapphire, PowerColor, and other partners who have to give their customers a bad experience, since no comparable replacement cards can be had without paying scalpers on eBay.

Comments

> I'm working on a project—a very dumb project, mind you

You're putting a 4090 on a Raspberry Pi, aren't you?

That should be something to see with the Rock Pi 5B's PCIe 3.0 interface.

With PCIe 3.0 speeds on these types of boards, it's starting to not be so dumb anymore. Yes, it certainly would make for a ridiculous gaming rig, but think about it for GPU compute. If you are using the GPUs for computing tasks, this makes perfect sense.

It would have saved me a lot of money if I could have used single-board computers to control GPU mining rigs instead of building what was basically a full ATX system that generally sat idle (or mined Monero, for some people). I actually got my original two Raspberry Pi Model Bs to control mining ASICs. I've been controlling mining/compute rigs with Pis since the Pi 1, and it would still have made sense to do so during this last mining bubble.

Fascinating work, I'll keep an eye out for it!

Just acquired an AMD 6900 XT (December 2022) after years of running my GTX1080.
I am already trying to sell it online.

Crashes:
Black screens. Computer stuck.

Driver crashes:
The AMD driver crashes, then catches the error and tells you that it crashed.
Yay.

And worst of all, coil whine:
Constant high pitch noise when playing games.

Solution?
Reinstalled my GTX1080 in my computer -> runs with no issue.

Just in case: I don't think it comes from my PSU; I have a Seasonic PX-750 80+ Platinum.

AMD graphics cards: never again.

Did you fully clean out the Nvidia drivers before installing the AMD card? Not simply uninstalling them, but using a cleaner tool, as having both present is known to cause issues. The coil whine, though, is a known thing with reference 6900s (though I have seen speculation that it might actually be the fans, not the inductors, making the noise, as some have found that adjusting the fan curve helps. I don't have one of these, so I can't say for sure.)

I agree with what Jon said. I've used ATI/AMD cards since 2009 and haven't run into any problems, except with my late RX 480, which developed faulty capacitors after 6 years of use.
The card was an MSI Gaming X, and I switched to an MSI Gaming X 6650 XT. It is not very powerful, but with adequate case ventilation, the GPU fans don't spin at all because temperatures stay low. Buying aftermarket GPUs is a better option.

It's the PSU.

The driver crashing and recovering is actually a good sign: it means the issue was simply stability at the time. It's a well-known (and useful) indicator when overclocking.

At stock, the 6900 XT draws more than your 1080 ever did. And with an overclock, the 6900 can easily pull 350 to 400 W.

Try running with a quality 900 W to 1 kW PSU, and if stability is still an issue at stock settings, add a little power limit or slightly reduce clocks, and then RMA the card if it won't hold stock.

But running a card that, best case, pulls 289 W out of your 750 W maximum is pushing the PSU hard and doesn't account for the power spikes that will cause the instability you are seeing.

Are you adding a 4090 to a Raspberry Pi-like device again?

I briefly toyed with this idea for a different SBC but drivers seem to be the issue.