Ethernet was slower only in one direction on one device

MikroTik Cloud Switch Router with FlyPro Fiber Copper RJ-45 10G Transceiver

There could be a thousand reasons something like this happens, but here's my situation:

A few months ago, when I was testing 10 Gigabit networking on the Raspberry Pi, I noticed something funny when I was doing iperf3 speed tests between devices connected through my MikroTik CRS305 10G switch.

The switch uses SFP+ ports, so I was testing a variety of connections: SFP+ to duplex fiber, SFP+ to RJ-45 copper, and SFP+ DACs (Direct Attach Cables). One thing I found very strange was that I would sometimes end up in a situation where a particular device (a Pi, my Mac, a Windows PC, or even my NAS) would get the full expected speed in one direction (e.g. my Mac to the NAS), but if I tested in reverse (NAS to my Mac), I would get horrible speeds. And that only happened when there was a 2.5 Gbps NIC on one end.

In the case of the NAS, which has 2.5G Ethernet, I was getting between 50 and 500 Mbps to it, with no consistency in the throughput.

I tested with multiple NICs, I tested both ports of my NAS, and I tested with multiple known-working-to-10G Cat6 patch cables.

And in every case, I was getting this asymmetric speed.
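The easiest way to catch this is to always test both directions. iperf3's `-R` flag reverses a test so the server sends and the client receives. Here is a minimal Python sketch of that habit. It assumes iperf3 is installed and `iperf3 -s` is already running on the far end. The host name `nas.local` and the 0.8 threshold are hypothetical placeholders, not values from the tests above.

```python
# Sketch: build iperf3 commands for both directions and flag asymmetric results.
# Assumes an iperf3 server ("iperf3 -s") is already running on the far end.


def iperf3_commands(server: str) -> tuple[list[str], list[str]]:
    """Return argv lists for a normal test and a reverse (-R) test."""
    forward = ["iperf3", "-c", server]        # this machine sends to the server
    reverse = ["iperf3", "-c", server, "-R"]  # the server sends to this machine
    return forward, reverse


def is_asymmetric(forward_mbps: float, reverse_mbps: float,
                  min_ratio: float = 0.8) -> bool:
    """True if the slower direction is below min_ratio of the faster one.

    min_ratio is an arbitrary threshold chosen for illustration.
    """
    slow, fast = sorted((forward_mbps, reverse_mbps))
    return fast > 0 and slow / fast < min_ratio


if __name__ == "__main__":
    fwd, rev = iperf3_commands("nas.local")  # hypothetical host name
    print(" ".join(fwd))
    print(" ".join(rev))
    # Roughly the numbers seen through the problem transceiver (2.3 Gbps vs. <500 Mbps)
    print(is_asymmetric(2300, 500))
```

Each argv list can be handed to `subprocess.run()` as-is. Comparing the two results is the whole point: either number alone looked plausible.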

Eventually I bought a QNAP QSW-M2108-2C switch instead, which has 8 built-in 2.5G RJ-45 jacks, and 2 dual-mode SFP+/Copper 10G interfaces.

I plugged everything into it, forgoing fiber or DACs, and the problem went away.

So I shelved the MikroTik switch for the time being.

More MikroTik, More Problems?

Then a few weeks ago, a viewer of my YouTube channel offered me a MikroTik CRS309 switch for a song, so I had to take him up on the offer. I swapped it into my rack, and... exact same issue!

Thinking there's no possible way both MikroTik units could make one port work poorly at random, I finally ordered a new batch of RJ-45 transceivers, and started swapping them out (after going through the whack-a-mole with cables and devices to no avail).

And I found, after all that testing, that the FLYPRO Fiber SFP-10G-T-30M transceivers (at least some revisions) had trouble when a device negotiated 2.5G speeds through them.

FLYPRO Fiber copper transceiver shows as multi-mode fiber

As can be seen above, the FLYPRO transceivers were being identified as "multi-mode fiber", which also seemed suspect, since they are, indeed, copper RJ45 transceivers!

When connected through one of these, I'd get 2.3 Gbps one way, but the other direction would bottleneck at less than 500 Mbps, which was really horrible for video editing and backups involving my NAS.

It was the Transceiver, dummy!

So yeah, when you're diagnosing network problems, don't leave any stone unturned. In my case, I was using a transceiver that I'd tested and had working with 2.5G and 10G devices, but in some cases, for some reason, it would only run at full speed in one direction.

Switching to MikroTik's own S+RJ10 transceiver resulted in full 2.3 Gbps bidirectional throughput.

I'm reminded of a time earlier this year when one of my Macs was getting 100 Mbps sometimes and 10 Gbps at other times, seemingly depending on which way the wind was blowing. In that case, it turned out one of the 8 wires in a keystone jack was rubbing against the shielded keystone casing, causing the entire cable run to downrate to 100 Mbps... but only sometimes.

Always start with the patch cables. Then look at your equipment. Then check your drivers and device CPU/statistics. Then check transceivers, then jacks, terminations, and finally cabling. Networking can be... fun.

While this may not have been the direct cause, engrpiman2's post on Ars Technica was what finally jostled my brain into swapping transceivers. Not everything that's 10G will also work well with 2.5GBASE-T and 5GBASE-T devices. Lesson learned.

Comments

I want to leave a tip about smoke-testing the network path. If you don't care about the TCP performance of the stacks on both ends, the simplest approach is to run iperf in UDP mode, so you can push as much traffic through the network as your endpoints can generate. Because UDP has no flow-control mechanism like TCP's, the flow will hit every interface at full rate, and if you suspect a particular link, you will see a difference in the bitrate. With TCP, and without drop counters to look at, you lose this link-by-link diagnostic: the entire path in one direction settles at whatever speed its weakest link allows, even if that link only drops a packet from time to time.
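As a rough sketch of that tip: iperf3's `-u` flag switches to UDP, and `-b` sets the target bitrate (iperf3's UDP default is only 1 Mbit/s, and `0` disables the limit). The Python below only builds the command and computes a loss percentage from sent and received datagram counts. iperf3 already prints lost/total datagrams in UDP mode, so the helper just makes the arithmetic explicit. The host name and the `3G` rate are hypothetical examples.

```python
# Sketch: a UDP smoke-test command plus the loss arithmetic behind it.


def udp_smoke_test_command(server: str, bitrate: str = "0",
                           reverse: bool = False) -> list[str]:
    """Build an iperf3 UDP test command.

    -u selects UDP; -b sets the target bitrate ("0" disables the limit,
    otherwise iperf3's UDP default is only 1 Mbit/s). -R reverses direction.
    """
    cmd = ["iperf3", "-c", server, "-u", "-b", bitrate]
    if reverse:
        cmd.append("-R")
    return cmd


def loss_percent(sent_packets: int, received_packets: int) -> float:
    """Percentage of datagrams that never arrived."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    lost = max(sent_packets - received_packets, 0)
    return 100.0 * lost / sent_packets


if __name__ == "__main__":
    # Hypothetical host; 3G is a little above a 2.5G link's capacity.
    print(" ".join(udp_smoke_test_command("nas.local", bitrate="3G", reverse=True)))
    print(loss_percent(1000, 990))
```

Running it once in each direction, at a bitrate slightly above the expected link speed, shows which direction is dropping datagrams instead of just reporting one low TCP number.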

Good point. You can also change the TCP window size and that can help pinpoint some weird conditions. There's a lot to iperf/iperf3 that I am still pretty green with.

Hi Jeff,
I'm (completely unnecessarily) upgrading from a Chinese mini PC with a Ryzen 5 4600U to another with a Ryzen 7 5900H. The reason I'm here is that it has two 2.5 Gbps NICs, so I naturally need to make use of them, right? :) (Just when I had significantly improved my network with a D-Link 24-port managed PoE switch...) Anyway, I'm looking to you for tips on 2.5 Gbps+ switches, and this Mikrotik looks great! Quite reasonably priced for a managed 10G switch.

Anyway, thanks a lot for all the ridiculous fun projects! Pity it took me so long to find your channel :)

P.S.: The first link to the Mikrotik switch incorrectly shows/links to CRS305, which is a 5-port Gigabit switch. Might want to fix that :)

The first link to the Mikrotik switch incorrectly shows/links to CRS305

That was the one I originally used (and still have as a spare for testing), so the link/text is correct there ;)

Ah, just read the post properly. I see it's correct! Missed that the small 305 is also a 10G switch! Sorry about that :) Please ignore the previous comment.

Cheers Pete

There is a logical explanation.

Internally, your switch has an Ethernet MAC connected to the SFP+ transceiver through a serial link over differential pairs (commonly called serdes, short for serializer/deserializer). The standard serdes speeds are 1Gbit/s and 10Gbit/s.

If you use a fiber transceiver, and connect it to another switch the same way, then both switches will run auto-negotiation through the fiber to establish a working 1Gbit/s or 10Gbit/s link.

If you have a transceiver with a copper (RJ45) PHY, it's completely different. The serdes is still there, but there are actually two separate links: one between the switch's Ethernet MAC and the "serdes side" of the RJ45 PHY inside the transceiver, and another between the RJ45 PHYs at each end of the RJ45 cable:

[MAC1] ---- serdes ---- [RJ45 PHY1] ------ copper ------ [RJ45 PHY2]

What happens if the two links don't run at the same speed?

For example, if the RJ45 link auto-negotiates to 100Mbit/s or 10Mbit/s, there is an issue, because there is no standard way to run the serdes link at a speed lower than 1Gbit/s. The switch would send data at 1Gbit/s, and inside your transceiver the same data would have to be sent over a 100Mbit/s copper link. The only way for this to work at all is to have buffering to absorb the rate change.
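To put rough numbers on that buffering: while a burst arrives faster than it can leave, the buffer grows at the difference between the two rates, so an empty buffer of B bytes fills after 8B / (R_in - R_out) seconds. The Python sketch below computes that time and the longest back-to-back burst that fits. The 64 KiB buffer in the example is a made-up placeholder, not a real transceiver spec.

```python
# Sketch: how long a fixed buffer can absorb a burst arriving faster than
# it drains. Rates are in bits/s, sizes in bytes.


def seconds_until_overflow(buffer_bytes: float, in_bps: float,
                           out_bps: float) -> float:
    """Time for an initially empty buffer to fill during a line-rate burst."""
    if in_bps <= out_bps:
        return float("inf")  # the output keeps up, so the buffer never grows
    return buffer_bytes * 8 / (in_bps - out_bps)


def max_lossless_burst_bytes(buffer_bytes: float, in_bps: float,
                             out_bps: float) -> float:
    """Largest back-to-back burst that fits before the buffer overflows."""
    if in_bps <= out_bps:
        return float("inf")
    # The burst keeps arriving at in_bps for the whole time the buffer is filling.
    return seconds_until_overflow(buffer_bytes, in_bps, out_bps) * in_bps / 8


if __name__ == "__main__":
    buf = 64 * 1024  # hypothetical 64 KiB transceiver buffer
    for in_bps, out_bps in [(1e9, 100e6), (10e9, 2.5e9)]:
        t = seconds_until_overflow(buf, in_bps, out_bps)
        burst = max_lossless_burst_bytes(buf, in_bps, out_bps)
        print(f"{in_bps / 1e9:g}G -> {out_bps / 1e9:g}G: "
              f"full after {t * 1e6:.1f} us, burst limit {burst:,.0f} bytes")
```

With that placeholder buffer, a 10Gbit/s burst into a 2.5Gbit/s link overflows after about 87,000 bytes, which is why a cheap transceiver can't rely on buffering alone.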

Buffering needs RAM, which is expensive and not something you want to put inside a transceiver that is supposed to be cheap, so it was not the chosen solution back in 2004 when the problem appeared. Instead, it was solved by SGMII, a proprietary Cisco standard. SGMII adds a special mode to the serdes link: the link still runs physically at 1Gbit/s, but the data is repeated 10 or 100 times to emulate 100Mbit/s or 10Mbit/s, respectively.

With SGMII, the Ethernet MAC is artificially throttled to send data over the 1Gbit/s serdes at the same rate as the RJ45 link, so buffering takes place inside the switch, where it belongs (because switches do have buffers).
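A toy Python model of that replication idea: the serdes always runs at 1Gbit/s, and at lower copper speeds each byte is simply repeated, so the useful data rate the MAC can push matches the copper link. This is a simplification. Real SGMII hardware replicates encoded symbols and carries its own control signaling, none of which is modeled here.

```python
# Toy model of SGMII-style rate adaptation by data replication.
# Real SGMII replicates encoded symbols in hardware; this works on plain bytes.

SERDES_BPS = 1_000_000_000  # the SGMII serdes always runs at 1 Gbit/s


def replication_factor(copper_bps: int) -> int:
    """How many times each byte is repeated so the MAC matches copper speed."""
    factor, remainder = divmod(SERDES_BPS, copper_bps)
    if remainder or factor not in (1, 10, 100):
        raise ValueError(f"unsupported copper speed: {copper_bps}")
    return factor


def replicate(data: bytes, factor: int) -> bytes:
    """MAC side: repeat every byte `factor` times before it hits the serdes."""
    return bytes(b for b in data for _ in range(factor))


def unreplicate(stream: bytes, factor: int) -> bytes:
    """PHY side: keep one copy of each repeated byte."""
    return stream[::factor]


if __name__ == "__main__":
    frame = b"\x55\xd5hello"  # arbitrary example bytes
    for speed in (1_000_000_000, 100_000_000, 10_000_000):
        f = replication_factor(speed)
        on_wire = replicate(frame, f)
        assert unreplicate(on_wire, f) == frame
        # Useful data rate seen by the MAC = serdes rate / factor = copper rate
        print(speed, f, len(on_wire), SERDES_BPS // f)
```

Note that 2.5Gbit/s has no valid factor in this model, since it is faster than the 1Gbit/s serdes itself.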

Fast forward to 2020, and the problem is back with 2.5Gbit/s- and 5Gbit/s-capable RJ45 transceivers:

  1. "Old" serdes can only run at 1Gbit/s or 10Gbit/s. For example, an Intel X520 NIC will always report a link speed of 10Gbit/s if you use such a transceiver, and rightfully so, because that's the truth.

    But since the copper link runs at 2.5Gbit/s, the transceiver has to cope with the speed difference by buffering and/or sending pause frames to the switch. Transceivers don't usually have enough memory to absorb large 10Gbit/s bursts coming from the switch, so packet loss occurs.

  2. For recent serdes hardware, Cisco comes to the rescue again with the newer USXGMII standard. It has the same purpose as SGMII: matching the serdes speed to the copper speed. Both sides have to support it.

What's happening in your case:

  • with your Mikrotik "certified" SFP, USXGMII or a proprietary alternative is used, and the serdes speed matches the copper speed
    • => good performance
  • with the other SFP, the switch is not able to infer the correct speed and just runs the serdes at 10Gbit/s. Rate conversion inside the transceiver cannot cope with packet bursts.
    • => packet loss
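The difference between those two cases can be illustrated with a toy Python simulation. A burst of back-to-back frames enters a small FIFO at the serdes rate and leaves at the 2.5Gbit/s copper rate, and frames that don't fit are dropped. The FIFO size, frame size, and burst length are hypothetical. Real transceivers may also send pause frames, and the model ignores those as well as inter-frame gaps. Exact fractions are used so the matched case doesn't pick up floating-point rounding.

```python
# Toy model: a transceiver FIFO fed by the serdes and drained by the copper PHY.
# All sizes are hypothetical; pause frames and inter-frame gaps are ignored.
from fractions import Fraction


def simulate_drops(frames: int, frame_bytes: int, fifo_bytes: int,
                   serdes_bps: int, copper_bps: int) -> int:
    """Return how many of `frames` back-to-back frames the FIFO drops."""
    drain_bytes_per_s = Fraction(copper_bps, 8)
    arrival_gap_s = Fraction(frame_bytes * 8, serdes_bps)  # one frame per gap
    occupancy = Fraction(0)  # bytes waiting to go out on copper
    last_t = Fraction(0)
    dropped = 0
    for i in range(1, frames + 1):
        t = i * arrival_gap_s  # frame i has fully arrived from the serdes
        # The copper side has been draining since the previous arrival.
        occupancy = max(Fraction(0), occupancy - (t - last_t) * drain_bytes_per_s)
        last_t = t
        if occupancy + frame_bytes <= fifo_bytes:
            occupancy += frame_bytes
        else:
            dropped += 1
    return dropped


if __name__ == "__main__":
    copper = 2_500_000_000
    for serdes in (2_500_000_000, 10_000_000_000):  # matched vs. fixed 10G serdes
        lost = simulate_drops(frames=100, frame_bytes=1500,
                              fifo_bytes=16 * 1024,
                              serdes_bps=serdes, copper_bps=copper)
        print(f"serdes {serdes / 1e9:g}G -> copper 2.5G: {lost} of 100 frames dropped")
```

In this model, short bursts get through either way. Only sustained bursts larger than the FIFO can absorb are dropped, and only when the serdes runs faster than the copper link.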