A Sad Slow PPPoE Story

When I deployed my first FreeBSD router at home I had a few issues with PPPoE, specifically over igb(4) interfaces. I was scratching my head for a long time until I discovered this post in Bugzilla and did some further research of my own to better understand the issue. I’m writing this post for anyone who may be experiencing the same problem.

TL;DR

This is probably a hardware limitation, not a driver issue: igb(4) interfaces (in my case the Intel I210AT NIC) do not support per-queue load distribution for PPPoE traffic. All incoming PPPoE traffic is therefore placed on a single Rx queue on the NIC itself, causing a bottleneck.

If you are experiencing similarly slow PPPoE over an Intel interface, consider checking your NIC’s datasheet to see which RSS hash types it supports and what other load-distribution policies the card implements.

The problem statement

I have a UK ISP with an FTTP line that should have been serving me a 900Mbps downlink, where my IP address and gateway are negotiated with the ISP over PPPoE:

  • Using an old Cisco 2901 enterprise router’s PPPoE features, I was getting around 300Mbps
  • Using a ThinkServer RS140 with userland PPPoE via ppp(8), I was achieving around 100Mbps
  • Using a ThinkServer RS140 with kernel-space PPPoE via mpd5(8), I was getting the same as userland PPPoE

Multi-queue NICs and RSS

Multi-queue NICs provide multiple queues for sending and receiving traffic. Each of these queues has its own interrupt, and the NIC usually distributes packets among the queues based on some sort of hash function. The data in these queues is then routed to different CPUs in order to spread the processing workload. When this distribution of interrupts is done in hardware it is called ‘receive side scaling’ (RSS).

The ThinkServer has three integrated network cards, two of which are based on the Intel I210AT NIC. As mentioned in the Intel I210 datasheet, this card implements RSS, as do many multi-queue Intel NICs. An RSS index is given to each received packet:

  • When a packet is received it is parsed
  • A hash calculation is performed on the packet yielding a 32-bit value
  • The 7 least significant bits of the resulting hash are used as an index into a 128-entry ‘indirection table’, where each entry is a 3-bit RSS output index.
  • Packets are routed to one of a set of Rx queues based on this index and other factors such as virtualisation.
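The steps above can be sketched in Python. This is an illustrative model only: the table contents, queue count, and hash values below are made-up placeholders, not values read from real hardware.

```python
# Sketch of RSS queue selection via the indirection table (illustrative).

NUM_RX_QUEUES = 4  # the I210 exposes 4 Rx queues

# 128-entry indirection table; each entry is a 3-bit RSS output index.
# Drivers typically fill it round-robin so flows spread across queues.
indirection_table = [i % NUM_RX_QUEUES for i in range(128)]

def rx_queue_for(hash32: int) -> int:
    """Map a 32-bit RSS hash to an Rx queue via the indirection table."""
    index = hash32 & 0x7F             # 7 least significant bits -> 0..127
    return indirection_table[index]   # 3-bit RSS output index

# A hashable packet (e.g. TCP/IPv4) lands on a queue chosen by its hash:
print(rx_queue_for(0xDEADBEEF))  # -> 3 with this example table

# An unparseable packet gets hash = 0, so it always lands on
# indirection_table[0] -- every such packet shares one queue:
print(rx_queue_for(0x00000000))  # -> 0
```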

So how are different packet types classified in RSS? Well, this model of card uses the Microsoft RSS specification, which lists different hash types based on whether the network-layer protocol is IPv4 or IPv6 and whether the transport-layer protocol is UDP or TCP (the Intel implementation of this spec also includes some additional hash types for UDP headers). Let’s take a look at an extract from the I210 datasheet:

7.1.2.10.1 RSS Hash Function  
Section 7.1.2.10.1 provides a verification suite used to validate that the hash function is computed  
according to Microsoft* nomenclature.  

The I210 hash function follows Microsoft* definition. A single hash function is defined with several  
variations for the following cases:  
	• TcpIPv4 — The I210 parses the packet to identify an IPv4 packet containing a TCP segment per the  
	criteria described later in this section. If the packet is not an IPv4 packet containing a TCP segment,  
	RSS is not done for the packet.  
	
	• IPv4 — The I210 parses the packet to identify an IPv4 packet. If the packet is not an IPv4 packet,  
	RSS is not done for the packet.  
	
	• TcpIPv6 — The I210 parses the packet to identify an IPv6 packet containing a TCP segment per the  
	criteria described later in this section. If the packet is not an IPv6 packet containing a TCP segment,  
	RSS is not done for the packet.  
	
	• TcpIPv6Ex — The I210 parses the packet to identify an IPv6 packet containing a TCP segment with  
	extensions per the criteria described later in this section. If the packet is not an IPv6 packet  
	containing a TCP segment, RSS is not done for the packet. Extension headers should be parsed for  
	a Home-Address-Option field (for source address) or the Routing-Header-Type-2 field (for  
	destination address).  
	
	• IPv6Ex — The I210 parses the packet to identify an IPv6 packet. Extension headers should be  
	parsed for a Home-Address-Option field (for source address) or the Routing-Header-Type-2 field  
	(for destination address). Note that the packet is not required to contain any of these extension  
	headers to be hashed by this function. In this case, the IPv6 hash is used. If the packet is not an  
	IPv6 packet, RSS is not done for the packet.  
	
	• IPv6 — The I210 parses the packet to identify an IPv6 packet. If the packet is not an IPv6 packet,  
	receive-side-scaling is not done for the packet.

The following additional cases are not part of the Microsoft* RSS specification:  
	• UdpIPV4 — The I210 parses the packet to identify a packet with UDP over IPv4.  
	• UdpIPV6 — The I210 parses the packet to identify a packet with UDP over IPv6.  
	• UdpIPV6Ex — The I210 parses the packet to identify a packet with UDP over IPv6 with extensions.

...
...
...

The following combinations are currently supported:  
	• Any combination of IPv4, TcpIPv4, and UdpIPv4, and/or  
	• Any combination of either IPv6, TcpIPv6, and UdpIPv6 or IPv6Ex, TcpIPv6Ex, and UdpIPv6Ex.  

When a packet cannot be parsed by the previously mentioned rules, it is assigned an RSS output index  
= zero. The 32-bit tag (normally a result of the hash function) equals zero.  
The 32-bit result of the hash computation is written into the packet descriptor and also provides an  
index into the indirection table.
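The “Microsoft nomenclature” the datasheet refers to is the Toeplitz hash from Microsoft’s RSS specification, which also publishes a verification suite of known inputs and hashes. Here is a minimal Python sketch of that hash, checked against the first TcpIPv4 test vector from that suite (source 66.9.149.187:2794, destination 161.142.100.80:1766); the key below is the well-known verification key, not a secret used by real deployments:

```python
import socket
import struct

# The 40-byte key from Microsoft's RSS hash verification suite.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For each set bit of the input (MSB first), XOR in the current
    leftmost 32 bits of the key, then shift the key left by one bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for bit in range(len(data) * 8):
        if data[bit // 8] & (0x80 >> (bit % 8)):
            result ^= (key_int >> (key_bits - 32 - bit)) & 0xFFFFFFFF
    return result

def hash_tcp_ipv4(src_ip, dst_ip, src_port, dst_port) -> int:
    """TcpIPv4 hash input: src addr || dst addr || src port || dst port."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack(">HH", src_port, dst_port))
    return toeplitz_hash(RSS_KEY, data)

print(hex(hash_tcp_ipv4("66.9.149.187", "161.142.100.80", 2794, 1766)))
# expected per the verification suite: 0x51ccc178
```

Note that the hash input is built entirely from IP addresses and TCP/UDP ports — there is simply nothing in this scheme to feed in for a frame the parser does not recognise as plain IPv4 or IPv6.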

Because PPPoE traffic does not match any of these hash types, it is not hashed and is assigned an RSS output index of 0. All PPPoE traffic is therefore destined for the same Rx queue, causing a bottleneck. So this is not an issue with the igb(4) driver, but with the actual hardware of the NIC itself not supporting per-queue distribution of PPPoE traffic.
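To make the failure mode concrete, here is a hedged sketch of the classification step. The function name and the reduction of parsing to a single EtherType check are my own simplifications — real hardware parses far more — but the fallthrough behaviour matches the datasheet’s “cannot be parsed” rule:

```python
# Simplified model: only recognised IP frames are hashed; everything
# else (including PPPoE session frames) falls through to index 0.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
ETHERTYPE_PPPOE_SESSION = 0x8864  # PPPoE hides the inner IP from the parser

def rss_index(ethertype: int, flow_hash: int) -> int:
    """Return the RSS output index for a frame."""
    if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        return flow_hash & 0x7F   # 7 LSBs of the 32-bit hash
    return 0                      # "cannot be parsed" -> index 0

# Plain IPv4 flows spread across the indirection table...
print({rss_index(ETHERTYPE_IPV4, h) for h in (0x11, 0x22, 0x33)})
# -> {17, 34, 51}

# ...but every PPPoE frame gets index 0, regardless of the flow inside:
print({rss_index(ETHERTYPE_PPPOE_SESSION, h) for h in (0x11, 0x22, 0x33)})
# -> {0}
```

The key point is that even though my PPPoE frames carry ordinary TCP/IPv4 traffic inside, the NIC’s parser never gets past the PPPoE header, so the inner flows can never be spread across queues.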

Solution

The solution was sadly to switch hardware. My firewall now runs on a fantastic x86 board, the ODROID-H3+, which sports two Realtek RTL8125B 2.5GbE LAN ports (a driver, realtek-re-kmod, is available in the ports tree to get these working on FreeBSD 13.1).

However, the RTL8125B datasheet says that this NIC also makes use of Microsoft RSS, specifically NDIS 6.0 RSS. The document is not as detailed as the Intel I210 datasheet and does not specify how packets that do not meet the criteria for any of the hash types are handled. Also, the realtek-re-kmod driver does not seem to interact with any RSS features on the card (see source code), whereas the igb(4) driver does (see source code).

Using the Realtek NICs I achieved:

  • Using userland PPPoE with ppp(8) I got around 300Mbps down
  • Finally, using kernel-space PPPoE with mpd5 I got the 900Mbps down I always wanted :)