Difference between revisions of "EJFAT UDP General Performance Considerations"
Revision as of 22:07, 21 December 2023
Here are a few things to ponder. I'll go over some things I've done to try to speed up performance so that those who follow won't waste their time. Here are some interesting links:
NIC queues on multi-cpu nodes
- Contemporary NICs support multiple receive and transmit descriptor queues (multi-queue). On reception a NIC can distribute packets across its receive queues by applying a filter to each packet that assigns it to one of a number of logical flows, a mechanism known as Receive Side Scaling (RSS). Packets for each flow are steered to a separate receive queue, which in turn can be processed by a separate CPU. The goal is to increase performance by spreading receive processing over several CPUs. Find out how many NIC queues there are on your node by looking at the Combined property:
    # See how many queues there are
    sudo ethtool -l enp193s0f1np1
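- If you want to pick the queue count out of that output in a script, here is a minimal Python sketch. The sample text is made up to resemble typical `ethtool -l` output (only the value 63 comes from this page), and the function name `combined_counts` is hypothetical.

```python
import re

# Illustrative sample only, shaped like typical `ethtool -l` output.
# Real output depends on the NIC, the driver and the ethtool version.
SAMPLE = """\
Channel parameters for enp193s0f1np1:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       63
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       63
"""


def combined_counts(text):
    """Return (preset_maximum, current) from the two 'Combined:' lines."""
    values = [
        int(v)
        for v in re.findall(r"^Combined:\s+(\d+)", text, flags=re.MULTILINE)
    ]
    if len(values) != 2:
        raise ValueError("expected exactly two numeric 'Combined:' lines")
    return values[0], values[1]


if __name__ == "__main__":
    preset_max, current = combined_counts(SAMPLE)
    print(f"max queues: {preset_max}, queues in use: {current}")
```

- The first value is the most queues the driver says it can offer and the second is how many are currently configured. To use it on a real node, feed it the text printed by the ethtool command above instead of `SAMPLE`.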
- The filter is typically a hash function over the network and/or transport layer headers. On ejfat nodes, as is common, this is a 4-tuple hash over the IP addresses and ports of a packet. The most common implementation uses an indirection table (256 entries on ejfat nodes) in which each entry stores a queue number. The receive queue for a packet is determined by taking the low-order eight bits of the packet's computed hash (usually a Toeplitz hash), since eight bits are what it takes to index 256 entries, then using that number as an index into the indirection table and reading the queue number stored there.
    # See if hashing is enabled
    sudo ethtool -k enp193s0f1np1 | grep hashing

    # Print out the indirection table to see how packets are distributed to queues
    sudo ethtool -x enp193s0f1np1
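- To make the hash-then-lookup step concrete, here is a rough Python sketch of it. Several things are assumptions rather than facts about ejfat nodes: the 40-byte key is an arbitrary placeholder (the real key is NIC-specific and is printed by `ethtool -x`), the table is assumed to be filled round-robin over 63 queues, the field order fed to the hash (source IP, destination IP, source port, destination port) follows the usual RSS convention, and the addresses and ports are invented. The mask is derived from the table size, so a 256-entry table uses the low eight bits of the hash.

```python
import socket
import struct

TABLE_SIZE = 256  # indirection table entries (from the text)
NUM_QUEUES = 63   # receive queues on an ejfat node (from the text)

# Placeholder key for illustration only, NOT the key of any real NIC.
RSS_KEY = bytes(range(1, 41))

# Assumption: table filled round-robin, entry i -> queue i % NUM_QUEUES.
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(TABLE_SIZE)]


def toeplitz_hash(key, data):
    """32-bit Toeplitz hash: for every set bit of the input (MSB first),
    XOR in the 32-bit window of the key that starts at that bit offset."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < 32 + len(data) * 8:
        raise ValueError("key too short for this input")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rx_queue(src_ip, dst_ip, src_port, dst_port):
    """Hash the 4-tuple, keep the low bits, look up the queue number."""
    data = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    index = toeplitz_hash(RSS_KEY, data) & (TABLE_SIZE - 1)
    return INDIRECTION_TABLE[index]


if __name__ == "__main__":
    # Hypothetical flows that differ only in source port.
    for sport in range(40000, 40004):
        print(sport, "-> queue", rx_queue("10.0.0.1", "10.0.0.2", sport, 19522))
```

- Every packet of a given flow has the same 4-tuple, so it always lands on the same queue, while different flows are spread over the table entries.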
- It's also possible to steer packets to queues based on other selectable filters. To find out which filter is being used:
    # See the details of the hash algorithm
    sudo ethtool -n enp193s0f1np1 rx-flow-hash udp4

    # Change hashing to only the destination port (slows things down if using 63 queues)
    sudo ethtool -N enp193s0f1np1 rx-flow-hash udp4 n

    # Change hashing back to 4-tuple
    sudo ethtool -N enp193s0f1np1 rx-flow-hash udp4 sdfn
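- One plausible reason that hashing on the destination port alone hurts: every packet sent to the same port produces the same hash, so all of that traffic lands on a single queue. The Python sketch below illustrates this under loud assumptions. CRC32 stands in for the NIC's real hash, the table is assumed to be filled round-robin over 63 queues, and the 500 flows are invented. The field letters follow ethtool's `rx-flow-hash` options: `s` source IP, `d` destination IP, `f` source port, `n` destination port.

```python
import socket
import struct
import zlib

TABLE_SIZE = 256
NUM_QUEUES = 63
# Assumption: round-robin indirection table.
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(TABLE_SIZE)]


def hash_input(fields, src_ip, dst_ip, src_port, dst_port):
    """Concatenate only the header fields selected by the ethtool letters."""
    parts = {
        "s": socket.inet_aton(src_ip),
        "d": socket.inet_aton(dst_ip),
        "f": struct.pack("!H", src_port),
        "n": struct.pack("!H", dst_port),
    }
    return b"".join(parts[c] for c in "sdfn" if c in fields)


def rx_queue(fields, flow):
    # CRC32 is only a stand-in here; a real NIC typically uses Toeplitz.
    h = zlib.crc32(hash_input(fields, *flow))
    return INDIRECTION_TABLE[h & (TABLE_SIZE - 1)]


def queues_used(fields, flows):
    """Set of queues that receive at least one of the given flows."""
    return {rx_queue(fields, flow) for flow in flows}


# Hypothetical workload: 500 source ports sending to one destination port.
FLOWS = [("10.0.0.1", "10.0.0.2", 30000 + i, 19522) for i in range(500)]

if __name__ == "__main__":
    print("sdfn:", len(queues_used("sdfn", FLOWS)), "queues in use")
    print("n:   ", len(queues_used("n", FLOWS)), "queue in use")
```

- With `n` only, the source port never reaches the hash, so the 500 flows collapse onto one queue and one CPU. With `sdfn` they are spread over many queues.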
Effect of NIC queues on UDP transmission
- On ejfat nodes there is a maximum of 63 queues even though there are 128 cores. It seems odd to me that there isn't one queue per CPU. The limit does not appear to be changeable, so most likely it was fixed when the kernel was built.
Jumbo Frames