EJFAT UDP General Performance Considerations
Latest revision as of 22:05, 2 January 2024
Here are a few things to ponder. I'll go over some things I've done to try to speed up performance so that those who follow won't waste their time. Here are some interesting links:
NIC queues on multi-cpu nodes
- Contemporary NICs support multiple receive and transmit descriptor queues (Receive Side Scaling or RSS). On reception a NIC distributes packets by applying a filter to each that assigns it to one of a number of logical flows. Packets for each flow are steered to a separate receive queue, which in turn can be processed by a separate CPU. The goal of this is to increase performance. Find out how many NIC queues there are on your node by looking at the combined property:
// See how many queues there are
sudo ethtool -l enp193s0f1np1
// See how big the queue sizes are
sudo ethtool -g enp193s0f1np1
// Make the receiving queues the max size
sudo ethtool -G enp193s0f1np1 rx 8192
- The filter used is typically a hash function over the network and/or transport layer headers. Typically, and for ejfat nodes, this is a 4-tuple hash over the IP addresses and ports of a packet. The most common implementation uses an indirection table (256 entries for ejfat nodes) where each entry stores a queue number. The receive queue for a packet is determined by taking the low-order eight bits of the computed hash for the packet (usually a Toeplitz hash), which is enough to index 256 entries, using this number as a key into the indirection table, and reading the corresponding value.
// See if hashing is enabled
sudo ethtool -k enp193s0f1np1 | grep hashing
// Print out the indirection table to see how packets are distributed to Qs
sudo ethtool -x enp193s0f1np1
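The hash-then-lookup step described above can be sketched in Python. This is a minimal illustration, not the NIC's actual implementation: the 40-byte key `EXAMPLE_KEY` is an arbitrary placeholder (a real NIC uses its own key), the round-robin table layout in `make_indirection_table` is an assumption, and all function names are hypothetical. A 256-entry table needs an 8-bit index, so the sketch keeps the low-order eight bits of the hash.

```python
import socket
import struct

# Arbitrary 40-byte key for illustration only; a real NIC has its own key.
EXAMPLE_KEY = bytes(range(1, 41))
TABLE_SIZE = 256  # indirection table size quoted for ejfat nodes


def toeplitz_hash(key, data):
    """32-bit Toeplitz hash: for every set bit of data (MSB first),
    XOR in the 32-bit window of the key that starts at that bit offset."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def make_indirection_table(num_queues, size=TABLE_SIZE):
    """Assumed layout: queue numbers spread evenly over the table."""
    return [i % num_queues for i in range(size)]


def queue_for_packet(table, src_ip, dst_ip, src_port, dst_port,
                     key=EXAMPLE_KEY):
    """Hash the IPv4 4-tuple, keep the low-order bits, look up the queue."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    index = toeplitz_hash(key, data) & (len(table) - 1)  # size is a power of 2
    return table[index]


if __name__ == "__main__":
    table = make_indirection_table(num_queues=16)
    print(queue_for_packet(table, "10.0.0.1", "10.0.0.2", 40000, 17750))
```

Because the queue depends only on the 4-tuple, all packets of one sender/receiver pair land in the same queue, while different source ports can spread over several queues.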
- It's also possible to steer packets by modifying the hash being used:
// See the details of the hash algorithm
sudo ethtool -n enp193s0f1np1 rx-flow-hash udp4
// Change hashing to only destination port (slows things down if using 63 queues)
sudo ethtool -N enp193s0f1np1 rx-flow-hash udp4 n
// Change hashing back to 4-tuple
sudo ethtool -N enp193s0f1np1 rx-flow-hash udp4 sdfn
- Other filters can also be specified as rules that determine which packets go to which queues. For example, packets destined for a specific port can be sent to a fixed queue:
// Send port 17750 UDP IPv4 packets to queue #7
sudo ethtool -N enp193s0f1np1 flow-type udp4 dst-port 17750 queue 7
Effect of NIC queues on UDP transmission
Change # of queues
- Changing the number of queues has a major effect on performance. To start with, let's look at one packet sender and one receiver.
Changing hash and queue size
- Changing the hash algorithm and queue sizes made little difference in performance.
Effect of packet sending & receiving functions
- Using recvmmsg (Linux only) to receive an array of UDP packets in one system call looks as though it should speed things up; in practice, however, it only slowed things down. (A tip for programming with recvmmsg: set the timeout to nullptr, or the function returns only 1 packet at a time, and set flags = MSG_WAITALL.)
- Use recv instead; it is much faster.
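A plain one-datagram-per-call receive loop of the kind recommended above might look like the following sketch. It uses Python's `socket` module, which wraps the same system calls; the function name `receive_datagrams`, the 9000-byte buffer size, and the loopback addresses are assumptions made for illustration, and this is not the EJFAT receiver code.

```python
import socket


def receive_datagrams(sock, count, max_size=9000):
    """Receive `count` datagrams with one recv call each, reusing a
    single preallocated buffer. Returns the size of each datagram."""
    buf = bytearray(max_size)
    view = memoryview(buf)
    sizes = []
    for _ in range(count):
        nbytes = sock.recv_into(view)  # one system call per packet
        sizes.append(nbytes)
    return sizes


if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))   # let the OS pick a free port
    rx.settimeout(2.0)          # do not block forever in this demo
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for n in (100, 200, 300):
        tx.sendto(bytes(n), rx.getsockname())
    print(receive_datagrams(rx, 3))
    tx.close()
    rx.close()
```

Reusing one buffer avoids an allocation per packet, which keeps the loop as simple as the single recv call it wraps.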
Effect of socket parameters
Sending
- Call connect() to "connect" the sending socket to the receiver, which allows use of the send() call. An ordinary UDP socket knows nothing about its future destinations, so it performs a route lookup each time sendmsg() or sendto() is called. If connect() is called beforehand with a particular receiver's IP address and port, the operating system kernel can store a reference to the route in the socket, which makes sending significantly faster as long as subsequent sendmsg/sendto/send calls do not specify a receiver.
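- The downside of using connect and send is that the receiver must be up and running. A UDP connect() exchanges no packets, so it does not itself detect a missing receiver, but once the socket is connected the kernel reports ICMP "port unreachable" replies back to it, and send() can then fail with ECONNREFUSED until the receiver is listening.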
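The connect-then-send pattern described above can be sketched as follows. Python's `socket` module wraps the same connect() and send() system calls; the helper names `make_connected_sender` and `send_all` and the loopback addresses are hypothetical, chosen only for this illustration.

```python
import socket


def make_connected_sender(host, port):
    """Create a UDP socket and fix its destination with connect(), so
    later send() calls carry no address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect((host, port))
    return sock


def send_all(sock, payloads):
    """Send each payload with send() (no destination argument).
    Returns the number of datagrams sent."""
    sent = 0
    for payload in payloads:
        sock.send(payload)
        sent += 1
    return sent


if __name__ == "__main__":
    # A local receiver so the example is self-contained.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(2.0)
    host, port = rx.getsockname()

    tx = make_connected_sender(host, port)
    send_all(tx, [b"packet-1", b"packet-2"])
    print(rx.recv(2048), rx.recv(2048))
    tx.close()
    rx.close()
```

The receiver is started first here for the reason given in the text: a connected sender depends on the receiver being up.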
Receiving