How to install, build and use XDP related packages

From epsciwiki
Revision as of 20:56, 7 December 2023

PAGE UNDER CONSTRUCTION


Getting Started

XDP stands for eXpress Data Path; BPF stands for Berkeley Packet Filter, and eBPF for extended BPF.
Following are links to a few good places to start learning to program with XDP sockets:
  • The best place to learn to program is the tutorial:
XDP tutorial
  • Helpful sites:
Beginner's Guide to XDP and BPF
Overview of XDP Sockets
RedHat XDP Page


Get and install the XDP/BPF related files

There are 2 main libraries needed to use XDP sockets: the libxdp library and the libbpf library on which it depends. Although one can install the 2 from separate packages, that is not recommended: this software is changing so quickly that you need versions of the 2 that are compatible with each other. I believe the best option is to use the xdp-tools GitHub repository, which contains compatible versions of both. The difficulty is that the xdp-tools makefiles are not set up to install libbpf, so some (quite minimal) custom changes are needed to do this. For stability's sake I have forked the repo and made all the necessary modifications.


Links

Future XDP/BPF versions will mean that this has to be redone at some point, so here is a record of what was done to make things compile and install:
xdp-tools repository modifications


Following are links to the xdp-tools repos:
Jefferson Lab forked version of xdp-tools (changes to makefiles, etc)
Original xdp-tools repo


Get the GitHub repo

export PREFIX=""
git clone --recurse-submodules https://github.com/JeffersonLab/xdp-tools.git
cd xdp-tools


Host Dependencies

Before this code can be compiled, you must follow the proper setup procedure to address its dependencies.
Setup instructions are given in the XDP tutorial.
Go to the setup_dependencies.org link at Setup Dependencies
However, if you want to avoid wading through that, it boils down to:
# to get bpftool
sudo apt install linux-tools-common linux-tools-generic
# to get this to build (matches your running kernel, e.g. linux-tools-5.15.0-87-generic)
sudo apt install linux-tools-$(uname -r)
sudo apt install clang llvm libpcap-dev build-essential
sudo apt install linux-headers-$(uname -r)

# xdp-tools needs emacs
sudo apt install emacs

# you will need clang 11 for this to work, so install it and point the default commands at this version
sudo apt install clang-11 clang-format-11
sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-11 100
sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-11 100
sudo update-alternatives --install /usr/bin/clang-format clang-format /usr/bin/clang-format-11 100
sudo update-alternatives --install /usr/bin/llc llc /usr/bin/llc-11 100

# check to see if this worked
ls -al /usr/bin/clang*
ls -al /etc/alternatives/clang*
ls -al /usr/bin/llc*
ls -al /etc/alternatives/llc*

# now one can build
./configure
make

# for installation:

# make sure there is an ending slash "/" on your install dir !!
export DESTDIR=<install dir>/
export LIBDIR=lib
export HDRDIR=include
export MANDIR=share
export SBINDIR=bin
export SCRIPTSDIR=scripts
make install
The installation above builds and installs the xdp-loader program into <install dir>/bin.
It can be used (see below) both to load/unload programs and to query which programs have been loaded.
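Two easy-to-miss requirements above are that clang 11 is the default clang and that DESTDIR ends with a slash. Both can be checked before running make. The following is a minimal POSIX shell sketch; the function names (clang_major, with_trailing_slash, preflight) are hypothetical helpers, not part of xdp-tools, and it assumes "clang --version" prints a first line containing "clang version X.Y.Z".

```sh
# Sketch: preflight checks before building/installing xdp-tools.
# Function names are hypothetical helpers, not part of xdp-tools.

# Print the major version from a "clang --version" first line,
# e.g. "Ubuntu clang version 11.1.0-6" -> 11 (prints nothing if no match).
clang_major() {
  printf '%s\n' "$1" | sed -n 's/.*clang version \([0-9][0-9]*\).*/\1/p'
}

# Print a directory path, appending "/" if it does not already end with one.
with_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# Warn (on stderr) if the default clang is not version 11, and make sure
# DESTDIR, if it is set, ends with "/" as the install step requires.
preflight() {
  ver=$(clang_major "$(clang --version 2>/dev/null | head -n 1)")
  if [ "$ver" != "11" ]; then
    echo "warning: default clang major version is '${ver:-unknown}', expected 11" >&2
  fi
  if [ -n "${DESTDIR:-}" ]; then
    DESTDIR=$(with_trailing_slash "$DESTDIR")
    export DESTDIR
  fi
}
```

For example, after "export DESTDIR=/opt/xdp", running preflight leaves DESTDIR set to "/opt/xdp/" and warns if the default clang is not version 11.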


Getting ready to use XDP sockets

  • Each ejfat node has a Mellanox ConnectX-6 Dx NIC which can handle 2x100Gbps or 1x200Gbps.
  • The interface name corresponding to this card is enp193s0f1np1. If yours is different, substitute it.
  • Avoid running XDP code in the skb (generic) mode in which the linux stack is NOT bypassed.
  • Use the XDP native mode in which the linux network stack is bypassed by placing special code in the kernel's NIC driver.
To do this, the NIC's MTU must not be larger than 1 linux page minus some headers.
On the ejfat nodes the max MTU which still allows native mode is 3498.
sudo ifconfig enp193s0f1np1 mtu 3498
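Before loading anything, it can help to confirm that the MTU is at or below the native-mode limit. A minimal sketch, assuming the limit is the 3498 found on the ejfat nodes (other NICs, drivers or page sizes will differ) and that the MTU is read from the standard /sys/class/net/<iface>/mtu file; the helper names are hypothetical.

```sh
# Sketch: check that an interface's MTU allows XDP native mode.
# MAX_NATIVE_MTU defaults to the value found on the ejfat nodes; adjust for other hardware.
MAX_NATIVE_MTU=${MAX_NATIVE_MTU:-3498}

# Print the current MTU of interface $1.  $2 optionally overrides the
# sysfs root (default /sys), which makes the function easy to test.
current_mtu() {
  cat "${2:-/sys}/class/net/$1/mtu"
}

# Return 0 if the MTU of $1 is small enough for native mode, 1 if it is
# too large (with a hint on stderr), 2 if the MTU could not be read.
mtu_ok_for_native() {
  mtu=$(current_mtu "$1" "$2") || return 2
  if [ "$mtu" -le "$MAX_NATIVE_MTU" ]; then
    return 0
  fi
  echo "$1: MTU $mtu > $MAX_NATIVE_MTU; try: sudo ip link set dev $1 mtu $MAX_NATIVE_MTU" >&2
  return 1
}
```

Usage: "mtu_ok_for_native enp193s0f1np1 && echo ok". The hint uses "ip link set", the iproute2 equivalent of the ifconfig command above.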


NIC queues

Now a note on how recent linux NIC drivers use multiple queues to hold incoming packets (for details see NIC Queues).


Number of queues

Contemporary NICs support multiple receive and transmit descriptor queues. On reception, a NIC can send different packets to different queues to distribute processing among CPUs. Find out how many NIC queues there are on your node by looking at the combined property:
# See how many queues there are
sudo ethtool -l enp193s0f1np1
On the ejfat nodes there is a maximum of 63 queues even though there are 128 cores. It seems odd to me that there isn't 1 queue per CPU; the maximum does not appear to be changeable, so most likely it is fixed in the kernel (driver) when the queues are first created.
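To use these numbers in a script, the "Combined" values can be pulled out of the "ethtool -l" output. A sketch, assuming the usual layout in which a "Pre-set maximums" section is followed by a "Current hardware settings" section, each with its own "Combined:" line; the function names are hypothetical.

```sh
# Sketch: extract "Combined" channel counts from `ethtool -l` output on stdin.

# Maximum number of combined queues the NIC/driver allows.
max_combined() {
  awk '/^Pre-set maximums/ { pre = 1 }
       pre && /^Combined:/ { print $2; exit }'
}

# Number of combined queues currently configured.
current_combined() {
  awk '/^Current hardware settings/ { cur = 1 }
       cur && /^Combined:/ { print $2; exit }'
}
```

Usage: "sudo ethtool -l enp193s0f1np1 | current_combined".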


Distribution of packets to queues

The NIC typically distributes packets by applying a 4-tuple hash over IP addresses and ports of a packet. The indirection table of the NIC, which resolves a specific queue by this hash, is programmed by the driver at initialization. The default mapping is to distribute the queues evenly in the table, but the indirection table can be retrieved and modified at runtime using ethtool commands (-x and -X).
So to see which queue a hash entry maps to by default:
# look at the default mapping of hash keys to queues
sudo ethtool -x enp193s0f1np1
You'll see an even spread of keys over the 63 queues. Now, funnel all the incoming packets into 1 queue (queue #0) so that 1 socket can receive all packets and redo the above command:
# Use only queue #0
sudo ethtool -L enp193s0f1np1 combined 1

# Check the current number of queues (lowercase -l shows, uppercase -L sets)
sudo ethtool -l enp193s0f1np1

# look at the new mapping of hash keys to queues
sudo ethtool -x enp193s0f1np1
This time you'll see that every entry points to queue #0.
Undo this with:
sudo ethtool -L enp193s0f1np1 combined 63
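Rather than eyeballing the table, the distinct queues it references can be listed (and counted with "wc -l"). A sketch, assuming the usual "ethtool -x" layout in which each table row starts with an offset followed by a colon ("    0:") and then the queue numbers; rss_queues is a hypothetical name.

```sh
# Sketch: list the distinct queue numbers in the RSS indirection table
# printed by `ethtool -x` (read on stdin), one per line, sorted.
rss_queues() {
  # Table rows look like "   8:   8   9  10 ...": field 1 is "<offset>:",
  # the remaining fields are queue numbers.  Other lines (header, hash key,
  # hash function) do not have a lone "<digits>:" first field.
  awk '$1 ~ /^[0-9]+:$/ { for (i = 2; i <= NF; i++) print $i }' | sort -n | uniq
}
```

Usage: "sudo ethtool -x enp193s0f1np1 | rss_queues | wc -l". Per the description above, this should count 63 queues by default and 1 after "combined 1".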
Proceed by finding exactly which NIC driver you have:
sudo ethtool -i enp193s0f1np1
The ejfat nodes have a quirky Mellanox NIC driver (mlx5), which leads us to the following topic.
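A script can make the same driver check before relying on mlx5-specific behavior. A sketch, assuming "ethtool -i" prints a line of the form "driver: mlx5_core", which is how the mlx5 driver reports itself; the helper names are hypothetical.

```sh
# Sketch: read `ethtool -i` output on stdin and report the driver.

# Print the driver name, e.g. "mlx5_core".
nic_driver() {
  awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Return 0 if the driver is the Mellanox mlx5 driver, whose special
# zero-copy queue numbering is described in the next section.
is_mlx5() {
  case "$(nic_driver)" in
    mlx5*) return 0 ;;
    *)     return 1 ;;
  esac
}
```

Usage: "sudo ethtool -i enp193s0f1np1 | is_mlx5 && echo mlx5".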


Queues & the Mellanox NIC driver in zero-copy mode

For the Mellanox driver, queues are treated in a unique way when it comes to achieving peak performance by using its zero-copy capabilities. For general info, look at the following from an XDP Overview. Following is a short excerpt:
XDP_COPY and XDP_ZEROCOPY bind flags

When you bind to a socket, the kernel will first try to use zero-copy copy. If zero-copy is not supported, it will fall back on using copy mode, i.e. copying all packets out to user space. But if you would like to force a certain mode, you can use the following flags. If you pass the XDP_COPY flag to the bind call, the kernel will force the socket into copy mode. If it cannot use copy mode, the bind call will fail with an error. Conversely, the XDP_ZEROCOPY flag will force the socket into zero-copy mode or fail.


At first try, binding the XDP socket with the XDP_ZEROCOPY flag makes it appear that zero-copy mode does not work on the ejfat nodes. However, investigation reveals a quirk in the Mellanox NIC driver (see Secret to zero-copy with Mellanox NIC driver). Here is an excerpt:
The mlx5 driver uses special queue ids for zero-copy. If N is the number of
configured queues, then for XDP_ZEROCOPY the queue ids start at N. So
queue ids [0..N) can only be used with XDP_COPY and queue ids [N..2N)
can only be used with XDP_ZEROCOPY.


For ejfat nodes, the number of queues cannot be increased and the maximum remains fixed at 63. Trying to use queue #64 or higher gives an error. The only solution is to cut the number of queues in half, to 32, and then use queues #32 - #63 as zero-copy queues. This seems to work:
# Use only 32 queues
sudo ethtool -L enp193s0f1np1 combined 32
At this point queues #0 - #31 will copy incoming data, and queues #32 - #63 are zero-copy.
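The mlx5 rule quoted above is simple arithmetic: with N configured queues, the i-th zero-copy queue (0 <= i < N) has queue id N + i. A sketch of that mapping; zc_queue_id is a hypothetical helper, and it does not know about the additional hard limit seen on the ejfat nodes.

```sh
# Sketch: mlx5 zero-copy queue numbering.  With N queues configured
# ("ethtool -L <iface> combined N"), ids 0..N-1 are copy-mode and
# ids N..2N-1 are zero-copy.  Print the zero-copy id for index i.
zc_queue_id() {
  n=$1 i=$2
  if [ "$i" -lt 0 ] || [ "$i" -ge "$n" ]; then
    echo "index $i out of range for $n configured queues" >&2
    return 1
  fi
  echo $((n + i))
}
```

With "combined 32", "zc_queue_id 32 0" prints 32 and "zc_queue_id 32 31" prints 63, matching the #32 - #63 range above.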


Multiple data sources & special queue rules

With multiple data sources, each destined for a separate socket, we would like all packets for 1 socket to end up in the same single queue. Fortunately, rules can be set up to direct packets to different queues depending on a variety of factors. Here we take advantage of being able to direct UDP packets destined for a known port to a single queue.
Say, for example, we have 3 data sources (ids 3,5,9), with packets destined for ports 17750, 17751, and 17752. If we want them to be directed to 3 different, zero-copy queues, then the following could be done to send port 17750 traffic to queue #32, port 17751 to queue #33, and port 17752 to queue #34:
# send port 17750 UDP IPv4 packets to queue #32 (first zero-copy queue)
sudo ethtool -N enp193s0f1np1 flow-type udp4 dst-port 17750 queue 32

# send port 17751 UDP IPv4 packets to queue #33
sudo ethtool -N enp193s0f1np1 flow-type udp4 dst-port 17751 queue 33

# send port 17752 UDP IPv4 packets to queue #34
sudo ethtool -N enp193s0f1np1 flow-type udp4 dst-port 17752 queue 34
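With more data sources it is easier to generate these rules in a loop. A sketch, assuming consecutive destination ports map to consecutive zero-copy queues as in the example above; steer_ports and the DRY_RUN switch are hypothetical. By default it only prints the commands, so they can be reviewed before anything on the NIC is changed.

```sh
# Sketch: steer UDP/IPv4 traffic for consecutive ports to consecutive queues.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# $1 = interface, $2 = first port, $3 = first queue, $4 = number of sources
steer_ports() {
  k=0
  while [ "$k" -lt "$4" ]; do
    run sudo ethtool -N "$1" flow-type udp4 dst-port $(($2 + k)) queue $(($3 + k)) || return
    k=$((k + 1))
  done
}
```

"steer_ports enp193s0f1np1 17750 32 3" prints the three commands above; "DRY_RUN=0 steer_ports enp193s0f1np1 17750 32 3" runs them.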


Here are a couple of commands to administer such rules:
# Show all flow rules
sudo ethtool -n enp193s0f1np1

# Delete rule (rule numbers seen with above command)
sudo ethtool -N enp193s0f1np1 delete <rule #>
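To clear out all rules at once, the rule numbers can be taken from the "ethtool -n" listing. A sketch, assuming each rule in that listing begins with a "Filter: <number>" line; the helper names and the DRY_RUN switch are hypothetical, and by default the delete commands are only printed.

```sh
# Sketch: delete every flow rule on an interface.
# DRY_RUN=1 (default) prints the delete commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Print the rule numbers found in `ethtool -n` output read on stdin.
rule_ids() {
  awk '$1 == "Filter:" { print $2 }'
}

# $1 = interface
delete_all_rules() {
  for id in $(sudo ethtool -n "$1" | rule_ids); do
    run sudo ethtool -N "$1" delete "$id" || return
  done
}
```

Usage: "delete_all_rules enp193s0f1np1" to review, then "DRY_RUN=0 delete_all_rules enp193s0f1np1" to delete.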


Get, make, install, load and run EJFAT-related XDP software

Before we can actually run something meaningful, we'll need to get and build 3 different GitHub repos:
  1. The disruptor repo which gives us ultrafast, lock-free, blocking, circular buffers.
  2. The ejfat repo which gives us:
    • an application to send properly packetized data, and
    • a utility library based on the disruptor
  3. The ejfat-xdp repo which gives us:
    • an application to reassemble packetized data, and
    • code to load into the NIC driver


git clone https://github.com/JeffersonLab/ejfat.git
cd ejfat
mkdir build
cd build
cmake .. -DBUILD_DIS=1
make install
Now we get to the point of actually needing some software to run something meaningful. The software in the ejfat-xdp GitHub repo produces programs which receive ejfat UDP packets and reconstruct them into their original events. There are actually 2 programs which must run at the same time. The first (xdp_kern.o) is the special C code which is loaded into the NIC driver and directs IPv4 UDP packets to one of possibly several XDP sockets. The second is the user space program which creates those XDP sockets, receives the UDP packets directed to them, and reconstructs them into events. The user code is written so as to make these events available to other parts (threads) of the process.
git clone https://github.com/JeffersonLab/ejfat-xdp.git
cd ejfat-xdp
mkdir build
cd build
cmake ..
make install
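The clone/configure/install steps are the same for both repos, so they can be wrapped in one helper. A sketch, not part of either repo: build_repo and DRY_RUN are hypothetical, and it uses "cmake -S/-B" (CMake 3.13 or later) plus "make -C" in place of the "mkdir build; cd build; cmake .." sequence, which should be equivalent. By default it only prints the commands.

```sh
# Sketch: clone a repo and build/install it with CMake.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# $1 = git URL; any further arguments are passed to cmake.
build_repo() {
  url=$1
  shift
  dir=$(basename "$url" .git)   # e.g. ".../ejfat.git" -> "ejfat"
  run git clone "$url" &&
    run cmake -S "$dir" -B "$dir/build" "$@" &&
    run make -C "$dir/build" install
}
```

Usage: "DRY_RUN=0 build_repo https://github.com/JeffersonLab/ejfat.git -DBUILD_DIS=1", then "DRY_RUN=0 build_repo https://github.com/JeffersonLab/ejfat-xdp.git".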


Loading our special code into the NIC driver can be done in a number of different ways.
The following is just one of those ways. The code was compiled in the ejfat-xdp repo and stored in
.../ejfat-xdp/build/bin/xdp_kern.o
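A quick way to confirm that the ejfat-xdp build produced both programs described above (the kernel object xdp_kern.o and the user program xdp_user_mt) before trying to load anything. A sketch; the function name is hypothetical, and it assumes both files land in build/bin, as the paths in this section suggest.

```sh
# Sketch: check that an ejfat-xdp build directory contains both programs.
# $1 = path to the ejfat-xdp build directory (the one containing bin/).
check_ejfat_xdp_build() {
  missing=0
  for f in xdp_kern.o xdp_user_mt; do
    if [ ! -f "$1/bin/$f" ]; then
      echo "missing: $1/bin/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```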
Just for fun, practice loading it by hand into the NIC driver and checking to see if it succeeded:
# Load the kernel NIC driver code
sudo <xdp_install_dir>/bin/xdp-loader load -m native enp193s0f1np1 xdp_kern.o

# Check the NIC to see if code really loaded and in what mode
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader status

# Remove everything just loaded
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader unload enp193s0f1np1 --all
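The load/check/unload sequence above can be wrapped so that the program is always unloaded when the script exits, including on Control-C. A sketch using only the xdp-loader subcommands shown above; try_load, cleanup and DRY_RUN are hypothetical, and the loader and object paths must be filled in for your installation.

```sh
# Sketch: load xdp_kern.o in native mode and unload it again on exit.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Unload every XDP program from $XDP_DEV (set by try_load).
cleanup() {
  run sudo "$XDP_LOADER" unload "$XDP_DEV" --all
}

# $1 = interface, $2 = path to xdp_kern.o, $3 = path to the xdp-loader binary
try_load() {
  XDP_DEV=$1 XDP_LOADER=$3
  run sudo "$XDP_LOADER" load -m native "$XDP_DEV" "$2" || return
  trap cleanup EXIT            # unload when the script exits ...
  trap 'exit 130' INT TERM     # ... and turn Control-C/TERM into a normal exit
  run sudo "$XDP_LOADER" status
}
```

In a script, call try_load with the interface, .../ejfat-xdp/build/bin/xdp_kern.o and <xdp_install_dir>/bin/xdp-loader, then do the work; the program is unloaded when the script ends or is interrupted.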


Most users will do the loading by running the user program, which does it all for them. It also unloads the kernel code when the user program is killed with Control-C:
# Run a user program which loads the special code into the NIC driver and then receives packets:
.../ejfat-xdp/build/bin/xdp_user_mt

# Check the NIC to see if code really loaded and in what mode
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader status
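As a cross-check that does not need xdp-loader, "ip link show dev enp193s0f1np1" also reports whether an XDP program is attached and in which mode. A sketch, assuming iproute2's usual flag on the first output line: "xdp" for native (driver) mode, "xdpgeneric" for skb mode, "xdpoffload" for hardware offload, and "xdpmulti" when programs are attached in more than one mode; xdp_mode is a hypothetical name.

```sh
# Sketch: report the XDP attach mode from the first line of
# `ip link show dev <iface>` output, read on stdin.
xdp_mode() {
  awk 'NR == 1 {
         for (i = 1; i <= NF; i++) {
           if ($i == "xdp")        { print "native"; exit }
           if ($i == "xdpgeneric") { print "generic (skb)"; exit }
           if ($i == "xdpoffload") { print "offload (hw)"; exit }
           if ($i == "xdpmulti")   { print "multiple modes"; exit }
         }
         print "none"; exit
       }'
}
```

Usage: "ip link show dev enp193s0f1np1 | xdp_mode" should print "native" once the code is loaded in native mode.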


Run a test

Assuming we're starting from a blank slate:
# Run a user program which loads the special code into the NIC driver and then receives packets:
.../ejfat-xdp/build/bin/xdp_user_mt

# Check the NIC to see if code really loaded and in what mode
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader status