How to install, build and use XDP related packages

From epsciwiki

Revision as of 15:45, 7 December 2023

PAGE UNDER CONSTRUCTION


Getting Started

XDP stands for eXpress Data Path, and eBPF (often shortened to BPF) stands for extended Berkeley Packet Filter.
Following are links to a few good places to start learning to program with XDP sockets:
  • The best place to learn to program is the tutorial:
XDP tutorial
  • Helpful sites:
Beginner's Guide to XDP and BPF
Overview of XDP Sockets
RedHat XDP Page

Get and install the XDP/BPF related files

There are 2 main libraries needed to use XDP sockets: libxdp and libbpf, upon which it depends. Although the 2 can be installed from separate packages, that is not recommended: this software is changing so quickly that you need versions of the 2 which are compatible. I believe the best option is to use the xdp-tools GitHub repository, which has compatible versions of both. The difficulty is that the xdp-tools makefiles are not set up to install libbpf, so some (quite minimal) custom changes are needed to do this. For stability's sake I have forked the repo and made all the necessary modifications.


Links

Future advancements/versions in XDP/BPF will mean that this will need to be redone at some point, so here is a note of what was done to make things compile and install:
xdp-tools repository modifications


Following are links to the xdp-tools repos:
Jefferson Lab forked version of xdp-tools (changes to makefiles, etc)
Original xdp-tools repo


Get the GitHub repo

export PREFIX=""
git clone --recurse-submodules https://github.com/JeffersonLab/xdp-tools.git
cd xdp-tools


Host Dependencies

Before this code can be compiled, you must follow the proper setup procedure to address its dependencies.
Setup instructions are given in the tutorial, XDP tutorial.
Go to the setup_dependencies.org link at Setup Dependencies
However, if you want to avoid wading through that, it boils down to:
// (to get bpftool)
sudo apt install linux-tools-common linux-tools-generic
// to get this to build
sudo apt install linux-tools-5.15.0-87-generic
sudo apt install clang llvm libpcap-dev build-essential
sudo apt install linux-headers-$(uname -r)

// xdp-tools needs emacs
sudo apt install emacs

// you will need to use clang 11 for this to work so install and set commands to this version
sudo apt install clang-11 clang-format-11
sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-11 100
sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-11 100
sudo update-alternatives --install /usr/bin/clang-format clang-format /usr/bin/clang-format-11 100
sudo update-alternatives --install /usr/bin/llc llc /usr/bin/llc-11 100

// check to see if this worked by doing
ls -al /usr/bin/clang*
ls -al /etc/alternatives/clang*
ls -al /usr/bin/llc*
ls -al /etc/alternatives/llc*

// now one can do
./configure
make

// for installation

// make sure there is an ending slash "/" on your install dir !!
export DESTDIR=<install dir>/
export LIBDIR=lib
export HDRDIR=include
export MANDIR=share
export SBINDIR=bin
export SCRIPTSDIR=scripts
make install
The above installation will make and install the xdp-loader program into <install dir>/bin.
It can be used (see below) to both load/unload programs and query what programs have been loaded.
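
The trailing slash on DESTDIR is easy to forget. Below is a minimal sketch of a guard, assuming a POSIX shell. The name with_trailing_slash is a hypothetical helper and not part of xdp-tools, and the path in the usage comment is only an example location.

```shell
# Hypothetical helper (not part of xdp-tools): print the given path,
# appending "/" if it does not already end in one.
with_trailing_slash() {
    [ -n "$1" ] || return 1          # refuse an empty path
    case "$1" in
        */) printf '%s\n' "$1" ;;    # already ends in "/"
        *)  printf '%s/\n' "$1" ;;   # append the missing "/"
    esac
}

# Example use before "make install" (the directory is only an illustration):
#   export DESTDIR="$(with_trailing_slash "$HOME/xdp-install")"
```

Rejecting an empty argument keeps an unset variable from silently turning DESTDIR into "/".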

Getting ready to use XDP sockets

  • Each ejfat node has a Mellanox ConnectX-6 Dx NIC which can handle 2x100Gbps or 1x200Gbps.
  • The interface name corresponding to this card is enp193s0f1np1. If yours is different, substitute it.
  • Avoid running XDP code in skb (generic) mode, in which the Linux network stack is NOT bypassed.
  • Use XDP native mode, in which the Linux network stack is bypassed by placing special code in the kernel's NIC driver.
To do this, the NIC's MTU must not be larger than 1 Linux page minus some headers.
On the ejfat nodes the max MTU which still allows native mode is 3498.
sudo ifconfig enp193s0f1np1 mtu 3498
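
To confirm the setting took effect, the current MTU can be read back from sysfs and compared with the limit. This is a small sketch, assuming a POSIX shell. The name mtu_allows_native is a hypothetical helper, and its default limit of 3498 is simply the ejfat value quoted above, so pass a different limit as the second argument for other hardware.

```shell
# Hypothetical helper: succeed (exit status 0) if an MTU is small enough for
# native mode. $1 = MTU to test, $2 = limit (defaults to the ejfat value, 3498).
mtu_allows_native() {
    [ "$1" -le "${2:-3498}" ]
}

# Example use on a live system (the sysfs file holds the interface's current MTU):
#   mtu_allows_native "$(cat /sys/class/net/enp193s0f1np1/mtu)" && echo "MTU OK for native mode"
```

On systems without ifconfig, the same MTU can be set with: sudo ip link set dev enp193s0f1np1 mtu 3498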

NIC queues

Now a note on how recent Linux NIC drivers use multiple queues to hold incoming packets (for details see NIC Queues).
Contemporary NICs support multiple receive and transmit descriptor queues. On reception, a NIC can send different packets to different queues to distribute processing among CPUs. The NIC typically distributes packets by applying a 4-tuple hash over the IP addresses and TCP ports of a packet. In the case of ejfat nodes, there is a max of 63 queues even though there are 128 cores.
// See how many queues there are
sudo ethtool -l enp193s0f1np1
The indirection table of the NIC, which resolves a specific queue by this hash, is programmed by the driver at initialization. The default mapping is to distribute the queues evenly in the table, but the indirection table can be retrieved and modified at runtime using ethtool commands (-x and -X).
So to see which queue a hash entry maps to by default:
// look at the default mapping of hash keys to queues
sudo ethtool -x enp193s0f1np1
You'll see an even spread of keys over the 63 queues. Now, funnel all the incoming packets into 1 queue (queue #0) so that 1 socket can receive all packets and redo the above command:
// send all UDP IPv4 packets to queue 0
sudo ethtool -N enp193s0f1np1 flow-type udp4 action 0

// look at the new mapping
sudo ethtool -x enp193s0f1np1
This time you'll see that every entry points to queue #0.
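
The raw ethtool -x dump is long, so counting table entries per queue makes it easier to compare the mapping before and after a change. The following is a sketch only. It assumes the usual ethtool -x layout, in which each table row starts with an index and a colon followed by queue numbers, and in which the table is followed by an "RSS hash key" line. The name count_entries_per_queue is a hypothetical helper.

```shell
# Hypothetical helper: count how many indirection-table entries point at each queue.
# Reads "ethtool -x <iface>" output on stdin and prints one line per queue.
count_entries_per_queue() {
    awk '
        /^RSS hash/ { exit }                 # stop before the hash key section
        $1 ~ /^[0-9]+:$/ {                   # a table row: index, colon, queue numbers
            for (i = 2; i <= NF; i++) count[$i]++
        }
        END { for (q in count) printf "queue %d: %d entries\n", q, count[q] }
    ' | sort -k2,2n                          # order the lines by queue number
}

# Example use on a live system:
#   sudo ethtool -x enp193s0f1np1 | count_entries_per_queue
```

An even spread shows up as roughly equal counts on every queue, while a table that funnels everything to one queue shows a single line.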
Here is an alternative way to put everything onto queue #0:
sudo ethtool -L enp193s0f1np1 combined 1

// Check the current number of combined queues
sudo ethtool -l enp193s0f1np1

// Undo this with
sudo ethtool -L enp193s0f1np1 combined 63
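
Rather than hard-coding 63 when undoing the change, the maximum can be read from the ethtool -l report. This sketch assumes the usual report layout: a "Pre-set maximums:" section followed by a "Current hardware settings:" section, each containing a "Combined:" line. The name combined_channels is a hypothetical helper.

```shell
# Hypothetical helper: read "ethtool -l <iface>" output on stdin and print the
# maximum and current number of combined channels, one per line.
combined_channels() {
    awk '
        /^Pre-set maximums/           { section = "max" }
        /^Current hardware settings/  { section = "current" }
        /^Combined:/ && section != "" { print section, $2 }
    '
}

# Example use on a live system:
#   sudo ethtool -l enp193s0f1np1 | combined_channels
```

Recording the "max" value before shrinking to 1 queue gives the number to pass to ethtool -L when restoring.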
With multiple data sources, each destined for a separate socket, multiple rules can be set up.
If we have 2 sockets for example, with packets destined for ports 17750 and 18000, then the following could be done to send port 17750 traffic to queue 0, and the 18000 traffic to queue 1:
// send port 17750 UDP IPv4 packets to queue 0
sudo ethtool -N enp193s0f1np1 flow-type udp4 dst-port 17750 action 0

// send port 18000 UDP IPv4 packets to queue 1
sudo ethtool -N enp193s0f1np1 flow-type udp4 dst-port 18000 action 1
Here are a couple of commands to administer such rules:
// Show all flow rules
sudo ethtool -n enp193s0f1np1

// Delete rule (rule numbers seen with above command)
sudo ethtool -N enp193s0f1np1 delete <rule #>
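
With more than a couple of sockets, typing one rule per port gets tedious. The sketch below prints the rules instead of running them, so they can be reviewed first. It assumes a POSIX shell and the port-to-queue pattern used above (first port to queue 0, second to queue 1, and so on). The name print_port_rules is a hypothetical helper.

```shell
# Hypothetical helper: print (not run) one steering rule per port,
# assigning queues 0, 1, 2, ... in the order the ports are given.
# $1 = interface name, remaining arguments = UDP destination ports.
print_port_rules() {
    iface=$1
    shift
    queue=0
    for port in "$@"; do
        echo "sudo ethtool -N $iface flow-type udp4 dst-port $port action $queue"
        queue=$((queue + 1))
    done
}

# Example: review the output, then pipe it to sh to apply the rules
#   print_port_rules enp193s0f1np1 17750 18000
```

Printing rather than executing keeps the helper safe to run on any machine and leaves the decision to apply the rules with the operator.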

Get, make, install, load and run EJFAT-related XDP software

The software in the ejfat-xdp GitHub repo produces 2 programs which must be run at the same time. The first (xdp_kern.o) is the special C code which is loaded into the NIC driver and directs IPv4 UDP packets to one of possibly several XDP sockets. The second is the user space program which creates those XDP sockets and receives the UDP packets directed to them.
git clone https://github.com/JeffersonLab/ejfat-xdp.git
cd ejfat-xdp
mkdir build
cd build
cmake ..
make install
Loading our special code into the NIC driver can be done in a number of different ways.
The following is just one of those ways. The code was compiled in the ejfat-xdp repo and stored in
.../ejfat-xdp/build/bin/xdp_kern.o
Just for fun, practice loading it by hand into the NIC driver and checking to see if it succeeded:
// Load the kernel NIC driver code
sudo <xdp_install_dir>/bin/xdp-loader load -m native enp193s0f1np1 xdp_kern.o

// Check the NIC to see if code really loaded and in what mode
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader status

// Remove everything just loaded
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader unload enp193s0f1np1 --all
The way most users will do the loading is to run the user code which will do it all for them. It will also unload the code when the user program is killed by control-C:
// Run a user program which loads the special code into the NIC driver and then receives packets:
.../ejfat-xdp/build/bin/xdp_user_mt

// Check the NIC to see if code really loaded and in what mode
sudo /daqfs/xdp/xdp-tools/xdp-loader/xdp-loader status


Run a test

Running your program in zero-copy mode

Look at the following from XDP Overview:

XDP_COPY and XDP_ZEROCOPY bind flags

When you bind to a socket, the kernel will first try to use zero-copy copy. If zero-copy is not supported, it will fall back on using copy mode, i.e. copying all packets out to user space. But if you would like to force a certain mode, you can use the following flags. If you pass the XDP_COPY flag to the bind call, the kernel will force the socket into copy mode. If it cannot use copy mode, the bind call will fail with an error. Conversely, the XDP_ZEROCOPY flag will force the socket into zero-copy mode or fail.


On a first attempt, binding the XDP socket with the XDP_ZEROCOPY flag appears not to work on ejfat nodes! However, investigation reveals some quirks in the Mellanox NIC driver:

Secret to zero-copy with Mellanox NIC driver

In order to make zero-copy work do the following: