Install an EJFAT Load Balancer

From epsciwiki

Latest revision as of 19:05, 19 December 2024

New Installation Preparations

Check for stale docker images:

docker image ls

Delete images with tags: esnet-smartnic-fw, smartnic-dpdk-docker, xilinx-labtools-docker, udplbd
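A minimal sketch of that cleanup, assuming bash and treating any image whose repository:tag contains one of the four names above as stale. This is a loose substring match, so review the printed list before deleting. The helper name stale_image_refs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "repository:tag" lines on stdin and print only
# those that mention one of the images this procedure rebuilds.
stale_image_refs() {
  grep -E 'esnet-smartnic-fw|smartnic-dpdk-docker|xilinx-labtools-docker|udplbd'
}

# Usage (review the list first, then remove):
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | stale_image_refs
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | stale_image_refs | xargs -r docker image rm
```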

Initial setup:

mkdir ~/esnet 

cd ~/esnet 

git clone --recursive https://github.com/esnet/xilinx-labtools-docker 

git clone --recursive https://github.com/esnet/smartnic-dpdk-docker 

git clone --recursive https://github.com/esnet/esnet-smartnic-fw 

git clone https://github.com/JeffersonLab/ersap-grpc.git

git clone https://github.com/esnet/udplbd.git 

Proper revisions:

Stable Lineup

Purpose  Version  Container               Revision
HW       57684    udplb                   c6956b46
FW       58131    esnet-smartnic-fw       a07943f0
SW       0.3.2    udplbd                  5712d10
Lab      57755    xilinx-labtools-docker  977a5678
DPDK     57593    smartnic-dpdk-docker    fd1ea53
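The pinned revisions in the table can also be applied in one pass. This is a sketch assuming bash and the ~/esnet layout created above; pinned_rev is a hypothetical helper, and the udplb (HW) row is left out because it names the firmware build rather than a cloned repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pinned revision for a repository cloned into ~/esnet,
# taken from the "Stable Lineup" table. Unknown names print nothing and return 1.
pinned_rev() {
  case "$1" in
    esnet-smartnic-fw)      echo a07943f0 ;;
    udplbd)                 echo 5712d10 ;;
    xilinx-labtools-docker) echo 977a5678 ;;
    smartnic-dpdk-docker)   echo fd1ea53 ;;
    *) echo "no pinned revision for $1" >&2; return 1 ;;
  esac
}

# Usage: check out every pinned repository and refresh its submodules
# (the individual build steps below also perform these checkouts).
#   for repo in xilinx-labtools-docker smartnic-dpdk-docker esnet-smartnic-fw udplbd; do
#     git -C ~/esnet/"$repo" checkout "$(pinned_rev "$repo")" &&
#       git -C ~/esnet/"$repo" submodule update --init --recursive
#   done
```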

Xilinx support tools:

Required Binaries

cp /daqfs/ejfat/Downloads/xilinx/Vivado_Lab_Lin_2023.2_1013_2256.tar.gz  ~/esnet/xilinx-labtools-docker/vivado-installer/ 
cp /daqfs/ejfat/Downloads/xilinx/loadsc_v2.3.zip                         ~/esnet/xilinx-labtools-docker/sc-fw-downloads 
cp /daqfs/ejfat/Downloads/esnet/SC_U280_4_3_31.zip                       ~/esnet/xilinx-labtools-docker/sc-fw-downloads

Docker build for Xilinx Labtools:

cd ~/esnet/xilinx-labtools-docker

git checkout 977a5678

Follow instructions in README.md

Docker build for DPDK:

cd ~/esnet/smartnic-dpdk-docker 

git checkout fd1ea53

Follow instructions in README.md

Docker build for smartnic:

cd ~/esnet/esnet-smartnic-fw 

git checkout a07943f0

The EJFAT firmware is developed by ESnet and obtained from them as an artifacts file:

SN_HW_VER=57684
SN_HW_APP_NAME=udplb

cp /daqfs/ejfat/Downloads/esnet/artifacts.au280.$SN_HW_APP_NAME.$SN_HW_VER.zip ~/esnet/esnet-smartnic-fw/sn-hw 
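The artifact name is composed from the board, application name and build number, and these must agree with SN_HW_BOARD, SN_HW_APP_NAME and SN_HW_VER in the .env file set up below. A small sketch, assuming bash; artifacts_zip is a hypothetical helper that only reproduces the naming pattern of the cp command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the artifacts file name from board, app name and HW version,
# following the pattern artifacts.<board>.<app>.<version>.zip used in the cp command.
artifacts_zip() {
  local board=$1 app=$2 ver=$3
  printf 'artifacts.%s.%s.%s.zip\n' "$board" "$app" "$ver"
}

# Usage: copy the artifact and confirm it landed in sn-hw.
#   f=$(artifacts_zip au280 "$SN_HW_APP_NAME" "$SN_HW_VER")
#   cp "/daqfs/ejfat/Downloads/esnet/$f" ~/esnet/esnet-smartnic-fw/sn-hw/ &&
#     ls -l ~/esnet/esnet-smartnic-fw/sn-hw/"$f"
```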

Follow instructions in README.md up to and including (if necessary) the following lines:

mkdir -p ~/.docker/cli-plugins/
curl -SL https://github.com/docker/compose/releases/download/v2.27.1/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose
chmod +x ~/.docker/cli-plugins/docker-compose 

Update the cloned repo's submodules:

git submodule init 

git submodule update 


Modify the .env file:

cp example.env .env 

The following .env var lines must be populated:

Note that the Docker images can be retrieved either from a remote es.net repository or from a local Docker repository.

SMARTNIC_DPDK_IMAGE_URI=<REPOSITORY:TAG> 

Similarly,

LABTOOLS_IMAGE_URI=<REPOSITORY:TAG> 
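If the images were built locally in the previous steps, their REPOSITORY:TAG values can be read from docker image ls. A sketch assuming bash; local_image_uri is a hypothetical helper that prints the first reference whose repository name matches, so check that it is the build you intend.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from "repository:tag" lines on stdin, print the first one whose
# repository name (ignoring any registry or namespace prefix) equals the given name.
local_image_uri() {
  awk -v n="$1" '
    {
      repo = $0
      sub(":[^:/]*$", "", repo)   # strip the ":tag" suffix
      sub(".*/", "", repo)        # strip any registry or namespace prefix
    }
    repo == n { print; exit }
  '
}

# Usage:
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | local_image_uri smartnic-dpdk-docker
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | local_image_uri xilinx-labtools-docker
```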

Un-remark and set the following lines:

SN_HW_APP_NAME=udplb 

SN_HW_BOARD=au280 

SN_HW_VER=57684    

SN_FW_VER=44124       #Note this value is useful but not critical; can be set to zero 
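Each "un-remark and set" step amounts to replacing a commented-out KEY=... line with an active one. A sketch of that edit as a bash function, assuming the example file comments entries out with a leading #; set_env_var is a hypothetical helper, and the same approach would apply to sn-stack/.env. Inspect the file afterwards, since any inline note on the replaced line is dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file. The first line that is KEY=...,
# optionally commented out with a leading "#", is replaced; otherwise the line is appended.
# Key and value are passed through the environment so "/" and "+" in tokens need no escaping.
set_env_var() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  KEY=$2 VAL=$3 awk '
    BEGIN { key = ENVIRON["KEY"]; val = ENVIRON["VAL"]; done = 0 }
    {
      line = $0
      sub(/^[ \t]*#[ \t]*/, "", line)   # ignore a leading comment marker when matching
      if (!done && index(line, key "=") == 1) {
        print key "=" val
        done = 1
        next
      }
      print
    }
    END { if (!done) print key "=" val }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (from ~/esnet/esnet-smartnic-fw):
#   set_env_var .env SN_HW_APP_NAME udplb
#   set_env_var .env SN_HW_BOARD au280
#   set_env_var .env SN_HW_VER 57684
```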


Build the firmware:

./build.sh 

Modify the sn-stack/.env file:

Un-remark and set the following lines:

COMPOSE_PROFILES=smartnic-mgr-vfio-unlock 

Note that this profile makes use of the host IOMMU, which several functions in the LB stack rely on.

Un-remark and set the JTAG serial code:

Execute the bash cmd:

sudo lsusb -v -d 0403:6011 | grep iSerial|tr -s ' '|cut -f4 -d' '

e.g., 21770323600G

HW_TARGET_SERIAL=21770323600GA   #Note the appended 'A' char 
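The two JTAG steps can be combined. A sketch assuming bash and a single 0403:6011 device on the host: hw_target_serial (hypothetical) reads lsusb -v output on stdin, takes the first iSerial value and appends the required 'A'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the iSerial value from "lsusb -v" output with "A" appended,
# matching the HW_TARGET_SERIAL format shown above. Only the first iSerial line is used.
hw_target_serial() {
  awk '$1 == "iSerial" { print $3 "A"; exit }'
}

# Usage:
#   SER=$(sudo lsusb -v -d 0403:6011 2>/dev/null | hw_target_serial)
#   echo "HW_TARGET_SERIAL=$SER"
```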


Execute the bash cmd:

lspci -Dd 10ee:|head -1|tr -s ' '|cut -f1 -d' '|cut -f1 -d'.'

e.g., 0000:a1:00

Un-remark and set the FPGA PCI device code:

FPGA_PCIE_DEV=0000:a1:00 
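As with the serial number, the lookup and assignment can be done in one step. A sketch assuming bash and that the first Xilinx (vendor 10ee) function listed belongs to the card being programmed; fpga_pcie_dev is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from "lspci -D" output on stdin, print the domain:bus:device of the
# first listed function with its ".function" suffix removed (0000:a1:00.0 -> 0000:a1:00).
fpga_pcie_dev() {
  awk 'NR == 1 { sub(/\.[0-9a-fA-F]+$/, "", $1); print $1; exit }'
}

# Usage:
#   FPGA_PCIE_DEV=$(lspci -Dd 10ee: | fpga_pcie_dev)
#   echo "FPGA_PCIE_DEV=$FPGA_PCIE_DEV"
```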

Execute the bash cmd:

hostname

e.g., ejfat-2

Un-remark and set the following lines:

SN_HOST=ejfat-2-dp.jlab.org          #Note this is the data planes (FPGA) well known IPV4 address or network name 
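In the example above, SN_HOST follows a <hostname>-dp.jlab.org naming pattern. A sketch assuming bash and that the data-plane name of every host follows that pattern; sn_host_for is hypothetical, and getent is used only to confirm the derived name resolves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the data-plane name by appending "-dp.jlab.org" to the
# short form of a host name (any domain part is dropped first).
sn_host_for() {
  local short=${1%%.*}
  printf '%s-dp.jlab.org\n' "$short"
}

# Usage:
#   SN_HOST=$(sn_host_for "$(hostname)")
#   getent hosts "$SN_HOST"   # confirm the name resolves before using it
```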

Un-remark and set the rpc AUTH token:

Execute the bash cmd:

openssl rand -base64 24

e.g., 1CEpuDN0z39AFndEvcP3EmsuT8zu+3lt

SN_CFG_AUTH_TOKEN=1CEpuDN0z39AFndEvcP3EmsuT8zu+3lt 

Un-remark and set the P4 AUTH token:

Execute the bash cmd:

openssl rand -base64 24

e.g., N8C1q/6b2iXSBax30lFoSK0c77BLt5P3

SN_P4_AUTH_TOKEN=N8C1q/6b2iXSBax30lFoSK0c77BLt5P3 
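The two tokens are generated independently with the same command. A sketch assuming bash and openssl on the PATH (new_auth_token is a hypothetical name); keep SN_P4_AUTH_TOKEN at hand, since the smartnic auth token in the control-plane config must match it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate one auth token the same way as above
# (24 random bytes, base64-encoded to 32 characters).
new_auth_token() {
  openssl rand -base64 24
}

# Usage:
#   SN_CFG_AUTH_TOKEN=$(new_auth_token)
#   SN_P4_AUTH_TOKEN=$(new_auth_token)
#   printf 'SN_CFG_AUTH_TOKEN=%s\nSN_P4_AUTH_TOKEN=%s\n' "$SN_CFG_AUTH_TOKEN" "$SN_P4_AUTH_TOKEN"
```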

Modify the sn-stack/docker-compose.yml file:

In the smartnic-hw/command: section (line #249), uncomment the FORCE argument line in the /scripts/program_card.sh invocation:

    command:
      - /bin/bash
      - -c
      - -e
      - -o
      - pipefail
      - -x
      - |
        if [ ! -e /bitfiles/ok ] ; then
          exit 1
        fi
        /scripts/program_card.sh \
          xilinx-hwserver:3121 \
          "${HW_TARGET_SERIAL:-*}" \
          /bitfiles/esnet-smartnic.bit \
          $FPGA_PCIE_DEV \
          FORCE                           #### <- ###################
        if [ $$? ] ; then
          touch /status/ok
          sleep infinity
        fi

In older configurations, TCP port 50051 (smartnic-p4) must be exposed outside of the *firmware* Docker stack so that the external control plane can reach the P4 agent. This is needed when retrofitting older firmware for the newer firmware/control-plane split; newer firmware does not need this port fix-up.

To expose the P4 agent TCP port, add this stanza to the "smartnic-p4" section:

 
    ports: 
      - "50051:50051" 

Add the following lines to the end of the smartnic-p4 section (line #560):

 

    logging: 
      options: 
        max-file: 5 
        max-size: 100m

Verify the sn-stack/docker-compose.yml:

cd sn-stack 

docker compose config --quiet && echo "All good!" 

If applicable, follow the "One-Time setup" instructions in esnet-smartnic-fw/sn-stack/README.INSTALL.md:

Converting from factory flash image to ESnet Smartnic flash image

Perform a cold-boot (power cycle) of the server hosting the FPGA card

It is essential that this is a proper power cycle and not simply a warm reboot. Specifically, do not use

 
shutdown -r now  

Instead, power the server off with

 
shutdown -P  

then power it back on remotely (from smokenmirrors), where <machine> is the FPGA host's name:

ipmitool -I lanplus -U ejfat -L Operator -H <machine>-bmc.jlab.org chassis power status
ipmitool -I lanplus -U ejfat -L Operator -H <machine>-bmc.jlab.org chassis power on

Failure to perform a cold-boot here will result in an unusable card.
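Because the cold boot only works if the chassis has actually powered off, it can help to wait for the BMC to report "off" before powering on. A sketch assuming bash, that ipmitool prints its usual "Chassis Power is on/off" status line, and that the BMC password is supplied through the IPMI_PASSWORD variable read by ipmitool's -E option; chassis_power_is and the <machine> placeholder are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if "chassis power status" output on stdin reports the given state.
chassis_power_is() {
  grep -q "^Chassis Power is $1\$"
}

# Usage (run remotely; <machine> is the FPGA host, not the machine you are typing on):
#   BMC=<machine>-bmc.jlab.org
#   until ipmitool -I lanplus -U ejfat -L Operator -E -H "$BMC" chassis power status | chassis_power_is off; do
#     sleep 10
#   done
#   ipmitool -I lanplus -U ejfat -L Operator -E -H "$BMC" chassis power on
```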


Normal Operation of the Runtime Environment:

docker compose up -d

Verify that

 
docker compose -f ~/esnet/esnet-smartnic-fw/sn-stack/docker-compose.yml  exec smartnic-fw sn-cli dev version 

Returns something like:

Device Version Info

        DNA:           0x40020000013b83c12c108485 
        USR_ACCESS:    0x0000ac1b (57684) 
        BUILD_STATUS:  0x12211043 
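The build number in parentheses on the USR_ACCESS line is the value to compare with SN_HW_VER. A sketch assuming bash and the output layout shown above (fw_build_id is hypothetical); -T stops docker compose exec from allocating a TTY so the output pipes cleanly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the decimal build number shown in parentheses on the
# USR_ACCESS line of "sn-cli dev version" output read from stdin.
fw_build_id() {
  awk '$1 == "USR_ACCESS:" { v = $3; gsub(/[()]/, "", v); print v; exit }'
}

# Usage:
#   id=$(docker compose -f ~/esnet/esnet-smartnic-fw/sn-stack/docker-compose.yml exec -T smartnic-fw sn-cli dev version | fw_build_id)
#   [ "$id" = 57684 ] && echo "firmware build matches SN_HW_VER" || echo "unexpected build: $id"
```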

Similarly, verify that

docker compose -f ~/esnet/esnet-smartnic-fw/sn-stack/docker-compose.yml  logs smartnic-fw

Returns something like:

smartnic-fw-1 | + sleep infinity

Library build for ersap-grpc

cd ~/esnet/ersap-grpc/

git switch  esnet3
git checkout a3b85c3868554380e12759f23335eaf3fead2441

export GRPC_INSTALL_DIR=/daqfs/ersap/installation3

Follow instructions in README.md

Note: It is typically not necessary to build and install gRPC as described in the README, since the GRPC_INSTALL_DIR export above points to an existing installation.

Docker build for Control Plane:

cd ~/esnet/udplbd/

git checkout 5712d10

cp /daqfs/ejfat/Downloads/JLab/JLabCA.crt ~/esnet/udplbd/

Modify docker-compose.yml

Mount host filespace for /data (line #10):

services:
  volumes:
    - ./etc:/etc/udplbd
    - ./data:/data

Mount host TLS cert location for /certs (line #10):

services:
  volumes:
    - ./etc:/etc/udplbd
    - ./data:/data
    - /etc/letsencrypt/archive/<machine>.jlab.org:/certs

Remove the leftover udplbd database file:

rm ~/esnet/udplbd/data/udplbd.db

Follow instructions in README.md

Modify etc/config.yml (mounted in the container at /etc/udplbd)

  • specify FPGA DP IPV4/6 addresses (up to 8) in the ipv4: and ipv6: sections
  • specify FPGA DP MAC unicast/broadcast addresses
  • set IP numbers for CP host
  • Put host IPV4 for CP event numbers/host (sync)
  • Specify an event number/port for each DP address above
  • Put CP host IPV4 for CP server/host (grpc) listen address
  • Specify an auth token for CP grpc comms
  • optionally enable server/TLS
  • optionally specify container path to server/tls/certFile and server/tls/keyFile
  • optionally set CP logging info level
  • disable smartnic mock mode
  • set smartnic host as "localhost"
  • set smartnic port to match smartnic-fw setup above
  • set smartnic auth token to match that configured above for smartnic P4
  • enable smartnic tls and set verify to false
  • add a top-level section for prometheus e.g.,
    prometheus:
      enable: true
      listen: 127.0.0.1:2116

Build the CP container

docker compose build

Launch the CP container

docker compose up -d

Verify that the CP is operating correctly

docker compose -f ~/esnet/udplbd/docker-compose.yml  logs udplbd | less

Execute the FPGA CMAC setup procedure

cp /daqfs/ejfat/Downloads/esnet/u280_cmac_setup.sh ~/esnet/esnet-smartnic-fw/sn-stack/scratch

chmod +x ~/esnet/esnet-smartnic-fw/sn-stack/scratch/u280_cmac_setup.sh

docker compose -f ~/esnet/esnet-smartnic-fw/sn-stack/docker-compose.yml  exec smartnic-fw /scratch/u280_cmac_setup.sh > ~/esnet/esnet-smartnic-fw/sn-stack/scratch/u280_cmac_setup.out