Difference between revisions of "EJFAT"

(117 intermediate revisions by 4 users not shown)
<div class="orbitron"><font size="+3">Welcome to the EJFAT Wiki</font><br></div>('''E'''Snet / '''J'''Lab '''F'''PGA '''A'''ccelerated '''T'''ransport)
  
This collaboration between '''E'''Snet and '''J'''Lab for '''F'''PGA '''A'''ccelerated '''T'''ransport (EJFAT) seeks a network data transport capability to aggregate and dynamically route selected UDP traffic with endpoint feedback.
<br><hr><br>
<div class="orbitron"><font size="+1">System Overview:</font></div>''EJFAT is a collaboration between the Energy Sciences Network (ESnet) and the Thomas Jefferson National Accelerator Facility (JLab) on proof-of-concept engineering: programming a Field Programmable Gate Array (FPGA) to route commonly tagged UDP packets from any data source to individual, configurable destination endpoints so that compute work is load balanced across those endpoints. The packets carry some additional tagging for stream reassembly at the endpoint. The primary purpose of this FPGA-based acceleration is to load balance work to destination compute farm endpoints with low latency and at the full line rate of 100 Gbps, using feedback from the destination compute farm.''
  
EJFAT adds metadata to UDP data streams for two consumers. The intervening FPGA, acting as a work Load Balancer (LB), uses it to aggregate data packets from multiple logical input streams and route them dynamically to endpoints. An endpoint Reassembly Engine (RE) uses it to perform custom reassembly of data that network equipment has fragmented.
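The tagging and reassembly idea can be illustrated with a minimal Python sketch. The header layout below (tick, data id, byte offset, total length) is an assumption made for illustration only and is not the actual EJFAT format, which is documented on the UDP Packet Header Formats page. The function names <code>fragment</code> and <code>reassemble</code> are likewise hypothetical.

```python
import struct
from collections import defaultdict

# Hypothetical header, NOT the real EJFAT layout:
# tick (u64), data_id (u16), byte offset (u32), total length (u32), big-endian.
HDR = struct.Struct("!QHII")


def fragment(tick, data_id, payload, max_data):
    """Split one buffer into tagged datagrams carrying at most max_data bytes each."""
    return [
        HDR.pack(tick, data_id, off, len(payload)) + payload[off:off + max_data]
        for off in range(0, len(payload), max_data)
    ]


def reassemble(datagrams):
    """Rebuild buffers keyed by (tick, data_id), tolerating out-of-order arrival."""
    parts = defaultdict(dict)
    totals = {}
    for dgram in datagrams:
        tick, data_id, off, total = HDR.unpack_from(dgram)
        parts[(tick, data_id)][off] = dgram[HDR.size:]
        totals[(tick, data_id)] = total
    complete = {}
    for key, chunks in parts.items():
        buf = b"".join(chunks[off] for off in sorted(chunks))
        if len(buf) == totals[key]:  # buffers with missing fragments are dropped
            complete[key] = buf
    return complete
```

Because each datagram carries its own offset and the total length, the receiver can reorder fragments and detect incomplete buffers without any state shared with the sender.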
== EJFAT System Status ==
=== ejfat-1 ===
# 200Gbps NIC: ejfat-1-daq 129.57.177.8
# 10Gbps NIC: ejfat-1 129.57.177.131
# U280 FPGA: ejfat-1-dp 129.57.177.11
# Test moving a current LB install wholesale by moving files/containers
# IT Account Setup Sandbox
  
Although the aggregation and routing metadata carried as a header in the payload is generic in design, its first use is for streamed (non-triggered) data from the JLab DAQ to the back-end compute farm.
=== ejfat-2 ===
# 200Gbps NIC: ejfat-2-daq 129.57.177.2
# 10Gbps NIC: ejfat-2 129.57.177.132

=== ejfat-3 ===
# 200Gbps NIC: ejfat-3-daq 129.57.177.3
# 10Gbps NIC: ejfat-3 129.57.177.133
# Two U280s installed
  
In the initial JLab deployment, the FPGA aggregates data by time stamp across detector Data Acquisition System (DAQ) channels so that work can be load balanced to individual compute farm destinations in a farm-status-aware manner (see Figure \ref{fig:ejfat} in Appendix \ref{appendix:ejfat}). Here ''work'' means using the data from an individual time stamp across all DAQ channels to identify or reconstruct detector ''events''.
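As a rough illustration of time-stamp aggregation, the Python sketch below sends every packet that shares a time stamp to the same destination, so one endpoint sees all DAQ channels for that time stamp. The packet shape <code>(tick, channel, data)</code>, the function name, and the modulo policy are assumptions for illustration; the real load balancer chooses destinations in a farm-status-aware manner rather than round-robin.

```python
from collections import defaultdict


def route_by_tick(packets, destinations):
    """Group packets so that all channels of one tick land on one destination.

    packets: iterable of (tick, channel, data) tuples (hypothetical shape).
    destinations: non-empty list of endpoint names.
    Returns {destination: {tick: {channel: data}}}.
    """
    routed = defaultdict(lambda: defaultdict(dict))
    for tick, channel, data in packets:
        # Placeholder policy: the tick alone picks the destination, so the
        # choice is identical for every channel carrying that tick.
        dest = destinations[tick % len(destinations)]
        routed[dest][tick][channel] = data
    return routed
```

The key property is that the destination depends only on the time stamp, never on the channel, which is what lets an endpoint reconstruct an event from a complete set of channels.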
=== ejfat-4 ===
# 200Gbps NIC: ejfat-4-daq 129.57.177.4
# 10Gbps NIC: ejfat-4 129.57.177.134
# XDP experiments
  
The compute farm directly controls this load balancing of computational work: it dynamically manages routing information, which is communicated to the FPGA host CPU and passed on from there to the FPGA.
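One way to picture farm-driven routing is a slot table that the host CPU rebuilds from endpoint feedback and hands to the data plane. The Python sketch below is an assumption-laden illustration, not the EJFAT control plane: the feedback value (free capacity per endpoint), the table size, and the function name <code>build_slot_table</code> are all hypothetical.

```python
def build_slot_table(free_capacity, slots=8):
    """Give each endpoint table slots in proportion to its reported free capacity.

    free_capacity: dict of endpoint -> non-negative number (hypothetical feedback).
    Returns a list of length `slots`; a data plane could index it by tick % slots.
    """
    total = sum(free_capacity.values())
    if total <= 0:
        raise ValueError("no endpoint reports free capacity")
    # Largest-remainder apportionment so the counts sum exactly to `slots`.
    shares = {ep: cap * slots / total for ep, cap in free_capacity.items()}
    counts = {ep: int(share) for ep, share in shares.items()}
    leftover = slots - sum(counts.values())
    by_remainder = sorted(shares, key=lambda ep: shares[ep] - counts[ep], reverse=True)
    for ep in by_remainder[:leftover]:
        counts[ep] += 1
    return [ep for ep in sorted(counts) for _ in range(counts[ep])]
```

An endpoint that reports no free capacity receives no slots, so the farm can drain a busy node simply by changing what it reports.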
=== ejfat-5 ===
# 200Gbps NIC: ejfat-5-daq 129.57.177.5
# 10Gbps NIC: ejfat-5 129.57.177.135
# U280 FPGA: ejfat-5-dp 129.57.177.15
# LB CP: ejfat-5 129.57.177.135
# LB: DP latest stable FW, CP latest stable branch
  
Because the aggregated and routed data is opaque to this design, the design should be reusable for other data streams with aggregation or routing needs.
=== ejfat-6 ===
# 200Gbps NIC: ejfat-6-daq 129.57.177.6
# 10Gbps NIC: ejfat-6 129.57.177.136
# U280 FPGA: ejfat-6-dp 129.57.177.16
# LB CP: ejfat-6 129.57.177.136
# LB: DP latest stable FW, CP latest stable branch
# DAOS experiments

=== ejfat-fs ===
# 200Gbps NIC: ejfat-fs-daq 129.57.177.7
# 10Gbps NIC: ejfat-fs 129.57.177.130
# U280 FPGA: ejfat-fs-dp 129.57.177.10
# LB CP: ejfat-fs 129.57.177.130
# LB: DP latest stable FW, CP latest stable branch
# Hosts NVMe memory/disk

== Presentations/Papers ==
{| class="wikitable"
|-
!Date
!Presenter
!Event
!Links
|-
|2021-03-01
|G. Heyes
|EJFAT Proposal
|[https://jeffersonlab.sharepoint.com/:w:/r/sites/SciComp/_layouts/15/Doc.aspx?sourcedoc=%7B65DA331C-40E4-4761-B643-251BFA309C45%7D&file=20210525%20ASCR%20BRN%20Solicitation%20v4.docx&action=default&mobileredirect=true Word]
|-
|2021-10-21
|M. S. Goodrich
|Div Brief
|[https://jeffersonlab.sharepoint.com/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_for_Div.pdf?CT=1638970015731&OR=ItemsView PDF]
|-
|2021-11-05
|M. S. Goodrich
|Canisius College
|[https://jeffersonlab.sharepoint.com/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/canisius.pdf?CT=1638970328329&OR=ItemsView PDF]
|-
|2021-12-03
|S. Sheldon
|ESnet LB Tutorial
|[https://jeffersonlab.sharepoint.com/:v:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/ESnet_EJFAT_Tut.mp4?csf=1&web=1&e=4nDeZ2 MP4]
|-
|2021-12-10
|Y. Kumar
|SRO iX Presentation
|[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_SRO_iX.pptx?d=w78e41e5ddab04d21a4c26f93ac84b7d6&csf=1&web=1&e=gkaCDS PPTX]
|-
|2022-08-05
|M. S. Goodrich
|RT-2022 Presentation
|[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/JLab%20EJFAT-msg.pptx?d=w7a8e53d19a584fefb1405fa8ff190b1e&csf=1&web=1&e=50bX4g PPTX]
|-
|2022-08-05
|M. S. Goodrich, et al.
|RT-2022 Proceedings
|[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_rt2022.pdf?csf=1&web=1&e=NFHXHM PDF]
|-
|2022-10-20
|S. Sheldon, et al.
|INDIS-2022
|[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/Indis_Paper_2022-3.pdf?csf=1&web=1&e=tmhpfA PDF]
|-
|2022-10-24
|M. S. Goodrich
|ACAT-2022 Presentation
|[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-acat2022.pptx?d=wc024332f3cf7440eae15e4f6f3646897&csf=1&web=1&e=QEwIcx PPTX]
|-
|2023-03-17
|M. S. Goodrich, et al.
|ACAT-2022 Proceedings
|[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_ACAT_2022_QL_sub.pdf?csf=1&web=1&e=dR566P PDF]
|-
|2023-05-11
|M. S. Goodrich, et al.
|CHEP-2023 Presentation
|[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-chep2023.pptx?d=w605623a55051446e9d2bcca80f64eda6&csf=1&web=1&e=NHSloC PPTX]
|-
|2023-10-12
|D. Howard, et al.
|CHEP-2023 Conference Publication
|[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/chep2023_proceedings.pdf?csf=1&web=1&e=FO7f8j PDF]
|-
|2024-03-11
|M. S. Goodrich, et al.
|ACAT-2024 Presentation
|[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/Acat2024.pptx?d=wb4c9cc47a8eb4b299c3dab1aaa379a36&csf=1&web=1&e=Kct82Y PPTX]
|-
|2024-04-10
|M. S. Goodrich, et al.
|RT-2024 Presentation
|[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/rt2024.pptx?d=w0dba99dbb67f481f9a39907dbec384b8&csf=1&web=1&e=1XISCm PPTX]
|}

== EJFAT Weekly EPSCI Meetings ==

[[EJFAT Weekly EPSCI Meetings]]

== EJFAT Weekly Collaboration Meetings ==

[[EJFAT Weekly Meetings]]

== Technical Design Overview ==

[[EJFAT Technical Design Overview]]

[[UDP Packet Header Formats]]

[https://jeffersonlab.sharepoint.com/:p:/r/sites/HPDF/_layouts/15/Doc.aspx?sourcedoc=%7BEABA533A-E516-4C57-BE85-BBF594F5E918%7D&file=Jan%2010%20HPDF%20Conceptual%20Machine%20Design%20Concept.pptx&action=edit&mobileredirect=true IRIAD/EJFAT Testbed]

== UDP Transmission Performance ==

[[EJFAT UDP General Information]]

[[EJFAT UDP General Performance Considerations]]

[[EJFAT UDP Packet Receiving and Core Switching]]

[[EJFAT UDP Packet Sending and NUMA Nodes]]

[[EJFAT UDP Single Thread Packet Sending and Receiving]]

[[Testing Load Balancer Bandwidth]]

== HOW-TOs ==

[[How to setup ejfat nodes]]

[[How to install, build and use gRPC]]

[[How to install, build and use XDP related packages]]

[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/CP_PID_Sched.pdf?csf=1&web=1&e=JpffJ4 How to Compute Schedule Density from PID Signals]

[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/E2SAR.drawio.pdf?csf=1&web=1&e=E0Uqlh EJFAT API]

== Edge to Core Test Equipment: ==

# [https://jeffersonlab.sharepoint.com/:x:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/Edge-to-Core-Test-Stand-12102021.xlsx?d=w8de06c441cd442fd8d3f1b7d7983028d&csf=1&web=1&e=wKS9Lh Price Estimate Spreadsheet]
# [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-Test-Stand-Network-Map.pdf?csf=1&web=1&e=iWvvet Networking Diagram], [[Media:20240209_EJFAT_diagram.pdf | Updated (PDF)]] (from Brent 2024-02-09)
# [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408549 PR408549]: Requisition 1 of 2:
## [https://jeffersonlab.sharepoint.com/:w:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-Test-Stand-Servers-SOW.docx?d=w19107d52332948a0b2924b13939c3f64&csf=1&web=1&e=CS6Ub8 Statement of Work for Servers]
## 1/13/2022: EJFAT team decided to solicit two bid responses, one with MLX NIC and one without. Response from Procurement is "I can ask for the two separate quotes.  If you are going to purchase both option (with & without add-in cards), once I receive the quotes back, you will have submit a new PR to cover the option (without add-in cards)."
## 1/18/2022: Question from KOI Computers: "please clarify what the part number for the NVIDIA Dual Port ConnectX-6". Replied with part # MCX623106AN-CDAT.
## 1/24/2022: Requisition currently open for bid responses from vendors. Due date is COB 1/24/2022.
## 1/27/2022: PO awarded to Atipa for 6 servers and 1 file-server with FPGA and MLX SmartNIC. Expected delivery date from vendor is 5/31/2022.
# [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408870 PR408870] [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408938 PR408938] Requisition 2 of 2: Statement of Work for Switches & Cables
## 1/14/2022: PRs for the switches, transceivers and fiber have been submitted. I added (4) 2km 100G transceivers to support dual 100G connections between the switches. We can always upgrade to 400G in the future, if needed.
# [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=409850 PR409850] [https://developer.nvidia.com/arm-hpc-devkit NVIDIA ARM HPC Developer Kit]
## Hardware Specifications for dev kit
##: [[Model]] GIGABYTE G242-P32, 2U server
##: [[CPU]] 1x Ampere Altra Q80-30 (Arm processor)
##: [[Memory]] 512G DDR4 memory
##: [[Storage]] 6TB SAS/SATA 3.5″
##: [[GPU]] 2x NVIDIA A100 GPU
##: [[Network]] 2x NVIDIA® BlueField®-2 E-Series DPU, 200GbE/HDR single-port QSFP56, PCIe Gen4 x16, secure boot enabled, crypto disabled, 16GB on-board DDR, 1GbE OOB management

== Resources ==
* [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/u280_po_Signed_21-M0862%20-%20Avnet.pdf?csf=1&web=1&e=PmJfdu First FPGA PO]
* [https://www.jlab.org TBD]

Latest revision as of 17:51, 29 May 2024
