Difference between revisions of "EJFAT"
m (→ejfat-4) |
|||
(186 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
+ | <div class="orbitron"><font size="+3">Welcome to the EJFAT Wiki</font><br></div>('''E'''Snet / '''J'''LaB '''F'''PGA '''A'''ccelerated '''T'''ransport) | ||
+ | |||
+ | <br><hr><br> | ||
+ | <div class="orbitron"><font size="+1">System Overview:</font></div>''EJFAT is a collaboration between the Energy Sciences Network (ESnet) and the Thomas Jefferson National Accelerator Facility (JLab) on proof-of-concept engineering of an accelerated load balancer (LB) that uses dynamic IPv4/IPv6 address forwarding. It is dynamic because the forwarding address is chosen at run time from a collection of destination endpoints, based on near-real-time workload conditions at those destinations. It is accelerated because forwarding is performed with low, fixed latency at line rates of up to 200 Gbps per FPGA; in general, a functioning LB may consist of up to four FPGAs acting as one logical Data Plane (DP), for a total bandwidth capacity of over 1 Tbps. The low, fixed latency is achieved by using an appropriately programmed Field Programmable Gate Array (FPGA) to implement the DP functions of the LB.'' | ||
+ | |||
+ | == EJFAT System Status == | ||
+ | === ejfat-1 === | ||
+ | # 100Gbps NIC: ejfat-1-daq 129.57.177.8 | ||
+ | # 10Gbps NIC: ejfat-1 129.57.177.131 | ||
+ | # U280 FPGA: ejfat-1-dp 129.57.177.{9-16} - '''LAG'd for 200Gbps''' | ||
+ | # LB CP: ejfat-1 129.57.177.131, latest Stable branch | ||
+ | # LB: DP latest Stable FW | ||
+ | # CP Web UI port 8081 | ||
+ | |||
+ | === ejfat-2 === | ||
+ | # 100Gbps NIC: ejfat-2-daq 129.57.177.2 | ||
+ | # 10Gbps NIC: ejfat-2 129.57.177.132 | ||
+ | # 100Gbps U280 FPGA: ejfat-2-dp 129.57.177.{17-24} | ||
+ | # LB CP: ejfat-2 129.57.177.132, latest Stable branch | ||
+ | # LB: DP latest Stable FW | ||
+ | # CP Web UI port 8082 | ||
+ | |||
+ | === ejfat-3 === | ||
+ | # 200Gbps NIC: ejfat-3-daq 129.57.177.3 | ||
+ | # 10Gbps NIC: ejfat-3 129.57.177.133 | ||
+ | # '''Two U280s installed - LAG'd for 400Gbps''' | ||
+ | # FW Containers built by Stacey | ||
+ | |||
+ | === ejfat-4 === | ||
+ | # 100Gbps NIC: ejfat-4-daq 129.57.177.4 | ||
+ | # 10Gbps NIC: ejfat-4 129.57.177.134 | ||
+ | # '''XDP experiments''' | ||
+ | # 100Gbps U280 FPGA: ejfat-4-dp 129.57.177.{41-48} | ||
+ | # LB CP: ejfat-4 129.57.177.134, <s>latest Stable branch</s> | ||
+ | # LB: DP <s>latest Stable FW</s> | ||
+ | |||
+ | === ejfat-5 === | ||
+ | # 200Gbps NIC: ejfat-5-daq 129.57.177.5 | ||
+ | # 10Gbps NIC: ejfat-5 129.57.177.135 | ||
+ | # LB CP: ejfat-5 129.57.177.135, <s>latest Stable branch</s> | ||
+ | # 100Gbps U280 FPGA: ejfat-5-dp 129.57.177.{49-56} | ||
+ | # LB: DP <s>latest Stable FW</s> | ||
+ | # '''Optical Taps Installed''' | ||
+ | |||
+ | === ejfat-6 === | ||
+ | # 200Gbps NIC: ejfat-6-daq 129.57.177.6 | ||
+ | # 10Gbps NIC: ejfat-6 129.57.177.136 | ||
+ | # DAOS experiments | ||
+ | # '''Using Ubuntu 24.04 LTS''' | ||
+ | # FW containers built | ||
+ | # Waiting for podman compose installation | ||
+ | |||
+ | === ejfat-fs === | ||
+ | # 100Gbps NIC: ejfat-fs-daq 129.57.177.7 | ||
+ | # 10Gbps NIC: ejfat-fs 129.57.177.130 | ||
+ | # Hosts NVMe memory/disk | ||
+ | # 100Gbps U280 FPGA: ejfat-fs-dp 129.57.177.{65-72} | ||
+ | # LB CP: ejfat-fs 129.57.177.130, latest Stable branch | ||
+ | # LB: DP latest Stable FW | ||
+ | # CP Web UI port 8080 | ||
+ | |||
== Presentations/Papers == | == Presentations/Papers == | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 20: | Line 81: | ||
|M. S. Goodrich | |M. S. Goodrich | ||
|Canisius College | |Canisius College | ||
− | |[[ | + | |[https://jeffersonlab.sharepoint.com/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/canisius.pdf?CT=1638970328329&OR=ItemsView PDF] |
+ | |- | ||
+ | |2021-12-03 | ||
+ | |S. Sheldon | ||
+ | |ESnet LB Tutorial | ||
+ | |[https://jeffersonlab.sharepoint.com/:v:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/ESnet_EJFAT_Tut.mp4?csf=1&web=1&e=4nDeZ2 MP4] | ||
+ | |- | ||
+ | |2021-12-10 | ||
+ | |Y. Kumar | ||
+ | |SRO iX Presentation | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_SRO_iX.pptx?d=w78e41e5ddab04d21a4c26f93ac84b7d6&csf=1&web=1&e=gkaCDS PPTX] | ||
+ | |- | ||
+ | |2022-08-05 | ||
+ | |M. S. Goodrich | ||
+ | |RT-2022 Presentation | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/JLab%20EJFAT-msg.pptx?d=w7a8e53d19a584fefb1405fa8ff190b1e&csf=1&web=1&e=50bX4g PPTX] | ||
+ | |- | ||
+ | |2022-08-05 | ||
+ | |M. S. Goodrich, et al. | ||
+ | |RT-2022 Proceedings | ||
+ | |[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_rt2022.pdf?csf=1&web=1&e=NFHXHM PDF] | ||
+ | |- | ||
+ | |2022-10-20 | ||
+ | |S. Sheldon, et al. | ||
+ | |INDIS-2022 | ||
+ | |[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/Indis_Paper_2022-3.pdf?csf=1&web=1&e=tmhpfA PDF] | ||
+ | |- | ||
+ | |2022-10-24 | ||
+ | |M. S. Goodrich | ||
+ | |ACAT-2022 Presentation | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-acat2022.pptx?d=wc024332f3cf7440eae15e4f6f3646897&csf=1&web=1&e=QEwIcx PPTX] | ||
+ | |- | ||
+ | |2023-03-17 | ||
+ | |M. S. Goodrich, et al. | ||
+ | |ACAT-2022 Proceedings | ||
+ | |[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT_ACAT_2022_QL_sub.pdf?csf=1&web=1&e=dR566P PDF] | ||
+ | |- | ||
+ | |2023-05-11 | ||
+ | |M. S. Goodrich, et al. | ||
+ | |CHEP-2023 Presentation | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-chep2023.pptx?d=w605623a55051446e9d2bcca80f64eda6&csf=1&web=1&e=NHSloC PPTX] | ||
+ | |- | ||
+ | |2023-10-12 | ||
+ | |D. Howard, et al. | ||
+ | |CHEP-2023 Conference Publication | ||
+ | |[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/chep2023_proceedings.pdf?csf=1&web=1&e=FO7f8j PDF] | ||
+ | |- | ||
+ | |2024-03-11 | ||
+ | |M. S. Goodrich, et al. | ||
+ | |ACAT-2024 Presentation | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/Acat2024.pptx?d=wb4c9cc47a8eb4b299c3dab1aaa379a36&csf=1&web=1&e=Kct82Y} PPTX] | ||
+ | |- | ||
+ | |2024-04-10 | ||
+ | |M. S. Goodrich, et al. | ||
+ | |RT-2024 Presentation | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/rt2024.pptx?d=w0dba99dbb67f481f9a39907dbec384b8&csf=1&web=1&e=1XISCm} PPTX] | ||
+ | |- | ||
+ | |2024-07-31 | ||
+ | |M. S. Goodrich, et al. | ||
+ | |ACAT-2024 Proceedings | ||
+ | |[https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/ACAT_2024.pdf?csf=1&web=1&e=HkQedP PDF] | ||
+ | |- | ||
+ | |2024-10-02 | ||
+ | |S. Veseli, APS/SDM | ||
+ | |APS/ALS - EJFAT | ||
+ | |[https://jeffersonlab.sharepoint.com/:p:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/AlsEjfatMeeting-20241002.pptx?d=wcaa3a21ffd3a466f979bf3f5fbaab457&csf=1&web=1&e=BSOlI7 PPTX] | ||
|} | |} | ||
+ | |||
+ | == EJFAT Weekly EPSCI Meetings == | ||
+ | |||
+ | [[EJFAT Weekly EPSCI Meetings]] | ||
+ | |||
+ | == EJFAT Weekly Collaboration Meetings == | ||
+ | |||
+ | [[EJFAT Weekly Meetings]] | ||
== Technical Design Overview == | == Technical Design Overview == | ||
[[EJFAT Technical Design Overview]] | [[EJFAT Technical Design Overview]] | ||
+ | |||
+ | [[UDP Packet Header Formats]] | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:p:/r/sites/HPDF/_layouts/15/Doc.aspx?sourcedoc=%7BEABA533A-E516-4C57-BE85-BBF594F5E918%7D&file=Jan%2010%20HPDF%20Conceptual%20Machine%20Design%20Concept.pptx&action=edit&mobileredirect=true IRIAD/EJFAT Testbed] | ||
+ | |||
+ | == UDP Transmission Performance == | ||
+ | |||
+ | [[EJFAT UDP General Information]] | ||
+ | |||
+ | [[EJFAT UDP General Performance Considerations]] | ||
+ | |||
+ | [[EJFAT UDP Packet Receiving and Core Switching]] | ||
+ | |||
+ | [[EJFAT UDP Packet Sending and NUMA Nodes]] | ||
+ | |||
+ | [[EJFAT UDP Single Thread Packet Sending and Receiving]] | ||
+ | |||
+ | [[Testing Load Balancer Bandwidth]] | ||
+ | |||
+ | == HOW-TOs == | ||
+ | |||
+ | [[How to use Control Plane Web UI]] | ||
+ | |||
+ | [[How to Monitor Prometheus]] | ||
+ | |||
+ | [https://wiki.jlab.org/epsciwiki/index.php/Install_an_EJFAT_Load_Balancer Install a Load Balancer] | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:t:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/lbtest.txt?csf=1&web=1&e=PNz0DM Test a Load Balancer] | ||
+ | |||
+ | [[How to setup ejfat nodes]] | ||
+ | |||
+ | [[How to install, build and use gRPC]] | ||
+ | |||
+ | [[How to install, build and use XDP related packages]] | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/CP_PID_Sched.pdf?csf=1&web=1&e=JpffJ4 How to Compute Schedule Density from PID Signals] | ||
+ | |||
+ | [https://linuxconfig.org/how-to-enable-jumbo-frames-in-linux Enable Jumbo Frames] | ||
+ | |||
+ | Network Path MTU Discovery support in the Linux Kernel: | ||
+ | |||
+ | <pre> | ||
+ | file: /proc/sys/net/ipv4/tcp_mtu_probing | ||
+ | variable: net.ipv4.tcp_mtu_probing (integer; default: 0; since Linux 2.6.17): | ||
+ | |||
+ | tcp_mtu_probing - INTEGER | ||
+ | Controls TCP Packetization-Layer Path MTU Discovery. Takes three values: | ||
+ | 0 - Disabled | ||
+ | 1 - Disabled by default, enabled when an ICMP black hole detected | ||
+ | 2 - Always enabled, use initial MSS of tcp_base_mss. | ||
+ | </pre> | ||
+ | |||
+ | == REFERENCEs == | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:x:/r/sites/DataCenter/_layouts/15/Doc.aspx?sourcedoc=%7B3F832940-1BA2-4183-A00A-5085C5A353D6%7D&file=IRIAD-testbed-Inventory.xlsx&action=default&mobileredirect=true EJFAT Config Planning] | ||
+ | |||
+ | [https://www.jlab.org/news/releases/california-streamin-jefferson-lab-esnet-achieve-coast-coast-feed-real-time-physics JLab EJFAT News Release] | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/JIRIAF%20on%20FABRIC.png?csf=1&web=1&e=TOGEPr EJFAT on FABRIC] | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/E2SAR.drawio.pdf?csf=1&web=1&e=E0Uqlh EJFAT API] | ||
+ | |||
+ | [https://docs.google.com/document/d/1ssw8sye7jExtPCJVejloe8hNkyWOcxEQzVmm45xs5-w/edit#heading=h.b8k68ix2wf30 LB Pipeline] | ||
+ | |||
+ | [https://docs.google.com/document/d/1qEo51MZeUPM3-DA2CK6jAccrU0r1QtPfl5i3aPS2SKM/edit?exids=71471482,71471477#heading=h.69350544ggm5 Getting Started with EJFAT] | ||
+ | |||
+ | [https://jeffersonlab.sharepoint.com/:w:/r/sites/ITDivision/proposals/_layouts/15/Doc.aspx?sourcedoc=%7B33ffd720-9356-471f-8880-b0c56c5593a5%7D&action=view&wdAccPdf=0&wdparaid=39A41B49 IRIAD Workplan] | ||
+ | |||
+ | [https://wiki.jlab.org/epsciwiki/index.php/SRO_Grand_Challenge SRO Grand Challenge] | ||
+ | |||
+ | [https://my.es.net/?_gl=1*pchcca*_ga*MjAyODE5NDE3OC4xNzEwOTYwMDI4*_ga_9Y9H16804B*MTcxMDk2MDAyOC4xLjAuMTcxMDk2MDAyOC4wLjAuMA..&s=JLAB&st=esnet_site ESnet Logical Map] | ||
+ | |||
+ | [http://linux-ip.net/html/tools-ip-neighbor.html IP Neighbor] | ||
+ | |||
+ | [https://robotframework.org/robotframework/latest/RobotFrameworkUserGuide.html Robot Framework] | ||
+ | |||
+ | [https://science.osti.gov/-/media/ascr/ascac/pdf/meetings/202306/Brown_IRI_ASCAC_2023206.pdf IRI Vision] | ||
+ | |||
+ | [https://arxiv.org/pdf/2111.05155 A horizontally scalable online processing system for trigger-less data acquisition] | ||
+ | |||
+ | [https://arxiv.org/pdf/2212.11032 The-triggerless-data-acquisition-system-of-the-XENONnT-experiment] | ||
+ | |||
+ | [https://indico.cern.ch/event/783429/contributions/3378959/attachments/1829959/2996545/khennessy_cepc_dune_daq_v1.pdf DUNE triggerless DAQ] | ||
+ | |||
+ | [https://indico.jlab.org/event/378/contributions/6050/attachments/5093/6351/20200513_JLab_Streaming_Readout.pdf Streaming Mode DAQ at JLab] | ||
+ | |||
+ | [http://www.scholarpedia.org/article/Real-time_data_analysis_in_particle_physics Real-time data analysis in particle physics] | ||
+ | |||
+ | [https://indico.cern.ch/event/659612/contributions/2690262/attachments/1591386/2518642/triggerintro4.pdf Intro to Triggering] | ||
+ | |||
+ | [https://wiki.jlab.org/epsciwiki/images/8/8b/SRO_LDRD_Test_Plan_2024v0.8.pdf SRO Test Plan] | ||
+ | |||
+ | == Edge to Core Test Equipment: == | ||
+ | |||
+ | # [https://jeffersonlab.sharepoint.com/:x:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/Edge-to-Core-Test-Stand-12102021.xlsx?d=w8de06c441cd442fd8d3f1b7d7983028d&csf=1&web=1&e=wKS9Lh Price Estimate Spreadsheet] | ||
+ | # [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-Test-Stand-Network-Map.pdf?csf=1&web=1&e=iWvvet Networking Diagram], [[Media:20240209_EJFAT_diagram.pdf | Updated (PDF)]] (from Brent 2024-02-09) | ||
+ | # [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408549 PR408549] : Requisition 1 of 2 : | ||
+ | ## [https://jeffersonlab.sharepoint.com/:w:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-Test-Stand-Servers-SOW.docx?d=w19107d52332948a0b2924b13939c3f64&csf=1&web=1&e=CS6Ub8 Statement of Work for Servers] | ||
+ | ## 1/13/2022: EJFAT team decided to solicit two bid responses, one with MLX NIC and one without. Response from Procurement is "I can ask for the two separate quotes. If you are going to purchase both option (with & without add-in cards), once I receive the quotes back, you will have submit a new PR to cover the option (without add-in cards)." | ||
+ | ## 1/18/2022: Question from KOI Computers: "please clarify what the part number for the NVIDIA Dual Port ConnectX-6". Replied with part # MCX623106AN-CDAT. | ||
+ | ## 1/24/2022: Requisition currently open for bid responses from vendors. Due date is COB 1/24/2022. | ||
+ | ## 1/27/2022: PO awarded to Atipa for 6 servers and 1 file-server with FPGA and MLX SmartNIC. Expected delivery date from vendor is 5/31/2022. | ||
+ | # [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408870 PR408870] [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408938 PR408938] Requisition 2 of 2: Statement of Work for Switches & Cables | ||
+ | ## 1/14/2022: PRs for the switches, transceivers and fiber have been submitted. I added (4) 2km 100G transceivers to support dual 100G connections between the switches. We can always upgrade to 400G in the future, if needed. | ||
+ | # [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=409850 PR409850] [https://developer.nvidia.com/arm-hpc-devkit NVIDIA ARM HPC Developer Kit] | ||
+ | ## Hardware Specifications for dev kit | ||
+ | ##: [[Model]] GIGABYTE G242-P32, 2U server | ||
+ | ##: [[CPU]] 1x Ampere Altra Q80-30 (Arm processor) | ||
+ | ##: [[Memory]] 512G DDR4 memory | ||
+ | ##: [[Storage]] 6TB SAS/ SATA 3.5″ | ||
+ | ##: [[GPU]] 2x NVIDIA A100 GPU | ||
+ | ##: [[Network]] 2x NVIDIA® BlueField®-2 E-Series DPU, 200GbE/HDR single-port QSFP56, PCIe Gen4 x16, secure boot enabled, crypto disabled, 16GB on-board DDR, 1GbE OOB management | ||
== Resources == | == Resources == | ||
− | * [https:// | + | * [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/u280_po_Signed_21-M0862%20-%20Avnet.pdf?csf=1&web=1&e=PmJfdu First FPGA PO] |
+ | * [https://www.jlab.org TBD] |
Latest revision as of 20:52, 19 December 2024
(ESnet / JLaB FPGA Accelerated Transport)
EJFAT is a collaboration between the Energy Sciences Network (ESnet) and the Thomas Jefferson National Accelerator Facility (JLab) on proof-of-concept engineering of an accelerated load balancer (LB) that uses dynamic IPv4/IPv6 address forwarding. It is dynamic because the forwarding address is chosen at run time from a collection of destination endpoints, based on near-real-time workload conditions at those destinations. It is accelerated because forwarding is performed with low, fixed latency at line rates of up to 200 Gbps per FPGA; in general, a functioning LB may consist of up to four FPGAs acting as one logical Data Plane (DP), for a total bandwidth capacity of over 1 Tbps. The low, fixed latency is achieved by using an appropriately programmed Field Programmable Gate Array (FPGA) to implement the DP functions of the LB.
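The idea of choosing a destination from near-real-time workload conditions can be illustrated with a small host-side sketch. This is not the EJFAT control-plane algorithm or the FPGA data-plane logic; the fill-level telemetry, the endpoint names, and both function names are hypothetical and exist only to show one way a less-loaded endpoint could be favored.

```python
import random


def endpoint_weights(fill_levels):
    """Map each endpoint to a selection weight proportional to its free capacity.

    fill_levels: dict of endpoint name -> reported fill fraction in [0.0, 1.0]
    (hypothetical telemetry: 0.0 means idle, 1.0 means saturated).
    """
    if not fill_levels:
        raise ValueError("at least one endpoint is required")
    free = {name: max(0.0, 1.0 - fill) for name, fill in fill_levels.items()}
    total = sum(free.values())
    if total == 0.0:
        # Every endpoint is saturated: fall back to an even split.
        return {name: 1.0 / len(free) for name in free}
    return {name: value / total for name, value in free.items()}


def pick_endpoint(fill_levels, rng=random):
    """Choose one endpoint at random, biased toward the least-loaded ones."""
    weights = endpoint_weights(fill_levels)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical fill levels reported by three destination nodes.
    reported = {"node-a": 0.90, "node-b": 0.25, "node-c": 0.50}
    print(endpoint_weights(reported))
    print(pick_endpoint(reported))
```

Recomputing the weights whenever new fill levels arrive is what would make such a scheme dynamic; in the real system the per-packet forwarding decision is made in the FPGA rather than in host software.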
EJFAT System Status
ejfat-1
- 100Gbps NIC: ejfat-1-daq 129.57.177.8
- 10Gbps NIC: ejfat-1 129.57.177.131
- U280 FPGA: ejfat-1-dp 129.57.177.{9-16} - LAG'd for 200Gbps
- LB CP: ejfat-1 129.57.177.131, latest Stable branch
- LB: DP latest Stable FW
- CP Web UI port 8081
ejfat-2
- 100Gbps NIC: ejfat-2-daq 129.57.177.2
- 10Gbps NIC: ejfat-2 129.57.177.132
- 100Gbps U280 FPGA: ejfat-2-dp 129.57.177.{17-24}
- LB CP: ejfat-2 129.57.177.132, latest Stable branch
- LB: DP latest Stable FW
- CP Web UI port 8082
ejfat-3
- 200Gbps NIC: ejfat-3-daq 129.57.177.3
- 10Gbps NIC: ejfat-3 129.57.177.133
- Two U280s installed - LAG'd for 400Gbps
- FW Containers built by Stacey
ejfat-4
- 100Gbps NIC: ejfat-4-daq 129.57.177.4
- 10Gbps NIC: ejfat-4 129.57.177.134
- XDP experiments
- 100Gbps U280 FPGA: ejfat-4-dp 129.57.177.{41-48}
- LB CP: ejfat-4 129.57.177.134, <s>latest Stable branch</s>
- LB: DP <s>latest Stable FW</s>
ejfat-5
- 200Gbps NIC: ejfat-5-daq 129.57.177.5
- 10Gbps NIC: ejfat-5 129.57.177.135
- LB CP: ejfat-5 129.57.177.135, <s>latest Stable branch</s>
- 100Gbps U280 FPGA: ejfat-5-dp 129.57.177.{49-56}
- LB: DP <s>latest Stable FW</s>
- Optical Taps Installed
ejfat-6
- 200Gbps NIC: ejfat-6-daq 129.57.177.6
- 10Gbps NIC: ejfat-6 129.57.177.136
- DAOS experiments
- Using Ubuntu 24.04 LTS
- FW containers built
- Waiting for podman compose installation
ejfat-fs
- 100Gbps NIC: ejfat-fs-daq 129.57.177.7
- 10Gbps NIC: ejfat-fs 129.57.177.130
- Hosts NVMe memory/disk
- 100Gbps U280 FPGA: ejfat-fs-dp 129.57.177.{65-72}
- LB CP: ejfat-fs 129.57.177.130, latest Stable branch
- LB: DP latest Stable FW
- CP Web UI port 8080
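The FPGA entries above use a shorthand such as 129.57.177.{9-16}. Assuming this denotes an inclusive range over the last octet (an interpretation, not something the page states explicitly), the individual addresses can be listed with a short helper. The function name is hypothetical.

```python
import ipaddress
import re

# Matches "a.b.c.{first-last}", e.g. "129.57.177.{9-16}".
_RANGE = re.compile(
    r"^(?P<prefix>(?:\d{1,3}\.){3})\{(?P<first>\d{1,3})-(?P<last>\d{1,3})\}$"
)


def expand_last_octet_range(spec):
    """Expand 'a.b.c.{first-last}' into a list of dotted-quad strings.

    Assumes the braces give an inclusive range of the last octet.
    """
    match = _RANGE.match(spec)
    if match is None:
        raise ValueError(f"not in a.b.c.{{first-last}} form: {spec!r}")
    first, last = int(match["first"]), int(match["last"])
    if first > last:
        raise ValueError(f"empty range in {spec!r}")
    # IPv4Address validates every octet and rejects values above 255.
    return [
        str(ipaddress.IPv4Address(f"{match['prefix']}{octet}"))
        for octet in range(first, last + 1)
    ]


if __name__ == "__main__":
    for address in expand_last_octet_range("129.57.177.{9-16}"):
        print(address)
```

Under that assumption each data-plane entry covers eight consecutive addresses.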
Presentations/Papers
Date | Presenter | Event | Links |
---|---|---|---|
2021-03-01 | G. Heyes | EJFAT Proposal | Word |
2021-10-21 | M. S. Goodrich | Div Brief | |
2021-11-05 | M. S. Goodrich | Canisius College | PDF |
2021-12-03 | S. Sheldon | ESnet LB Tutorial | MP4 |
2021-12-10 | Y. Kumar | SRO iX Presentation | PPTX |
2022-08-05 | M. S. Goodrich | RT-2022 Presentation | PPTX |
2022-08-05 | M. S. Goodrich, et al. | RT-2022 Proceedings | PDF |
2022-10-20 | S. Sheldon, et al. | INDIS-2022 | PDF |
2022-10-24 | M. S. Goodrich | ACAT-2022 Presentation | PPTX |
2023-03-17 | M. S. Goodrich, et al. | ACAT-2022 Proceedings | PDF |
2023-05-11 | M. S. Goodrich, et al. | CHEP-2023 Presentation | PPTX |
2023-10-12 | D. Howard, et al. | CHEP-2023 Conference Publication | PDF |
2024-03-11 | M. S. Goodrich, et al. | ACAT-2024 Presentation | PPTX |
2024-04-10 | M. S. Goodrich, et al. | RT-2024 Presentation | PPTX |
2024-07-31 | M. S. Goodrich, et al. | ACAT-2024 Proceedings | PDF |
2024-10-02 | S. Veseli, APS/SDM | APS/ALS - EJFAT | PPTX |
EJFAT Weekly EPSCI Meetings
EJFAT Weekly Collaboration Meetings
Technical Design Overview
EJFAT Technical Design Overview
UDP Transmission Performance
EJFAT UDP General Performance Considerations
EJFAT UDP Packet Receiving and Core Switching
EJFAT UDP Packet Sending and NUMA Nodes
EJFAT UDP Single Thread Packet Sending and Receiving
Testing Load Balancer Bandwidth
HOW-TOs
How to use Control Plane Web UI
How to install, build and use gRPC
How to install, build and use XDP related packages
How to Compute Schedule Density from PID Signals
Network Path MTU Discovery support in the Linux Kernel:
file: /proc/sys/net/ipv4/tcp_mtu_probing
variable: net.ipv4.tcp_mtu_probing (integer; default: 0; since Linux 2.6.17):

tcp_mtu_probing - INTEGER
    Controls TCP Packetization-Layer Path MTU Discovery. Takes three values:
    0 - Disabled
    1 - Disabled by default, enabled when an ICMP black hole detected
    2 - Always enabled, use initial MSS of tcp_base_mss.
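The current setting can be inspected by reading the /proc file named above. The following is a minimal sketch, assuming a Linux host; the function names are hypothetical, and the mode descriptions are copied from the kernel text quoted above.

```python
from pathlib import Path

# Meaning of each value, as quoted from the kernel documentation above.
MTU_PROBING_MODES = {
    0: "Disabled",
    1: "Disabled by default, enabled when an ICMP black hole detected",
    2: "Always enabled, use initial MSS of tcp_base_mss",
}


def parse_mtu_probing(raw):
    """Turn the raw file contents (e.g. '1\\n') into (value, description)."""
    value = int(raw.strip())
    if value not in MTU_PROBING_MODES:
        raise ValueError(f"unexpected tcp_mtu_probing value: {value}")
    return value, MTU_PROBING_MODES[value]


def read_mtu_probing(path="/proc/sys/net/ipv4/tcp_mtu_probing"):
    """Read and decode the live kernel setting (Linux only)."""
    return parse_mtu_probing(Path(path).read_text())


if __name__ == "__main__":
    value, meaning = read_mtu_probing()
    print(f"net.ipv4.tcp_mtu_probing = {value} ({meaning})")
```

Changing the value requires root privileges, for example with `sysctl -w net.ipv4.tcp_mtu_probing=1`.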
REFERENCEs
A horizontally scalable online processing system for trigger-less data acquisition
The-triggerless-data-acquisition-system-of-the-XENONnT-experiment
Real-time data analysis in particle physics
Edge to Core Test Equipment:
- Price Estimate Spreadsheet
- Networking Diagram, Updated (PDF) (from Brent 2024-02-09)
- PR408549 : Requisition 1 of 2 :
- Statement of Work for Servers
- 1/13/2022: EJFAT team decided to solicit two bid responses, one with MLX NIC and one without. Response from Procurement is "I can ask for the two separate quotes. If you are going to purchase both option (with & without add-in cards), once I receive the quotes back, you will have submit a new PR to cover the option (without add-in cards)."
- 1/18/2022: Question from KOI Computers: "please clarify what the part number for the NVIDIA Dual Port ConnectX-6". Replied with part # MCX623106AN-CDAT.
- 1/24/2022: Requisition currently open for bid responses from vendors. Due date is COB 1/24/2022.
- 1/27/2022: PO awarded to Atipa for 6 servers and 1 file-server with FPGA and MLX SmartNIC. Expected delivery date from vendor is 5/31/2022.
- PR408870 PR408938 Requisition 2 of 2: Statement of Work for Switches & Cables
- 1/14/2022: PRs for the switches, transceivers and fiber have been submitted. I added (4) 2km 100G transceivers to support dual 100G connections between the switches. We can always upgrade to 400G in the future, if needed.
- PR409850 NVIDIA ARM HPC Developer Kit
- Hardware Specifications for dev kit
- Model GIGABYTE G242-P32, 2U server
- CPU 1x Ampere Altra Q80-30 (Arm processor)
- Memory 512G DDR4 memory
- Storage 6TB SAS/ SATA 3.5″
- GPU 2x NVIDIA A100 GPU
- Network 2x NVIDIA® BlueField®-2 E-Series DPU, 200GbE/HDR single-port QSFP56, PCIe Gen4 x16, secure boot enabled, crypto disabled, 16GB on-board DDR, 1GbE OOB management