EJFAT Group Meeting Oct 24, 2024


Latest revision as of 14:37, 7 November 2024

The meeting time is 11:00am Eastern/USA.

Connection Info:

You can connect using [https://jlab-org.zoomgov.com/j/1611828967?pwd=UVVCS0pUVW5FMlphT0lRQXdoQ0o4Zz09&from=addon ZoomGov Video conferencing (ID: 161 182 8967)].

Meeting URL
 https://jlab-org.zoomgov.com/j/1611828967

Meeting ID
161 182 8967

Passcode
570041

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode


Agenda:

  1. Previous meeting
  2. Announcements:
    1. SuperComputing24 (SC24), Atlanta, GA, Nov 17-22, 2024
    2. This meeting is now bi-weekly.
  3. Topics
    1. Docker containers on reboot
    2. IRI Test Development:
      1. Last Test Thursday Oct 10
      2. LB version = ESnet ??? version
      3. Data Source:
        1. JLAB, CLAS12, pre-triggered events - 1 channel
      4. Data Sink:
        1. Perlmutter - 80 nodes
        2. ORNL/ESnet/JLab IRI Testbed / Defiant - 4 nodes allocated
        3. JLab - 7 nodes available
        4. FABRIC - nodes available
        5. ERSAP
      5. Test Plans - JLab, ESnet, NERSC:
    3. ALS: Next is Tech Talk
    4. JLab FEG/SRO
      1. will use interim UDP solution for event sync
      2. Special Events Issue - Completely Out-of-band
        1. Cloud Based message queue
        2. LB isolation from any non-LB processing
        3. Cloud solutions include Kafka and RabbitMQ, ...
    5. E2SAR 0.1.2
      1. segmentation/reassembly complete
      2. .deb packages for Ubuntu 20, 22 and 24 are now available, along with the latest Docker image; the packages contain the E2SAR library, headers and executables plus appropriate versions of the gRPC and Boost dependencies, all installed under /usr/local
    6. SC poster submitted - demo in the works
    7. Experiment Halls - beam returns late January/February 2025
    8. Ubuntu 20.04 LTS - support ends in 2025 - next ESnet target 22.04
  4. Status
    1. ejfat-1 - 2-port LAG at switch
    2. ejfat-3 - two FPGA DP built, running - 4-port LAG at switch, FW containers built (Stacey), needs CP installation
    3. ejfat-6 - Ubuntu 24.04 installed - esnet-smartnic-fw build succeeds with podman, issues with podman compose
  5. EJFAT Phase II
  6. AOT
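
Agenda item 3.5.1 mentions segmentation/reassembly. As a rough illustration of the general idea only, the sketch below splits an event buffer into fixed-size segments tagged with an event ID and byte offset, then rebuilds events regardless of arrival order. This is not E2SAR's wire format or API: the header layout and the names `segment` and `reassemble` are hypothetical, chosen just for this example, and the sketch assumes no duplicate segments arrive.

```python
import struct

# Hypothetical header for illustration only (NOT the E2SAR wire format):
# event id (u64), byte offset (u32), total event length (u32), network byte order.
HEADER = struct.Struct("!QII")


def segment(event_id: int, payload: bytes, max_payload: int = 1024) -> list[bytes]:
    """Split one event into segments, each prefixed with the hypothetical header."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    total = len(payload)
    return [
        HEADER.pack(event_id, offset, total) + payload[offset:offset + max_payload]
        for offset in range(0, total, max_payload)
    ]


def reassemble(segments: list[bytes]) -> dict[int, bytes]:
    """Rebuild complete events from segments arriving in any order.

    Assumes each segment arrives at most once; duplicates are not detected.
    """
    buffers: dict[int, bytearray] = {}
    received: dict[int, int] = {}
    for seg in segments:
        event_id, offset, total = HEADER.unpack_from(seg)
        data = seg[HEADER.size:]
        # Allocate the full event buffer on first sight of this event id.
        buf = buffers.setdefault(event_id, bytearray(total))
        buf[offset:offset + len(data)] = data
        received[event_id] = received.get(event_id, 0) + len(data)
    # Return only events for which every byte has arrived.
    return {eid: bytes(buf) for eid, buf in buffers.items() if received[eid] == len(buf)}
```

Carrying the offset and total length in every segment lets the receiver place data without relying on arrival order, which matters when segments travel over UDP.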

Minutes

  1. LLDP needs IOMMU
  2. FW containers need boot init script (Stacey)
  3. Mark Jones to discuss LB Docker login blockage (OBE?)
  4. EJFAT nodes:
    1. 16 NUMA domains
    2. DPDK must run portmode driver on CPU in NUMA domain of FPGA for LLDP messages
  5. Event splitting being investigated with E2SAR debug tools
  6. EJFAT II
    1. architecture change in control/data paths for FPGA (SRIOV)
    2. adding PCIE AES
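
Minutes item 4.2 requires the DPDK driver to run on a CPU in the FPGA's NUMA domain. A minimal sketch of how one might look up those CPUs on Linux through sysfs follows. The helper names `parse_cpulist` and `fpga_local_cpus` are hypothetical and are not part of DPDK or the EJFAT tooling; the sketch assumes the standard sysfs files `numa_node` (per PCI device) and `cpulist` (per NUMA node) are present.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist string such as '0-3,8,10-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def fpga_local_cpus(pci_addr: str, sysfs: Path = Path("/sys")) -> list[int]:
    """Return the CPUs in the NUMA node of the PCI device at pci_addr."""
    node = int((sysfs / "bus/pci/devices" / pci_addr / "numa_node").read_text())
    if node < 0:
        # The kernel reports -1 when no NUMA affinity is known for the device.
        raise RuntimeError(f"no NUMA node reported for {pci_addr}")
    cpulist = sysfs / f"devices/system/node/node{node}/cpulist"
    return parse_cpulist(cpulist.read_text())
```

Calling `fpga_local_cpus(...)` with the FPGA's PCI address (domain:bus:device.function) yields the candidate CPUs; one of them could then be given to DPDK through the EAL core list option (`-l`).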