EJFAT Group Meeting Nov 21, 2024


Latest revision as of 16:26, 27 November 2024

The meeting time is 11:00am Eastern/USA.

Connection Info:

You can connect using [https://jlab-org.zoomgov.com/j/1611828967?pwd=UVVCS0pUVW5FMlphT0lRQXdoQ0o4Zz09&from=addon ZoomGov video conferencing (ID: 161 182 8967)]:

Meeting URL
 https://jlab-org.zoomgov.com/j/1611828967

Meeting ID
161 182 8967

Passcode
570041

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode


Agenda:

  1. Previous meeting
  2. Announcements:
    1. SuperComputing24 Atlanta, GA from Nov 17-22, 2024
    2. Mike PD Duty 11/06/2024 - 11/20/2024
  3. Topics
    1. Event splitting being investigated with E2SAR debug tools
    2. IRI Test Development:
      1. Last Test Wednesday Nov 20
      2. Next Test Date/Time = ?????
      3. Data Source:
        1. JLAB, CLAS12, pre-triggered events - 1 channel
      4. Data Sink:
        1. Perlmutter - 80 nodes
        2. ORNL/ESnet/JLab IRI Testbed / Defiant - 4 nodes allocated
        3. JLab - 7 nodes available
        4. FABRIC - nodes available
        5. ERSAP
      5. Test Plans - JLab, ESnet, NERSC:
    3. ALS: Next is Tech Talk
    4. JLab FEG/SRO
      1. will use interim UDP solution for event sync
      2. Special Events Issue - Completely Out-of-band
        1. Cloud Based message queue
        2. LB isolation from any non-LB processing
        3. Cloud solutions include Kafka and RabbitMQ, ...
    5. E2SAR 0.1.2
      1. segmentation/reassembly complete
      2. .deb packages for Ubuntu 20, 22 and 24
    6. Experiment Halls - beam returns late January/February 2025
    7. Ubuntu 20.04 LTS - support ends in 2025 - next ESnet target 22.04
    8. FW containers need boot init script
  4. Status
    1. ejfat-1 - 2-port LAG at switch
    2. ejfat-2
      1. Currently shadowing ESnet Stable deployment for IRI
      2. Needs updated Installation Procedure
    3. ejfat-3
      1. two FPGA DP built
      2. FW containers built (Stacey)
      3. Needs Installation Procedure
      4. 4-port LAG at switch
      5. needs CP installation
    4. ejfat-6
      1. Ubuntu 24.04 installed
      2. esnet-smartnic-fw build succeeds with podman
      3. issues with podman compose
  5. EJFAT Phase II
    1. Architecture change in control/data paths for FPGA (SR-IOV)
    2. Adding PCIe AES
  6. AOT

Notes

  1. LLDP needs IOMMU
  2. EJFAT nodes:
    1. 16 NUMA domains
    2. DPDK must run the poll mode driver on a CPU in the FPGA's NUMA domain for LLDP messages
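
The NUMA placement note above can be checked on a node with a short script. The following is a minimal sketch, assuming a Linux host that exposes the standard sysfs attributes `numa_node` (per PCI device) and `cpulist` (per NUMA node). The PCI address in the usage note is a hypothetical placeholder, not the address of any EJFAT FPGA, and the helper names are made up for illustration.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8,10-11" into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list (node without CPUs)
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pci_numa_node(bdf: str, sysfs: Path = Path("/sys")) -> int:
    """Return the NUMA node of a PCI device; the kernel reports -1 if unknown."""
    node_file = sysfs / "bus/pci/devices" / bdf / "numa_node"
    return int(node_file.read_text().strip())


def cpus_near_device(bdf: str, sysfs: Path = Path("/sys")) -> list[int]:
    """List the CPUs that share a NUMA node with the given PCI device."""
    node = pci_numa_node(bdf, sysfs)
    if node < 0:
        return []  # no NUMA affinity reported for this device
    cpulist_file = sysfs / "devices/system/node" / f"node{node}" / "cpulist"
    return parse_cpulist(cpulist_file.read_text())
```

For example, `cpus_near_device("0000:41:00.0")` (hypothetical address) would return the candidate CPUs for the driver. One of them could then be given to the DPDK application through the EAL core list option (`-l`), assuming that is how the application is launched.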

Minutes

  1. The ESnet CONFAB event runs from April 7 to 11, 2025.
    1. EJFAT developer meeting all day Thursday, April 10, 2025
    2. Location: San Francisco