EJFAT EPSCI Meeting Dec. 18, 2024


Latest revision as of 15:15, 19 December 2024

The meeting time is 2:30pm.

Connection Info:


Agenda:

  1. Previous meeting
  2. Announcements:
    1. ACAT2024 Paper Accepted
    2. ESnet CONFAB event, April 7 to 11, 2025
      1. EJFAT developer meeting all day Thursday, April 10, 2025
      2. Location: San Francisco
    3. SkuTech Interest in EJFAT
      1. JLab supplied a letter of support for the SBIR Phase II pre-proposal
  3. Topics
    1. Local CP testing
    2. IRI Test Development:
      1. Data Source:
        1. JLab CLAS12 pre-triggered events, 1 channel
      2. Data Sink:
        1. Perlmutter - 80 nodes
        2. ORNL/ESnet/JLab IRI Testbed / Defiant - 4 nodes allocated
        3. JLab - 7 nodes available
        4. FABRIC - nodes available
        5. ERSAP
      3. Test plans: JLab, ESnet, NERSC
      4. E2SAR Integration
    3. JLab FEG/SRO
      1. Will use an interim UDP solution for event synchronization
      2. Special events issue: completely out-of-band
        1. Cloud message queue solutions include Kafka and RabbitMQ, ...
        2. Keep the LB isolated from any non-LB processing
    4. E2SAR 0.1.4
      1. segmentation/reassembly complete
      2. .deb packages for Ubuntu 20, 22, and 24 are now available, along with the latest Docker image. The packages contain the E2SAR library, headers, and executables, plus matching versions of the gRPC and Boost dependencies, all installed under /usr/local.
    5. ALS: E2SAR integration
    6. IB
    7. Storage
      1. SSD drives on ejfat-fs: 20 TB of 28 TB used; mounted for the EJFAT farm; permissions issue
      2. RAM disks: 1 TB total memory on ejfat-fs, 0.5 TB on the other nodes
      3. Repurposing /dev/sdb for user storage
      4. Storage areas that should NOT be backed up could be marked as scratch
      5. Opportunity to consolidate wares on SSD for a consistent SC backup procedure?
    8. Experimental halls: beam returns late January/February 2025
    9. Ubuntu 20.04 LTS: support ends in 2025; next ESnet target is 22.04
    10. CP: the control web UI (127.0.0.1:8081) requires an SSH tunnel
    11. EJFAT II
    12. ESnet is interested in partnering to establish a beachhead in the FPGA/GPU AI space
      1. Separate Project
      2. FAST program coming up
      3. May get free help from Xilinx
      4. Might target VERSA release
  4. Resources:
    1. HPDF
    2. EJFAT API
    3. EJFAT Status
  5. U280s are discontinued.
    1. New LB purchases will use the U55C
    2. U55C bitfiles available in 1 year
    3. U280 Supported indefinitely
  6. ACAT 2024 Paper - In Review
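The CP item in the agenda notes that the control web UI listens only on 127.0.0.1:8081, so it cannot be reached directly from another machine and needs an SSH local port forward. A minimal Python sketch that builds the standard OpenSSH `-N -L` forwarding command; the login user and CP host name are hypothetical placeholders, since the notes do not say which node runs the CP.

```python
import shlex
from typing import List

# Hypothetical placeholder: the notes do not name the CP host or login user.
CP_HOST = "user@cp-host.example"
# The CP control web UI listens on 127.0.0.1:8081 on the CP host itself.
UI_PORT = 8081


def tunnel_command(host: str, remote_port: int = UI_PORT, local_port: int = UI_PORT) -> List[str]:
    """Build an OpenSSH command that forwards local_port on this machine
    to 127.0.0.1:remote_port as seen from `host`.

    -N : do not run a remote command, only hold the tunnel open
    -L : local port forward, [local_port]:[dest_host]:[dest_port]
    """
    return ["ssh", "-N", "-L", f"{local_port}:127.0.0.1:{remote_port}", host]


if __name__ == "__main__":
    # Print the command instead of running it; leave it running in a terminal,
    # then browse to http://localhost:8081 on the local machine.
    print(shlex.join(tunnel_command(CP_HOST)))
```

Pass a different `local_port` if 8081 is already in use locally; the remote side stays 127.0.0.1:8081 because that is where the UI is bound.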

Notes

  1. LLDP needs IOMMU
  2. FW containers need a boot init script
  3. EJFAT nodes:
    1. 16 NUMA domains
    2. DPDK must run the poll-mode driver on a CPU in the FPGA's NUMA domain for LLDP messages
  4. EJFAT II
    1. Architecture change in the FPGA control/data paths (SR-IOV)
    2. Adding PCIe AES
  5. AOT
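The notes on LLDP and NUMA placement suggest two checks one could script on an EJFAT node: whether an IOMMU is active, and which CPUs share the FPGA's NUMA domain (one of 16 on these nodes) so the DPDK poll-mode driver can be pinned there. A minimal Python sketch, assuming the standard Linux sysfs layout; the FPGA PCI address is a hypothetical placeholder, and the IOMMU test is only a heuristic (the kernel populates /sys/kernel/iommu_groups when an IOMMU is enabled). The printed core list uses the form accepted by the DPDK EAL `-l` option.

```python
import os
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a Linux cpulist string such as '0-3,8,10-11' into CPU IDs."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def pci_numa_node(pci_addr: str, sysfs: str = "/sys") -> int:
    """NUMA node of a PCI device; the kernel reports -1 when it is unknown."""
    path = os.path.join(sysfs, "bus", "pci", "devices", pci_addr, "numa_node")
    with open(path) as f:
        return int(f.read().strip())


def node_cpus(node: int, sysfs: str = "/sys") -> List[int]:
    """CPU IDs that belong to one NUMA node."""
    path = os.path.join(sysfs, "devices", "system", "node", f"node{node}", "cpulist")
    with open(path) as f:
        return parse_cpulist(f.read())


def iommu_active(sysfs: str = "/sys") -> bool:
    """Heuristic: IOMMU groups exist only when an IOMMU is enabled."""
    groups = os.path.join(sysfs, "kernel", "iommu_groups")
    return os.path.isdir(groups) and len(os.listdir(groups)) > 0


if __name__ == "__main__":
    # Hypothetical PCI address of the LB FPGA; look up the real one with `lspci`.
    FPGA_PCI_ADDR = "0000:c1:00.0"
    print("IOMMU active:", iommu_active())
    try:
        node = pci_numa_node(FPGA_PCI_ADDR)
        # -1 means the platform did not report a node; fall back to node 0.
        cpus = node_cpus(max(node, 0))
        print(f"FPGA on NUMA node {node}; EAL core list: -l {','.join(map(str, cpus))}")
    except (OSError, ValueError) as err:
        print("Could not read NUMA info from sysfs:", err)
```

Reading sysfs rather than parsing `lscpu` or `numactl` output keeps the sketch free of external tools; the `sysfs` parameter also lets the functions be pointed at a copied or mocked tree.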

Minutes