Difference between revisions of "EJFAT EPSCI Meeting Oct. 23, 2024"

From epsciwiki

Revision as of 18:25, 23 October 2024

The meeting time is 2:30pm.

Connection Info:


Agenda:

  1. Previous meeting
  2. Announcements:
    1. SuperComputing24 Atlanta, GA from Nov 17-22, 2024
  3. Status
  4. Topics
    1. Docker containers on reboot
    2. IRI Test Development:
      1. Last Test Thursday Oct 10
      2. LB version = ESnet ??? version
      3. Data Source:
        1. JLAB, CLAS12, pre-triggered events - 1 channel
      4. Data Sink:
        1. Perlmutter - 40 nodes
        2. ORNL/ESnet/JLab IRI Testbed / Defiant - 4 nodes allocated
        3. JLab - 7 nodes available
        4. FABRIC - nodes available
        5. ERSAP
      5. Test Plans - JLab, ESnet, NERSC:
      6. Prometheus Dashboards
      7. The Prometheus dashboard can be accessed on port 1717 of the ejfat-fs node. The test data is located at "100g-nersc-ornl / ejfat-nersc-ornl". The test time interval is around UTC 17:05 to 18:20 on August 29, 2024. To log in to Grafana, please use the username and password "ejfat".
    3. JLab FEG/SRO
      1. Will use interim UDP solution for event sync
      2. Special Events Issue - Completely Out-of-band
        1. Cloud Based message queue
        2. LB isolation from any non-LB processing
    4. E2SAR 0.1.2
      1. ejfat-5 reserved for E2SAR
      2. segmentation/reassembly complete
      3. .deb packages for Ubuntu 20, 22 and 24 are now available (they contain E2SAR library, headers, executables as well as appropriate versions of gRPC and Boost dependencies, all installed under /usr/local), as well as the latest Docker image
    5. Special Events: Site Issue: Cloud solutions include Kafka, RabbitMQ, ...
    6. ALS: Next is Tech Talk
    7. IB
    8. ejfat-3 - two FPGA DP built, running - with 4-port LAG at switch, needs CP installation
    9. ejfat-6 - Ubuntu 24.04 installed - esnet-smartnic-fw build succeeds with podman, issues with podman compose
    10. SC poster submitted - demo in works
    11. Storage
      1. SSD drives on ejfat-fs - 20 TB used of 28 TB - mounted for EJFAT farm
      2. RAM disks: 1 TB total memory on ejfat-fs, 0.5 TB on others
      3. Repurposing /dev/sdb to be used for user storage
      4. Storage Areas NOT to be backed up could be marked as scratch
      5. Opportunity to consolidate wares on SSD for a consistent SC backup procedure?
    12. Experiment Halls - beam returns late January/February 2025
    13. Ubuntu 20.04 LTS - support ends in 2025 - next ESnet target 22.04
    14. CP: Control Web UI (127.0.0.1) needs SSH tunnel
    15. EJFAT II: New CP-DP APIs for config available for current FW (?)
    16. ESnet interested in partnering for beachhead in FPGA/GPU AI space
      1. Separate Project
      2. FAST program coming up
      3. May get free help from Xilinx
      4. Might target Versal release
  5. Resources:
    1. HPDF
    2. EJFAT API
    3. EJFAT Status
  6. U280s are discontinued.
    1. New LB purchases: U55C
    2. U55C bitfiles available 1 year out
    3. U280 supported indefinitely
  7. ACAT 2024 Paper - In Review
  8. AOT
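
Note on the CP Control Web UI topic: since the UI is bound to 127.0.0.1 on the CP host, it is only reachable through an SSH local port forward. A minimal sketch of building that command follows. The host name `cp-host.example.org` and UI port `8080` are hypothetical placeholders, since the agenda gives neither.

```python
import shlex


def tunnel_command(remote_host, ui_port, local_port=None, user=None):
    """Build an ssh argv that forwards a local port to a UI bound to
    127.0.0.1 on remote_host. -N opens the tunnel without running a
    remote command; -L sets up the local port forward."""
    if local_port is None:
        local_port = ui_port  # reuse the same port number locally
    target = f"{user}@{remote_host}" if user else remote_host
    return ["ssh", "-N", "-L", f"{local_port}:127.0.0.1:{ui_port}", target]


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    argv = tunnel_command("cp-host.example.org", 8080)
    print(shlex.join(argv))
```

With the tunnel open, the UI would be reachable in a local browser at 127.0.0.1 on the chosen local port.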

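Note on the Prometheus/Grafana test data: the test window listed in the agenda (around 17:05 to 18:20 UTC on August 29, 2024) can be converted to epoch milliseconds, a form Grafana accepts for the `from` and `to` time-range parameters in a dashboard URL. This is a sketch only. The helper names are made up for illustration, and the dashboard URL path is not given in the agenda, so only the query string is built.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def epoch_ms(year, month, day, hour, minute):
    """Return a UTC wall-clock time as milliseconds since the Unix epoch."""
    dt = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def time_range_query(start_ms, end_ms):
    """Build a 'from=...&to=...' query string for a dashboard URL."""
    return urlencode({"from": start_ms, "to": end_ms})


# Approximate test window from the agenda: Aug 29, 2024, 17:05 to 18:20 UTC.
START_MS = epoch_ms(2024, 8, 29, 17, 5)
END_MS = epoch_ms(2024, 8, 29, 18, 20)

if __name__ == "__main__":
    print(time_range_query(START_MS, END_MS))
```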
Minutes