EJFAT EPSCI Meeting Mar. 27, 2024

The meeting time is 2:30pm.

Connection Info:


Agenda:

  1. Previous meeting
  2. Announcements:
  3. 24th Real Time Conference, ICISE, Quy Nhon, Vietnam, 22-26 April 2024 - Selected for 20 min Oral
    1. Show how EJFAT architecture contrasts with current
    2. Explain EJFAT's benefits to experimentalists
    3. Propose to measure/plot EJFAT event delivery elasticity/scaling through reassembly/reconstruction steps
    4. Use JLab-ESnet-NERSC/ORNL/JLab environment
  4. Status:
    1. 100Gig traffic to ESnet LB
    2. EJFAT LBs: ejfat-2, ejfat-5, 192.188.29.16 (ESnet) not on current CP
    3. Plan to upgrade ejfat-2 to current CP, new iLB install on ejfat-6
    4. All EJFAT nodes have 48GB ramdisk (ejfat-fs = 96GB)
  5. Cluster Activities:
    1. ejfat-1: Test moving a current LB install wholesale by moving files/containers
    2. ejfat-2: June 2023 LB - open for business
    3. ejfat-3: Ready to accept second FPGA for experimentation/install by ESnet
    4. ejfat-4: Carl's XDP experiments
    5. ejfat-5: Feb 2024 LB - latest production
    6. ejfat-6: Cissie's DAOS experiments / Mar 2024 LB - latest development
    7. ejfat-fs: Hosts NVMe memory/disk
  6. NERSC Test Development:
    1. Data Source:
      1. JLAB, CLAS12, pre-triggered events - 1 channel
    2. Data Sink:
      1. Perlmutter
      2. ERSAP
    3. Networking for Test
      1. Currently 2 x 10 Gbps for JLab/L3 VPN - will expand to 200 Gbps in near future
    4. Test Plans - JLab, ESnet, NERSC:
  7. ORNL/ESnet/JLab IRI Testbed (similar to NERSC) - Ross Miller <rgmiller@ornl.gov> project code CSC 266
  8. Hall B CLAS12 detector streaming test
    1. Switch 7050 is expected to arrive some time around October; we already have transceivers, short cables and a patch panel to connect up to 32 VTPs to it using two 10GBit links per VTP
    2. Fiber installation between the Hall B forward carriage and the Hall B counting room should be done this summer; it will be enough for 24 VTPs using two 10GBit links per VTP
    3. Only one fiber between the Hall B counting room and the counting house second floor is available right now; we will order installation of more fibers, which may take several months
    4. There are several available fibers between the counting house second floor and the computer center (about 6); we can use a couple of them for our test
    5. Summary: sometime in October, we should have 48 10GBit links from 24 VTPs connected to the switch in the Hall B counting room, with that switch connected to the computer center by 2x100GBit links
    6. Need to develop CONOPS with Streaming group (Abbott)
    7. December 2023 Testing Activity
    8. SRO RTDP LDRD - need configuration for Spring '24 test with EJFAT
    9. Ready to supply up to 200 Gbps to EJFAT switch
  9. Demo Ready EJFAT Instance - Lower priority task
  10. EJFAT Operational Status Board -> ESnet Prometheus Reporting - Help Desk Ticket placed for JLab FW issue
  11. XDP sockets working (ejfat-4) - 50% less cpu, 3500 MTU limit
  12. Mtg with FEG/SRO - need long term solution for event sync
  13. RTDP - wants to use LB - some plumbing work reqd - ticket to provide 100Gbps from Indra/DAQ Lab (?)
  14. EJFAT Reconfig Design / PR - Funds available through end of March.
    1. Rec'd 22 PCIe VPI NICS
    2. Rec'd 5 OCP3/SFF VPI NICS
    3. Surrendering indra-s2; moving U280 to cluster
  15. EJFAT Phase II
    1. Implementation details in the DAOS gateway.
      1. Need to spec DAOS Use Cases ?
      2. Intel standing up special slack channel to discuss DAOS
      3. Connection Strategy to DAOS
      4. Especially need to keep track of how the FPGA would DMA event data cells in the future if it were a SmartNIC card. ( Cissie )
      5. daosfs01 has 2 physical IB cards and can run 2 true engines, with each CPU socket hosting one engine.
    2. Progress of multi FPGA and multi virtual LB control plane sw. ( Derek )
    3. Progress of FPGA architecture ( Peter and Jonathan )
      1. LB FW currently limited to 100 Gbps
      2. Reassembly work commencing soon
    4. Progress of finalizing a reassembly frame format (subordinate to 4.) ( Carl / Stacey )
    5. Progress on software development for NVIDIA Bluefield2 DPU data steering from NIC to GPU memory ( Amitoj/Cissie )
  16. Resources:
    1. HPDF
  17. AOT
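
Note on item 8 (Hall B link budget): the summary figures can be checked with a short calculation. This is only a back-of-the-envelope sketch. The link counts and rates come from the agenda item; treating every link as running at full line rate is an assumption, so the result is an upper bound rather than a measured load. The function name `link_budget` is made up for illustration.

```python
# Back-of-the-envelope link budget for the Hall B streaming test (item 8).
# Counts and rates are from the agenda; full line-rate use is an assumption.

VTP_COUNT = 24           # VTPs expected to be connected
LINKS_PER_VTP = 2        # two 10 Gbit links per VTP
LINK_RATE_GBPS = 10
UPLINK_COUNT = 2         # switch-to-computer-center links
UPLINK_RATE_GBPS = 100


def link_budget(vtps, links_per_vtp, link_gbps, uplinks, uplink_gbps):
    """Return (edge links, edge capacity, uplink capacity, oversubscription ratio)."""
    edge_links = vtps * links_per_vtp
    edge_gbps = edge_links * link_gbps
    uplink_total_gbps = uplinks * uplink_gbps
    return edge_links, edge_gbps, uplink_total_gbps, edge_gbps / uplink_total_gbps


edge_links, edge_gbps, uplink_total_gbps, ratio = link_budget(
    VTP_COUNT, LINKS_PER_VTP, LINK_RATE_GBPS, UPLINK_COUNT, UPLINK_RATE_GBPS
)
print(f"{edge_links} edge links, {edge_gbps} Gbit/s in, "
      f"{uplink_total_gbps} Gbit/s out, {ratio:.1f}:1 oversubscription")
```

Under that assumption the VTP side could offer up to 480 Gbit/s against a 200 Gbit/s uplink, so sustained per-VTP traffic would need to average below roughly 8.3 Gbit/s to fit.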