EJFAT Group Meeting Feb. 29, 2024

Latest revision as of 14:26, 29 February 2024

The meeting time is 11:00am Eastern/USA.

Connection Info:

You can connect using ZoomGov video conferencing (ID: 161 182 8967): https://jlab-org.zoomgov.com/j/1611828967?pwd=UVVCS0pUVW5FMlphT0lRQXdoQ0o4Zz09&from=addon

Meeting URL
 https://jlab-org.zoomgov.com/j/1611828967

Meeting ID
161 182 8967

Passcode
570041

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode


Agenda:

  1. Previous meeting
  2. Announcements:
    1. 22nd ACAT Conference, Stony Brook University, 11-15 March 2024 - Selected for 20 min Oral
      1. Contrast current EJFAT architecture event delivery with CLAS12 - David, Mike
      2. Propose to measure/plot EJFAT event delivery elasticity/scaling through reassembly step
      3. Treatments: Num data channels, XDP, Data Rate, Event synch latency
      4. Use JLab EJFAT environment (Jun/2023 release); NERSC (dev release)
      5. EJFAT configuration - CP current as of Jan 9 (ejfat-2 = Jun/2023, ejfat-5 = Feb/2024 in the works)
    2. 24th IEEE Real Time Conference – Quy Nhon, Vietnam 22-26 April 2024 - Abstract Selected for 20 min Oral.
    3. The first 100Gig circuit passed testing and is ready for traffic - need to sync up with ESnet
  3. Installation Issues for new LB instance on ejfat-5
  4. NERSC Test Development:
    1. Data Source:
      1. JLAB, CLAS12, pre-triggered events - 1 channel
    2. Data Sink:
      1. Perlmutter
      2. ERSAP
    3. Networking for Test
      1. Currently 2 x 10 Gbps for JLab/L3 VPN - will expand to 200 Gbps in the near future
    4. Test Plans - JLab, ESnet, NERSC:
  5. ORNL/ESnet/JLab IRI Testbed (similar to NERSC) - Ross Miller <rgmiller@ornl.gov> project code CSC 266
  6. Hall B CLAS12 detector streaming test
    1. Switch 7050 is expected to arrive around October; we already have transceivers, short cables, and a patch panel to connect up to 32 VTPs to it using two 10GBit links per VTP
    2. Fiber installation between the Hall B forward carriage and the Hall B counting room should be done this summer; it will be enough for 24 VTPs using two 10GBit links per VTP
    3. Only one fiber between the Hall B counting room and the counting house second floor is available right now; we will order more fiber installation, which may take several months
    4. There are several available fibers (about 6) between the counting house second floor and the computer center; we can use a couple of them for our test
    5. Summary: sometime in October, we should have 48 10GBit links from 24 VTPs connected to the switch in the Hall B counting room, with that switch connected to the computer center by 2x100GBit links
    6. Need to develop CONOPS with Streaming group (Abbott)
    7. December 2023 Testing Activity
    8. SRO RTDP LDRD - need configuration for Spring '24 test with EJFAT
    9. Ready to supply up to 200 Gbps to EJFAT switch
  7. Demo Ready EJFAT Instance - Lower priority task
  8. EJFAT Operational Status Board -> ESnet Prometheus Reporting - Help Desk Ticket placed for JLab FW issue
  9. XDP sockets working (ejfat-4) - 50% less CPU, 3500 MTU limit
  10. Mtg with FEG/SRO - need long term solution for event sync
  11. RTDP update
  12. EJFAT Reconfig Design / PR - Funds available through end of March.
    1. Have ordered 22 PCIe VPI NICs
    2. Have ordered 5 OCP3/SFF VPI NICs
    3. Surrendering indra-s2; moving U280 to ejfat-3
  13. EJFAT Phase II
    1. Implementation details in the DAOS gateway.
      1. Need to spec DAOS Use Cases ?
      2. Intel standing up special Slack channel to discuss DAOS
      3. Connection Strategy to DAOS
      4. In particular, keep track of how the FPGA would DMA event data cells in the future if it were a SmartNIC card. ( Cissie )
      5. daosfs01 has 2 physical IB cards and can run 2 true engines with each CPU socket hosting one engine.
    2. Progress of multi FPGA and multi virtual LB control plane sw. ( Derek )
    3. Progress of FPGA architecture ( Peter and Jonathan )
      1. LB FW currently limited to 100 Gbps
      2. Reassembly work commencing soon
    4. Progress of finalizing a reassembly frame format (subordinate to 4.) ( Carl / Stacey )
    5. Progress on software development for NVIDIA Bluefield2 DPU data steering from NIC to GPU memory ( Amitoj/Cissie )
    6. GPU purchase (A100) for EJFAT Test stand servers under IRIAD funds. The servers are capable of hosting 2 GPUs per server. ( Amitoj )
      1. In a pinch one can use the 2x A100 GPUs in the NVIDIA Bluefield2 DPU server (hostname: nvidarm)
      2. Initially purchase one of each GPU: NVIDIA, INTEL, AMD so we can compare performance across all 3 flavors.
      3. Tracking code using GPUs available from Hall-B
  14. Resources:
    1. HPDF
  15. AOT
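
The link counts in the Hall B CLAS12 streaming test item (agenda item 6) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `link_budget` is hypothetical, the numbers are the nominal line rates quoted in the agenda, and the oversubscription ratio assumes every VTP link runs at full rate at the same time, which the agenda does not claim.

```python
# Nominal link budget for the Hall B streaming test (figures from agenda item 6).
# Assumption: all links are counted at nominal line rate; real traffic may be lower.

def link_budget(vtps, links_per_vtp, gbit_per_link, uplinks, gbit_per_uplink):
    """Return (link count, ingress Gbit/s, uplink Gbit/s, ingress/uplink ratio)."""
    n_links = vtps * links_per_vtp            # VTP-to-switch links
    ingress = n_links * gbit_per_link         # aggregate capacity into the switch
    uplink = uplinks * gbit_per_uplink        # capacity from switch to computer center
    return n_links, ingress, uplink, ingress / uplink

# October target: 24 VTPs, two 10 Gbit links each, 2 x 100 Gbit uplinks.
n_links, ingress, uplink, ratio = link_budget(24, 2, 10, 2, 100)
print(f"{n_links} links, {ingress} Gbit/s in, {uplink} Gbit/s out, ratio {ratio}")
```

With the agenda's figures this gives 48 links and 480 Gbit/s of nominal ingress against 200 Gbit/s of uplink. Calling `link_budget(32, 2, 10, 2, 100)` gives the same comparison for the 32 VTPs that the switch hardware on hand could connect.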