EJFAT Group Meeting Mar. 28, 2024
The meeting time is 11:00am Eastern/USA.
Connection Info:
You can connect using [https://jlab-org.zoomgov.com/j/1611828967?pwd=UVVCS0pUVW5FMlphT0lRQXdoQ0o4Zz09&from=addon ZoomGov Video conferencing (ID: 161 182 8967)]:
- Meeting URL: https://jlab-org.zoomgov.com/j/1611828967
- Meeting ID: 161 182 8967
- Passcode: 570041
- Want to dial in from a phone? Dial one of the following numbers: US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free). Enter the meeting ID and passcode followed by #.
- Connecting from a room system? Dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode.
Agenda:
- Previous meeting
- Announcements:
- 24th Real Time Conference – ICISE, Quy Nhon, Vietnam, 22-26 April 2024 - selected for a 20-minute oral presentation
- Show how EJFAT architecture contrasts with current
- Explain EJFAT's benefits to experimentalists
- Propose to measure/plot EJFAT event delivery elasticity/scaling through reassembly/reconstruction steps
- Use JLab-ESnet-NERSC/ORNL/JLab environment
- Status:
- 100 Gbps traffic to ESnet LB, JLab Hall B
- EJFAT LBs:
- JLab ejfat-2: current FW, not current tools; current production CP
- 192.188.29.16 (ESnet): current FW, current tools; current production CP
- JLab ejfat-5: current FW, current tools; current development CP
- All EJFAT nodes have a 48 GB ramdisk (ejfat-fs: 96 GB)
- Cluster Activities:
- ejfat-1: Test moving a current LB install wholesale by moving files/containers
- ejfat-2: June 2023 LB - open for business
- ejfat-3: Ready to accept second FPGA for experimentation/install by ESnet
- ejfat-4: Carl's XDP experiments
- ejfat-5: Feb 2024 LB - latest production
- ejfat-6: Cissie's DAOS experiments / Mar 2024 LB - latest development
- ejfat-fs: Hosts NVMe storage
- NERSC Test Development:
- Data Source:
- JLAB, CLAS12, pre-triggered events - 1 channel
- Data Sink:
- Perlmutter
- ERSAP
- Networking for Test
- Currently 2 x 10 Gbps for JLab/L3 VPN - will expand to 200 Gbps in the near future
- Test Plans - JLab, ESnet, NERSC:
- Data Source:
- ORNL/ESnet/JLab IRI Testbed (similar to NERSC) - Ross Miller <rgmiller@ornl.gov>, project code CSC 266
- Hall B CLAS12 detector streaming test
- The 7050 switch is expected to arrive around October; we already have transceivers, short cables and a patch panel to connect up to 32 VTPs to it using two 10 Gbit links per VTP
- Fiber installation between the Hall B forward carriage and the Hall B counting room should be done this summer; it will be enough for 24 VTPs using two 10 Gbit links per VTP
- Only one fiber is currently available between the Hall B counting room and the Counting House second floor; we will order additional fiber installation, which may take several months
- Several fibers (about 6) are available between the Counting House second floor and the Computer Center; we can use a couple of them for our test
- Summary: sometime in October, we should have 48 10 Gbit links from 24 VTPs connected to the switch in the Hall B counting room, with that switch connected to the Computer Center by 2 x 100 Gbit links
- Need to develop CONOPS with Streaming group (Abbott)
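The Hall B link budget in the summary can be checked with a few lines of arithmetic. This is a minimal sketch, assuming every VTP link runs at its full 10 Gbit/s nominal line rate (actual VTP output will likely be lower, so these are upper bounds); the function name `link_budget` is hypothetical. It uses only the counts quoted above: two links per VTP, 24 VTPs (fiber expected this summer) or 32 VTPs (switch/transceiver capacity), and a 2 x 100 Gbit uplink.

```python
# Hall B streaming-test link budget from the counts in the meeting notes.
# Assumption: each VTP link is driven at its full 10 Gbit/s nominal line rate,
# so the ingress figures are worst-case upper bounds, not measured rates.

VTP_LINK_GBPS = 10   # per-link line rate (Gbit/s)
LINKS_PER_VTP = 2    # two 10 Gbit links per VTP
UPLINK_COUNT = 2     # switch -> Computer Center: 2 x 100 Gbit
UPLINK_GBPS = 100


def link_budget(n_vtps: int) -> dict:
    """Hypothetical helper: link count, ingress, uplink capacity, oversubscription."""
    links = n_vtps * LINKS_PER_VTP
    ingress_gbps = links * VTP_LINK_GBPS
    uplink_gbps = UPLINK_COUNT * UPLINK_GBPS
    return {
        "vtps": n_vtps,
        "links": links,
        "ingress_gbps": ingress_gbps,
        "uplink_gbps": uplink_gbps,
        "oversubscription": ingress_gbps / uplink_gbps,
    }


if __name__ == "__main__":
    # 24 VTPs: fiber capacity expected this summer; 32 VTPs: switch patch capacity
    for n in (24, 32):
        b = link_budget(n)
        print(f"{b['vtps']} VTPs: {b['links']} links, {b['ingress_gbps']} Gbps in, "
              f"{b['uplink_gbps']} Gbps out, {b['oversubscription']:.1f}:1")
```

The 24-VTP case reproduces the 48-link figure from the summary. At nominal line rate, the 2 x 100 Gbit uplink would be oversubscribed 2.4:1 (3.2:1 for 32 VTPs), so a loss-free test would rely on the aggregate VTP output staying below 200 Gbps.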
- December 2023 Testing Activity
- SRO RTDP LDRD - need configuration for Spring '24 test with EJFAT
- Ready to supply up to 200 Gbps to EJFAT switch
- Demo Ready EJFAT Instance - Lower priority task
- EJFAT Operational Status Board -> ESnet Prometheus Reporting - Help Desk Ticket placed for JLab FW issue
- XDP sockets working (ejfat-4) - 50% less CPU usage, 3500-byte MTU limit
- Meeting with FEG/SRO - need a long-term solution for event sync
- RTDP wants to use the LB - some plumbing work required - ticket to provide 100 Gbps from Indra/DAQ Lab (?)
- EJFAT Reconfig Design / PR - Funds available through end of March.
- Received 22 PCIe VPI NICs
- Received 5 OCP3/SFF VPI NICs
- Surrendering indra-s2; moving U280 to cluster
- EJFAT Phase II
- Implementation details in the DAOS gateway.
- Need to specify DAOS use cases?
- Intel is standing up a dedicated Slack channel to discuss DAOS
- Connection Strategy to DAOS
- Especially, keep track of how the FPGA would DMA event data cells in the future if it were a SmartNIC card (Cissie)
- daosfs01 has 2 physical IB cards and can run 2 true engines with each CPU socket hosting one engine.
- Progress on multi-FPGA and multi-virtual-LB control plane software (Derek)
- Progress on FPGA architecture (Peter and Jonathan)
- LB FW currently limited to 100 Gbps
- Reassembly work commencing soon
- Progress on finalizing a reassembly frame format (subordinate to 4.) (Carl / Stacey)
- Progress on software development for NVIDIA BlueField-2 DPU data steering from NIC to GPU memory (Amitoj/Cissie)
- ALS workflows: Harinarayan Krishnan <hkrishnan@lbl.gov>
- Resources:
- AOT