EJFAT Group Meeting Nov. 16, 2023
Latest revision as of 15:28, 16 November 2023
The meeting time is 11:00am Eastern/USA.
Connection Info:
You can connect using ZoomGov video conferencing (ID: 161 182 8967): https://jlab-org.zoomgov.com/j/1611828967?pwd=UVVCS0pUVW5FMlphT0lRQXdoQ0o4Zz09&from=addon
- Meeting URL: https://jlab-org.zoomgov.com/j/1611828967
- Meeting ID: 161 182 8967
- Passcode: 570041
- Dial in from a phone (US): +1 669 254 5252, +1 646 828 7666, +1 551 285 1373, +1 669 216 1590, or 833 568 8864 (toll free); enter the meeting ID and passcode followed by #
- Connecting from a room system: dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode
Agenda:
- Previous meeting
- Announcements:
- NERSC Test Development:
- Data Source:
- JLAB, CLAS12, pre-triggered events - 1 channel
- Front End Packetizer: pending modifications for the tick-sync message to the CP (a UDP packet sent to a port on the CP host)
- Data Sink:
- Perlmutter
- ERSAP
- Networking for Test
- Currently 2 x 10 Gbps for JLab/L3 VPN; will expand to 200 Gbps in the near future. Is the LB FW capable of this?
- JLab Preps
- Standing up second JLab LB instance
- Currently debugging test-harness set-up at NERSC/Perlmutter
- Test Plans - JLab, ESnet, NERSC:
- Test with Oak Ridge (similar to NERSC) - Shankar, Mallikarjun (Arjun) <shankarm@ornl.gov>
- Hall B CLAS12 detector streaming test
- Switch 7050 is expected to arrive some time around October; we already have the transceivers, short cables, and patch panel to connect up to 32 VTPs to it using two 10 Gbit links per VTP
- Fiber installation between the Hall B forward carriage and the Hall B counting room should be done this summer; it will be enough for 24 VTPs using two 10 Gbit links per VTP
- Only one fiber between the Hall B counting room and the counting house second floor is available right now; we will order more fiber installation, which may take several months
- There are several fibers (about 6) available between the counting house second floor and the computer center; we can use a couple of them for our test
- Summary: sometime in October we should have 48 10 Gbit links from 24 VTPs connected to the switch in the Hall B counting room, with that switch connected to the computer center by 2x100 Gbit links
- Need to develop CONOPS with Streaming group (Abbott)
- SRO RTDP LDRD - need configuration for Spring '24 test with EJFAT
- Data Compressibility Studies using Hall B/D sample data
- Ready to supply up to 200 Gbps to EJFAT switch
- Demo Ready EJFAT Instance
- Live Session?
- Check with Wenji Wu on demo format at SC23 (Amitoj)
- Emulator?
- Recorded Session
- EJFAT Operational Status Board -> Prometheus Reporting
- EJFAT Phase II
- Implementation details in the DAOS gateway.
- Especially how to keep track of how the FPGA would DMA event data cells in the future if it were a SmartNIC card. (Cissie)
- Connection Strategy to DAOS - Infiniband - no room for additional PCIe cards in ejfat machines
- "We just pulled the ConnectX-6 NIC of daosfs02 out and insert it to daosfs01 and rearrange the NVMe SSD slots, so now daosfs01 has 2 physical IB cards and can run 2 true engines with each CPU socket hosting one engine. (Amitoj)"
- Flow Control
- Progress of multi-FPGA and multi-virtual-LB control plane software (Derek). Currently: small features such as authentication, etc.
- Progress of FPGA architecture (Peter and Jonathan)
- Progress of finalizing a reassembly frame format (subordinate to 4.) (Carl / Stacey)
- Progress on software development for NVIDIA Bluefield2 DPU data steering from NIC to GPU memory (Amitoj/Cissie)
- Progress on DAOS file-server OS and filesystem installation (Amitoj/Cissie)
- Install Infiniband NIC in EJFAT servers to connect to DAOS Infiniband Switch.
- Need to check available open PCIe slots on existing EJFAT servers (Michael)
- GPU purchase for EJFAT test stand servers under IRIAD funds. The servers are capable of hosting 2 GPUs per server. (Amitoj)
- In a pinch one can use the 2x A100 GPUs in the NVIDIA Bluefield2 DPU server (hostname: nvidarm)
- Initially purchase one of each GPU: NVIDIA, INTEL, AMD so we can compare performance across all 3 flavors.
- Resources:
- HPDF: https://jeffersonlab.sharepoint.com/sites/HPDF
- AOT
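
As a rough check on the Hall B streaming-test figures in the agenda above, the aggregate VTP bandwidth can be compared with the uplink to the computer center. The sketch below uses only the numbers quoted in the Hall B summary item (24 VTPs, two 10 Gbit links each, 2x100 Gbit uplink). The function and variable names are illustrative, and it assumes every link runs at full line rate, which the agenda does not state.

```python
# Figures from the Hall B summary item in the agenda.
# Names are illustrative; full line rate on every link is an assumption.
VTPS = 24            # VTPs connected to the switch
LINKS_PER_VTP = 2    # 10 Gbit links per VTP
LINK_GBPS = 10       # nominal rate of each VTP link
UPLINKS = 2          # links from the switch to the computer center
UPLINK_GBPS = 100    # nominal rate of each uplink


def link_budget(vtps=VTPS, links_per_vtp=LINKS_PER_VTP, link_gbps=LINK_GBPS,
                uplinks=UPLINKS, uplink_gbps=UPLINK_GBPS):
    """Return link count, ingress and uplink capacity, and their ratio."""
    links = vtps * links_per_vtp
    ingress_gbps = links * link_gbps
    total_uplink_gbps = uplinks * uplink_gbps
    return {
        "links": links,
        "ingress_gbps": ingress_gbps,
        "uplink_gbps": total_uplink_gbps,
        "oversubscription": ingress_gbps / total_uplink_gbps,
    }


if __name__ == "__main__":
    budget = link_budget()
    print(f"{budget['links']} links, {budget['ingress_gbps']} Gbps in, "
          f"{budget['uplink_gbps']} Gbps out, "
          f"ratio {budget['oversubscription']:.1f}:1")
```

Under these assumptions the 48 VTP links could offer more traffic than the 2x100 Gbit uplink can carry, so the 200 Gbps figure quoted for the EJFAT switch would be set by the uplink rather than by the VTP links.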