EJFAT Group Meeting May. 12, 2022
Latest revision as of 17:14, 24 May 2022
The meeting time is 11:00am.
Connection Info:
You can connect using ZoomGov video conferencing (ID: 161 012 5238):
- Meeting URL: https://jlab-org.zoomgov.com/j/1610125238?pwd=QnEvcjV6VFFndWZsQW15SmJKU0RJZz09&from=addon
- Meeting ID: 161 012 5238
- Passcode: 503371
- Want to dial in from a phone? Dial one of the following numbers: US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free). Enter the meeting ID and passcode followed by #
- Connecting from a room system? Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode
Agenda:
- Previous meeting
- Situation:
- Rec'd new f/w build 28 April
- Specs
- Restores Jumbo Frames
- arp, ping - working
- Port entropy field - Passed Test for data_id stream horizontal reassembly with 10 streams
- Using script based LB Control Plane
- ERSAP feed end bottleneck needs investigation; Timmer's blaster may provide relief
- Spare EJFAT equip loaners:
- (4) DAQ dev machines indra-s[1-3] 129.57.29/109.23[0-2]
- alkaid: 24 Xeon Gold 3.4 GHz cores, 100Gbs
- indra-s1: 24 Xeon Gold 3.0 GHz cores, 100Gbs
- indra-s2: 32 Xeon Gold 3.2 GHz cores, 100Gbs
- indra-s3: 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
- (4) DAQ Farm machines dafarm6[1-4] currently on 129.57.29.17[1-4] - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC
- (17) Hall-D machines - gluon120-36 129.57.52.9[2-36] - each 2 Xeon 2.6 GHz cores - 10Gbs NIC
- (4) 10Gbs Spare NICs
- On Order:
- PR408549 (6) 100Gbs NICs - ETA 1 July
- PR408870, PR408938 (2) 100Gbs Arista switches, etc. - ETA 5 October (previously 1 July)
- Look at iperf2 for network testing
- Look at ROCE / NIC
- Pending:
- Support C libraries for LB Host Control Plane - in code review (previously unit test)
- ESnet smartnic open-source GitHub repo (May)
- ESnet private, forkable JLab P4 and simulations GitHub repo (May)
- To Do:
- Near Term (May):
- Hall-B FT calorimeter and hodoscope streaming readout test - Pending OK from Sergey B.
- May be able to use Abbott's indra-s1 setup
- May be able to use new VTP f/w with Hall-B VTPs - (Ben Raydo)
- CODA 3.10 + ERSAP for new VTP f/w
- CODA 2.0 (non-streaming) for old VTP f/w
- Diagram
- Hall-B to start taking data June 8
- Hall B VTPs on .167. subnet
- Downstream (June/July):
- HOSS - June 1
- parallelize writing of raw data files
- distribute raw data across multiple compute nodes for calibration skims
- 1 Gbs at high luminosity
- Control Plane
- Will interact with SLURM / Kubernetes
- Python based (?)
- Control Plane daemon for compute host (?)
- Demonstrate CP based flexibility/elasticity
- Hall-D comms with DAQ 109 subnet require network customization (EJFAT subnet)
- Hall-D EJFAT use case
- Hall-D EJFAT Network Diagram
- Configuration:
- ejfat-sw 100Gbs switch
- (6) PR408549 New Computers w/ X CPUs + U280 FPGA
- (?) Retired Data Center Farm Nodes
- EJFAT subnet VLAN 937 172.19.22.0/24 - 100Gbs, Jumbo frames
- DPDK
- IPv6 testing
- RT2022 - August 01-05 Conference
- AOT
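
The host and address ranges in the loaner list use a bracket shorthand (for example dafarm6[1-4] on 129.57.29.17[1-4]), and the Hall-D configuration names the EJFAT subnet 172.19.22.0/24. The sketch below is one possible way to expand that shorthand and check whether an address falls inside the EJFAT subnet. It assumes the brackets mean an inclusive numeric range appended to the prefix. The helper names expand_range and in_ejfat_subnet are hypothetical and are not part of any EJFAT tooling.

```python
import ipaddress
import re

# Subnet taken from the agenda (VLAN 937).
EJFAT_SUBNET = ipaddress.ip_network("172.19.22.0/24")


def expand_range(pattern: str) -> list[str]:
    """Expand one bracketed numeric range, e.g. 'dafarm6[1-4]'.

    Assumption: '[a-b]' is an inclusive range whose values are
    appended to the text before the bracket.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket range present, return unchanged
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


def in_ejfat_subnet(address: str) -> bool:
    """Return True if the address lies inside the EJFAT subnet."""
    return ipaddress.ip_address(address) in EJFAT_SUBNET


if __name__ == "__main__":
    hosts = expand_range("dafarm6[1-4]")
    addrs = expand_range("129.57.29.17[1-4]")
    for host, addr in zip(hosts, addrs):
        location = "EJFAT subnet" if in_ejfat_subnet(addr) else "outside EJFAT subnet"
        print(host, addr, location)
```

Under that reading, the DAQ Farm machines sit on 129.57.29.x and would need to be readdressed or routed to reach hosts on the EJFAT subnet.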