Difference between revisions of "EJFAT Group Meeting May. 12, 2022"

From epsciwiki
Line 43: Line 43:
 
** ERSAP feed end bottleneck needs investigation; Timmer's blaster may provide relief
 
** ERSAP feed end bottleneck needs investigation; Timmer's blaster may provide relief
 
** Spare EJFAT equip loaners:
 
** Spare EJFAT equip loaners:
**** (4) DAQ dev machines ''indra-s[1-3]'' 129.57.29/109.23[0-2]-  
+
*** (4) DAQ dev machines ''indra-s[1-3]'' 129.57.29/109.23[0-2]-  
***** ''alkaid'': 24 Xeon Gold 3.4 GHz cores, 100Gbs
+
**** ''alkaid'': 24 Xeon Gold 3.4 GHz cores, 100Gbs
***** ''indra-s1'': 24 Xeon Gold 3.0 GHz cores, 100Gbs
+
**** ''indra-s1'': 24 Xeon Gold 3.0 GHz cores, 100Gbs
***** ''indra-s2'': 32 Xeon Gold 3.2 GHz cores, 100Gbs
+
**** ''indra-s2'': 32 Xeon Gold 3.2 GHz cores, 100Gbs
***** ''indra-s3'': 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
+
**** ''indra-s3'': 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
**** (4) DAQ Farm machines ''dafarm6[1-4]'' currently on 129.57.29.17[1-4] - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC
+
*** (4) DAQ Farm machines ''dafarm6[1-4]'' currently on 129.57.29.17[1-4] - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC
**** (17) Hall-D machines - ''gluon120-36'' 129.57.172.1[20-36]
+
*** (17) Hall-D machines - ''gluon120-36'' 129.57.52.9[2-36] - each 2 Xeon 2.6Ghz cores - 10Gbs NIC
**** (4) 10Gbs NICs
+
*** (4) 10Gbs Spare NICs
 
*** On Order:
 
*** On Order:
 
**** [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408549 PR408549] (6) 100Gbs NICs - ETA 1 July
 
**** [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408549 PR408549] (6) 100Gbs NICs - ETA 1 July
**** [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408870 PR408870] [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408938 PR408938] (2) 100Gbs Arista switches, transceivers, cables, etc - ETA <s>1 July</s> 5 October
+
**** [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408870 PR408870] [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408938 PR408938] (2) 100Gbs Arista switches, <s>transceivers, cables</s>, etc - ETA <s>1 July</s> 5 October
** LAN Speed Tests (iperf3 to indra-s2)
+
** Look at iperf2 for network testing
*** indra-s3 - 30 Gbs
+
** Look at [https://support.mellanox.com/s/article/roce-v2-considerations ROCE] / NIC
*** indra-s1 - 20 Gbs
 
*** alkaid  - 17 Gbs
 
*** indra-s2 - 10 Gbs
 
*** Look at iperf2 for network testing
 
*** Look at [https://support.mellanox.com/s/article/roce-v2-considerations ROCE] / NIC
 
*** Hall B VTPs on .167. subnet
 
 
* Pending:
 
* Pending:
 
** Support C libraries for LB Host Control Plane - in <s>unit test</s> code review
 
** Support C libraries for LB Host Control Plane - in <s>unit test</s> code review
Line 75: Line 69:
 
**** [https://jeffersonlab-my.sharepoint.com/:p:/r/personal/goodrich_jlab_org/Documents/EJFAT/hall-b_test.pptx?d=w31891fd52c1a420ea2b29efcdf5f9ed2&csf=1&web=1&e=JGyxHO Diagram]
 
**** [https://jeffersonlab-my.sharepoint.com/:p:/r/personal/goodrich_jlab_org/Documents/EJFAT/hall-b_test.pptx?d=w31891fd52c1a420ea2b29efcdf5f9ed2&csf=1&web=1&e=JGyxHO Diagram]
 
**** Hall-B to start taking data June 8
 
**** Hall-B to start taking data June 8
 +
**** Hall B VTPs on .167. subnet
 
** Downstream (June/July):
 
** Downstream (June/July):
 
*** [https://www.epj-conferences.org/articles/epjconf/abs/2021/05/epjconf_chep2021_04005/epjconf_chep2021_04005.html HOSS] - June 1
 
*** [https://www.epj-conferences.org/articles/epjconf/abs/2021/05/epjconf_chep2021_04005/epjconf_chep2021_04005.html HOSS] - June 1

Revision as of 17:39, 20 May 2022

The meeting time is 11:00am.

Connection Info:

You can connect using ZoomGov Video conferencing (ID: 161 012 5238):

Meeting URL
 https://jlab-org.zoomgov.com/j/1610125238?pwd=QnEvcjV6VFFndWZsQW15SmJKU0RJZz09&from=addon

Meeting ID
161 012 5238

Passcode
503371

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  • Previous meeting
  • Situation:
    • Rec'd new f/w build 28 April
      • Specs
      • Restores Jumbo Frames
      • arp, ping - working
      • Port entropy field - Passed Test for data_id stream horizontal reassembly with 10 streams
    • Using script based LB Control Plane
    • ERSAP feed end bottleneck needs investigation; Timmer's blaster may provide relief
    • Spare EJFAT equip loaners:
      • (4) DAQ dev machines indra-s[1-3] 129.57.29/109.23[0-2]-
        • alkaid: 24 Xeon Gold 3.4 GHz cores, 100Gbs
        • indra-s1: 24 Xeon Gold 3.0 GHz cores, 100Gbs
        • indra-s2: 32 Xeon Gold 3.2 GHz cores, 100Gbs
        • indra-s3: 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
      • (4) DAQ Farm machines dafarm6[1-4] currently on 129.57.29.17[1-4] - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC
      • (17) Hall-D machines - gluon120-36 129.57.52.9[2-36] - each 2 Xeon 2.6 GHz cores - 10Gbs NIC
      • (4) 10Gbs Spare NICs
      • On Order:
        • PR408549 (6) 100Gbs NICs - ETA 1 July
        • PR408870 PR408938 (2) 100Gbs Arista switches, etc (transceivers and cables struck from the list) - ETA 5 October (originally 1 July)
    • Look at iperf2 for network testing
    • Look at ROCE / NIC
  • Pending:
    • Support C libraries for LB Host Control Plane - in code review (previously in unit test)
    • ESnet smartnic open-source GitHub repo (May)
    • ESnet private, forkable Jlab P4 and simulations GitHub repo (May)
  • To Do:
    • Near Term (May):
      • Hall-B FT calorimeter and hodoscope streaming readout test - Pending OK from Sergey B.
        • May be able to use Abbott's indra-s1 setup
        • May be able to use new VTP f/w with Hall-B VTP's - (Ben Raydo)
        • CODA 3.10 + ERSAP for new VTP f/w
        • CODA 2.0 (non-streaming) for old VTP f/w
        • Diagram
        • Hall-B to start taking data June 8
        • Hall B VTPs on .167. subnet
    • Downstream (June/July):
      • HOSS - June 1
        • parallelize writing of raw data files
        • distribute raw data across multiple compute nodes for calibration skims
        • 1 Gbs at hi-luminosity
        • Control Plane
          • Will interact with SLURM / Kubernetes
          • Python based (?)
          • Control Plane daemon for compute host (?)
          • Demonstrate CP based flexibility/elasticity
        • Hall-D comms with DAQ 109 subnet require network customization (EJFAT subnet)
        • Hall-D EJFAT use case
        • Hall-D EJFAT Network Diagram
        • Configuration:
          • ejfat-sw 100Gbs switch
          • (6) PR408549 New Computers w/ X CPUs + U280 fpga
          • (?) Retired Data Center Farm Nodes
          • EJFAT subnet VLAN 937 172.19.22.0/24 - 100Gbs, Jumbo frames
      • DPDK
      • IPV6 testing
      • RT2022 - August 01-05 Conference
  • AOT
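
The loaner list above uses a bracket-range shorthand for host names and addresses (for example dafarm6[1-4] on 129.57.29.17[1-4]). The following is a minimal sketch, not a tool used by the group, of how that shorthand could be expanded into explicit host/address pairs. The function name expand is hypothetical, and the sketch assumes only the simple [a-b] numeric form; entries such as gluon120-36 or 129.57.29/109.23[0-2] are not in that form and would need to be rewritten by hand first.

```python
import ipaddress
import re
from itertools import product

# Matches a numeric range in square brackets, e.g. "[1-4]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand(pattern: str) -> list[str]:
    """Expand every [a-b] numeric range in `pattern` into concrete strings.

    Hypothetical helper for illustration; a pattern without brackets is
    returned unchanged as a one-element list.
    """
    # With two capture groups, re.split yields:
    #   literal, lo, hi, literal, lo, hi, ..., literal
    parts = _RANGE.split(pattern)
    literals = parts[0::3]
    ranges = [
        range(int(lo), int(hi) + 1)
        for lo, hi in zip(parts[1::3], parts[2::3])
    ]
    results = []
    for combo in product(*ranges):
        text = literals[0]
        for number, literal in zip(combo, literals[1:]):
            text += str(number) + literal
        results.append(text)
    return results


if __name__ == "__main__":
    hosts = expand("dafarm6[1-4]")
    addrs = expand("129.57.29.17[1-4]")
    for host, addr in zip(hosts, addrs):
        # ip_address raises ValueError if the expanded string is not a valid IP.
        ipaddress.ip_address(addr)
        print(host, addr)
```

Pairing hosts with addresses by position assumes both ranges are listed in the same order, which the notes imply but do not state.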