Difference between revisions of "EJFAT EPSCI Meeting Jul. 6, 2022"

Latest revision as of 17:18, 6 July 2022

The meeting time is 2:00pm.

Connection Info:

You can connect using ZoomGov Video conferencing (ID: 161 203 8101):

Meeting URL
	https://jlab-org.zoomgov.com/j/1617413961?pwd=QWpXalc0SXFrSUNBNmFrbVZycisrUT09&from=addon

Meeting ID
161 741 3961

Passcode
124964

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode


Agenda:

  • Previous meeting
  • Status:
    • Using ESnet FPGA f/w build 28 April
      • Specs
      • Jumbo Frames
      • arp, ping, ICMP filtering
      • Port entropy
    • Script based LB Control Plane
    • In ESnet Legal Review
      • Support C libraries for LB Host Control Plane
      • ESnet smartnic open-source GitHub repo - in legal review
      • ESnet private, forkable Jlab P4 and simulations GitHub repo - in legal review
      • FPGA LB data generation capability
    • ERSAP feed end bottleneck needs investigation; Timmer/Gyurjyan investigating
    • New machines (7) rec'd, installed w/ Ubuntu 20.04 on EJFAT subnet (VLAN 937 172.19.22.0/24)
    • Spare EJFAT equip loaners:
      • (4) DAQ dev machines indra-s[1-3] 129.57.29/109.23[0-2]
        • alkaid: 24 Xeon Gold 3.4 GHz cores, 100Gbs
        • indra-s1: 24 Xeon Gold 3.0 GHz cores, 100Gbs
        • indra-s2: 32 Xeon Gold 3.2 GHz cores, 100Gbs
        • indra-s3: 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
      • (4) DAQ Farm machines dafarm6[1-4] currently on 129.57.29.17[1-4] - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
      • (4) Unbuilt DAQ Farm machines - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
      • PR408549 (7) 100Gbs NICs - Shipped
      • PR408870 PR408938 (2) 100Gbs Arista switches, etc (transceivers, cables struck through) - ETA 5 October (was 1 July)
  • Next Steps:
    • EJFAT VLAN Checkout
    • Network Performance:
      • FPGA LB Throughput - max sustained 90Gbs
      • Packet Loss Studies / LB Round Robin Studies
      • DAQ/VTP Data Generation Test Harness - 40 Gbs initially
      • Host NICs
      • Host S/W Reassembly - better algorithms, buffering, asynchronicity, etc.
      • EJFAT UDP Transmission Performance
      • Need better parameters for event reassembly/reconstruction
      • DPDK will own the NIC, bypassing the kernel, etc. - ESnet reports it can stream 100 Gbps using DPDK
      • Pktgen-DPDK
      • Look at RoCE / NIC
    • ESnet UDP tuning
    • Control Plane
      • Will interact with SLURM / Kubernetes
      • Python based (?)
      • Control Plane daemon for compute host (?)
      • Demonstrate CP based flexibility/elasticity
      • SLURM env for EJFAT VLAN (Hess)
    • Vivado Licenses for new machines - AI/ML ? D. Lawrence investigating
    • Open Nic Shell
    • ACAT 2022 - September/Italy - Abstract (July 14) / Paper
    • RT2022 - August 01-05 Conference
    • RT 2022 Paper / Presentation
  • Back Burner / Downstream:
    • Hall-B FT calorimeter and hodoscope streaming readout test
      • May be able to use Abbott's indra-s1 setup
      • May be able to use new VTP f/w with Hall-B VTP's
      • CODA 3.10 + ERSAP for new VTP f/w
      • CODA 2.0 (non-streaming) for old VTP f/w
      • Diagram
      • Hall-B to start taking data June 8
      • Hall B VTPs on .167. subnet
    • HOSS
      • parallelize writing of raw data files
      • distribute raw data across multiple compute nodes for calibration skims
      • 1 Gbs at hi-luminosity
      • Hall-D comms with DAQ 109 subnet require network customization; (EJFAT subnet)
      • Hall-D EJFAT use case
      • Hall-D EJFAT Network Diagram
    • IPV6 testing
  • AOT

Notes:

  • numa tools on ubuntu:
    • sudo apt install hwloc-nox
    • sudo apt install numactl
  • lstopo
  • numactl --hardware
  • To control the scheduling class, you can use the chrt command.
  • To pin to CPUs, use the taskset command. Or use the underlying syscalls.
  • kernel daemon threads
    • handle NIC driver interrupts
    • set scheduling class of reassembly process to SCHED_FIFO / SCHED_RR
  • want to run the reassembly process on the CPU socket in the same NUMA domain as the NIC
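
The pinning and scheduling notes above can be sketched with the underlying syscalls, here through Python's os module on Linux. This is only an illustrative sketch, not the EJFAT reassembly code: the helper names (parse_cpulist, nic_numa_cpus, pin_and_prioritize), the interface name passed in, and the priority value are all assumptions made for the example. It reads the NIC's NUMA node from sysfs, finds the CPUs in that node, pins a process to them (what taskset does), and tries to switch it to SCHED_FIFO (what chrt -f does).

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8,10-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nic_numa_cpus(iface):
    """Return the CPUs local to the NIC's NUMA node, or None if unknown.

    'iface' is whatever name the host gives the NIC (hypothetical here).
    Uses the standard sysfs entries for the device and the NUMA node.
    """
    try:
        with open(f"/sys/class/net/{iface}/device/numa_node") as f:
            node = int(f.read().strip())
        if node < 0:  # -1 means the kernel reports no NUMA affinity
            return None
        with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
            return parse_cpulist(f.read())
    except (OSError, ValueError):
        return None


def pin_and_prioritize(cpus, priority=10, pid=0):
    """Pin a process to 'cpus' and try to move it to SCHED_FIFO.

    pid=0 means the calling process. Returns True if the real-time class
    was set, False if the kernel refused (this normally needs root or
    CAP_SYS_NICE). The priority value is an arbitrary assumption.
    """
    os.sched_setaffinity(pid, cpus)  # same effect as taskset
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False
```

A reassembly process could call nic_numa_cpus with its NIC's interface name at startup and pass the result to pin_and_prioritize, falling back to its current affinity when the lookup returns None. Comparing the chosen CPUs against the output of lstopo or numactl --hardware is a reasonable sanity check.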