Difference between revisions of "EJFAT EPSCI Meeting Nov. 16, 2022"

(Created page with "The meeting time is 2:00pm. === Connection Info: === <div class="toccolours mw-collapsible mw-collapsed"> You can connect using [https://jlab-org.zoomgov.com/j/1612038101?pwd...")
 
 
Line 32: Line 32:
  
 
=== Agenda: ===
 
=== Agenda: ===
* [[EJFAT EPSCI Meeting Oct. 19, 2022 | Previous meeting]]
+
* [[EJFAT EPSCI Meeting Nov. 09, 2022 | Previous meeting]]
 
*:
 
*:
 +
* Announcements:
 +
** '''Abstract Submission for CHEP 2023 (Nov 17)'''
 
* Status:
 
* Status:
 
** Using ESnet FPGA f/w build 28 April
 
** Using ESnet FPGA f/w build 28 April
Line 40: Line 42:
 
*** arp, ping, ICMP filtering
 
*** arp, ping, ICMP filtering
 
*** Port entropy
 
*** Port entropy
** IPV6 neighbor discovery
 
** LB F/W Installation Manual - with PCI buffer allocation assurance steps
 
 
** Script based LB Control Plane
 
** Script based LB Control Plane
** In ESnet Legal Review
 
*** Support C libraries for LB Host Control Plane
 
*** ESnet smartnic open-source GitHub repo - in legal review
 
*** ESnet private, forkable Jlab P4 and simulations GitHub repo - in legal review
 
*** FPGA LB data generation capability
 
** ERSAP feed end bottleneck needs investigation; Timmer/Gyurjyan investigating
 
 
** EJFAT VLAN Open for Business:
 
** EJFAT VLAN Open for Business:
 
*** Hosts running Ubuntu 20.04
 
*** Hosts running Ubuntu 20.04
Line 55: Line 49:
 
*** LBs: 172.19.22.241-247, indra-s2
 
*** LBs: 172.19.22.241-247, indra-s2
 
**** 172.19.22.241 - Currently reserved for Carl
 
**** 172.19.22.241 - Currently reserved for Carl
**** 172.19.22.242 - Currently reserved for Stacey
+
**** 172.19.22.242 - Currently reserved for Stacey - '''BIOS / Kernel mods pending'''
 
**** 172.19.22.247 - Currently reserved for Mike
 
**** 172.19.22.247 - Currently reserved for Mike
 
*** indra-s2 upgraded to Ubuntu 20.04; LB successfully installed, now on EJFAT VLAN / DAQ lo-speed networks via Indra-Lab switch
 
*** indra-s2 upgraded to Ubuntu 20.04; LB successfully installed, now on EJFAT VLAN / DAQ lo-speed networks via Indra-Lab switch
Line 67: Line 61:
 
*** (4) DAQ Farm machines ''dafarm6[1-4]'' currently on 129.57.29.17[1-4] - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
 
*** (4) DAQ Farm machines ''dafarm6[1-4]'' currently on 129.57.29.17[1-4] - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
 
*** (4) Unbuilt DAQ Farm machines - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
 
*** (4) Unbuilt DAQ Farm machines - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
** [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408870 PR408870] [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408938 PR408938] (2) 100Gbs Arista switches, <s>transceivers, cables</s>, etc - ETA <s>1 July</s> <s>5 October</s> Ships '''11 November'''
+
** [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408870 PR408870] [https://misportal.jlab.org/reqs/pr/viewPr.do?prNum=408938 PR408938] (2) 100Gbs Arista switches
 +
*** '''All equipment recd'''
 +
*** '''Arista switch #1 for EJFAT VLAN to be installed (swapped) soonest'''
 +
*** '''Arista switch #2 - what do we want to use this for?'''  
 
** FPGA LB Throughput - max sustained 90Gbs with s/w data generation
 
** FPGA LB Throughput - max sustained 90Gbs with s/w data generation
* Next Steps:
+
** RT 2022 Paper - submitted August 22 - up to 8 mos. review process
 +
* '''Current Activities'''
 +
** Test plan for all pieces required for integration test
 +
*** Plan A: CLAS12 files -> Carl's C++ Packetizer -> LB -> Carl's C++ Reassembler -> ERSAP
 +
*** Plan B: CLAS12 files -> ERSAP C++ Packetizer  -> LB -> Mike's C++ Reassembler -> (tcp) -> ERSAP
 +
*** Plan C: Simulated Packets -> LB -> Mike's C++ Reassembler -> Simulated Host Loading/Feedback
 +
** Control Plane
 +
*** Compute Farm Feed Back Monitor
 +
*** Have working RL (Q-Learning) Schedule Density Adjuster (to be integrated)
 +
*** Need to define Optimality Criterion for Schedule Density
 +
*** DP Supervisor
 +
*** Demonstrate CP based flexibility/elasticity
 +
** Paper for ACAT 2022 Conf Proc. (TBD)
 +
** Mtg with Data Science Dept for CP AI component - '''Friday 11/19/2022 11:00'''
 +
* ESnet Update:
 +
** '''New toolkit for detailed FPGA packet tracking'''
 +
*** Uses DPDK
 +
*** Can send packets via PCIe to FPGA
 +
** '''Working on direct delivery of data from FPGA to host memory via DMA'''
 +
** '''Proposing special packet for CP event_id sync with sender - discuss with DAQ group'''
 +
** '''Wants to know EJFAT II POAM - Graham to sched internal mtg'''
 +
** IPV6 neighbor discovery - in process
 +
** LB F/W Installation Manual - with PCI buffer allocation assurance steps
 +
** In ESnet Legal Review
 +
*** Support C libraries for LB Host Control Plane - '''needs completion of C API doc'''
 +
*** ESnet smartnic open-source GitHub repo - in legal review
 +
*** ESnet private, forkable Jlab P4 and simulations GitHub repo - in legal review
 +
*** FPGA LB data generation capability
 +
* Back Burner / Downstream:
 
** FireHose Benchmark
 
** FireHose Benchmark
 
*** [https://stream-benchmarking.github.io/firehose/ FireHose Benchmark]
 
*** [https://stream-benchmarking.github.io/firehose/ FireHose Benchmark]
Line 90: Line 115:
 
** [https://spdk.io/ SPDK] for hyper storage performance
 
** [https://spdk.io/ SPDK] for hyper storage performance
 
** [https://fasterdata.es.net/host-tuning/linux/udp-tuning/ ESnet UDP tuning]
 
** [https://fasterdata.es.net/host-tuning/linux/udp-tuning/ ESnet UDP tuning]
** '''EJFAT VLAN Infrastructure'''
+
** SLURM env for EJFAT VLAN (Hess)
*** '''Test plan for all pieces required for integration test'''
 
**** '''Plan A: CLAS12 files -> Carl's C++ Packetizer -> LB -> Carl's C++ Reassembler -> ERSAP'''
 
**** '''Plan B: CLAS12 files -> ERSAP C++ Packetizer  -> LB -> Mike's C++ Reassembler -> (tcp) -> ERSAP'''
 
**** '''Plan C: Simulated Packets -> LB -> Mike's C++ Reassembler -> Simulated Host Loading/Feedback'''
 
*** '''Control Plane'''
 
**** '''Compute Farm Feed Back Monitor'''
 
**** '''AI/ML Schedule Density Adjuster - have working RL (Q-Learning) component'''
 
**** '''DP Supervisor'''
 
*** '''Demonstrate CP based flexibility/elasticity'''
 
*** '''Paper for ACAT 2022 Conf Proc. (TBD)'''
 
*** SLURM env for EJFAT VLAN (Hess)
 
 
** Vivado Licenses for new machines - AI/ML? D. Lawrence POC
 
** Vivado Licenses for new machines - AI/ML? D. Lawrence POC
 
** [https://github.com/Xilinx/open-nic-shell Open Nic Shell]
 
** [https://github.com/Xilinx/open-nic-shell Open Nic Shell]
** '''RT 2022 Paper - submitted August 22 - up to 8 mos. review process'''
 
* Back Burner / Downstream:
 
 
** Hall-B FT calorimeter and hodoscope streaming readout test
 
** Hall-B FT calorimeter and hodoscope streaming readout test
 
*** May be able to use Abbott's indra-s1 setup
 
*** May be able to use Abbott's indra-s1 setup

Latest revision as of 17:46, 16 November 2022

The meeting time is 2:00pm.

Connection Info:

You can connect using ZoomGov Video conferencing (ID: 161 203 8101):

Meeting URL
	https://jlab-org.zoomgov.com/j/1617413961?pwd=QWpXalc0SXFrSUNBNmFrbVZycisrUT09&from=addon

Meeting ID
161 741 3961

Passcode
124964

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode


Agenda:

  • Previous meeting
  • Announcements:
    • Abstract Submission for CHEP 2023 (Nov 17)
  • Status:
    • Using ESnet FPGA f/w build 28 April
      • Specs
      • Jumbo Frames
      • arp, ping, ICMP filtering
      • Port entropy
    • Script based LB Control Plane
    • EJFAT VLAN Open for Business:
      • Hosts running Ubuntu 20.04
      • 1 Gbs i/f are: ejfat-1, ejfat-2, ejfat-3, ejfat-3, ejfat-5, ejfat-6, ejfat-fs
      • 100 Gbs i/f are: ejfat-1-daq, ejfat-2-daq, ejfat-3-daq, ejfat-3-daq, ejfat-5-daq, ejfat-6-daq, ejfat-fs-daq
      • LBs: 172.19.22.241-247, indra-s2
        • 172.19.22.241 - Currently reserved for Carl
        • 172.19.22.242 - Currently reserved for Stacey - BIOS / Kernel mods pending
        • 172.19.22.247 - Currently reserved for Mike
      • indra-s2 upgraded to Ubuntu 20.04; LB successfully installed, now on EJFAT VLAN / DAQ lo-speed networks via Indra-Lab switch
      • /daq-fs/gyurjyan self-contained ERSAP event processing package
    • Spare EJFAT equip loaners:
      • (4) DAQ dev machines indra-s[1-3] 129.57.29/109.23[0-2]
        • alkaid: 24 Xeon Gold 3.4 GHz cores, 100Gbs
        • indra-s1: 24 Xeon Gold 3.0 GHz cores, 100Gbs
        • indra-s2: 32 Xeon Gold 3.2 GHz cores, 100Gbs
        • indra-s3: 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
      • (4) DAQ Farm machines dafarm6[1-4] currently on 129.57.29.17[1-4] - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
      • (4) Unbuilt DAQ Farm machines - each 32 Xeon 2.0Ghz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
    • PR408870 PR408938 (2) 100Gbs Arista switches
      • All equipment recd
      • Arista switch #1 for EJFAT VLAN to be installed (swapped) soonest
      • Arista switch #2 - what do we want to use this for?
    • FPGA LB Throughput - max sustained 90Gbs with s/w data generation
    • RT 2022 Paper - submitted August 22 - up to 8 mos. review process
  • Current Activities
    • Test plan for all pieces required for integration test
      • Plan A: CLAS12 files -> Carl's C++ Packetizer -> LB -> Carl's C++ Reassembler -> ERSAP
      • Plan B: CLAS12 files -> ERSAP C++ Packetizer -> LB -> Mike's C++ Reassembler -> (tcp) -> ERSAP
      • Plan C: Simulated Packets -> LB -> Mike's C++ Reassembler -> Simulated Host Loading/Feedback
    • Control Plane
      • Compute Farm Feed Back Monitor
      • Have working RL (Q-Learning) Schedule Density Adjuster (to be integrated)
      • Need to define Optimality Criterion for Schedule Density
      • DP Supervisor
      • Demonstrate CP based flexibility/elasticity
    • Paper for ACAT 2022 Conf Proc. (TBD)
    • Mtg with Data Science Dept for CP AI component - Friday 11/19/2022 11:00
  • ESnet Update:
    • New toolkit for detailed FPGA packet tracking
      • Uses DPDK
      • Can send packets via PCIe to FPGA
    • Working on direct delivery of data from FPGA to host memory via DMA
    • Proposing special packet for CP event_id sync with sender - discuss with DAQ group
    • Wants to know EJFAT II POAM - Graham to sched internal mtg
    • IPV6 neighbor discovery - in process
    • LB F/W Installation Manual - with PCI buffer allocation assurance steps
    • In ESnet Legal Review
      • Support C libraries for LB Host Control Plane - needs completion of C API doc
      • ESnet smartnic open-source GitHub repo - in legal review
      • ESnet private, forkable Jlab P4 and simulations GitHub repo - in legal review
      • FPGA LB data generation capability
  • Back Burner / Downstream:
  • AOT
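
The packetizer -> LB -> reassembler chain in Plans A-C under Current Activities can be illustrated with a small sketch. This is only an illustration: the 16-byte header layout, the function and class names, and the 8192-byte segment size are assumptions made for the example, not the EJFAT/LB wire format or the actual C++ packetizers and reassemblers. It also assumes no duplicate segments arrive.

```python
import struct

# Hypothetical 16-byte segment header (NOT the EJFAT/LB wire format):
# event id (u64), byte offset of this segment (u32), total event length (u32).
HEADER = struct.Struct("!QII")


def packetize(event_id, payload, max_payload=8192):
    """Split one event buffer into header-prefixed segments."""
    if not payload:
        return [HEADER.pack(event_id, 0, 0)]
    return [
        HEADER.pack(event_id, off, len(payload)) + payload[off:off + max_payload]
        for off in range(0, len(payload), max_payload)
    ]


class Reassembler:
    """Rebuild events from segments that may arrive out of order."""

    def __init__(self):
        self._partial = {}  # event_id -> [buffer, bytes_received]

    def add(self, segment):
        """Consume one segment; return (event_id, payload) once the event is complete."""
        event_id, offset, total = HEADER.unpack_from(segment)
        data = segment[HEADER.size:]
        entry = self._partial.setdefault(event_id, [bytearray(total), 0])
        entry[0][offset:offset + len(data)] = data
        entry[1] += len(data)  # assumes each segment is delivered exactly once
        if entry[1] >= total:
            del self._partial[event_id]
            return event_id, bytes(entry[0])
        return None
```

Carrying the offset and total length in every segment lets the receiver place data without relying on arrival order, which matters when segments of one event travel as independent UDP datagrams.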

Notes:

  • NUMA tools on Ubuntu:
    • sudo apt install hwloc-nox
    • sudo apt install numactl
  • lstopo
  • numactl --hardware
  • To control the scheduling class, use the chrt command.
  • To pin a process to CPUs, use the taskset command, or the underlying syscalls.
  • Kernel daemon threads
    • handle NIC driver interrupts
    • set scheduling class of reassembly process to SCHED_FIFO / SCHED_RR
  • Want the reassembly process on the CPU socket in the same NUMA domain as the NIC
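
The notes above mention using the underlying syscalls instead of chrt and taskset. A minimal Python sketch of that route follows, assuming a Linux host where the NIC is a PCI device that exposes `numa_node` in sysfs. The function names and the priority value of 10 are illustrative choices, not part of any EJFAT tooling.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8,10-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nic_numa_cpus(iface, sysfs=Path("/sys")):
    """Return the CPUs local to the NUMA node of a NIC, or None if unknown."""
    node_file = sysfs / "class" / "net" / iface / "device" / "numa_node"
    try:
        node = int(node_file.read_text())
    except (OSError, ValueError):
        return None
    if node < 0:  # -1 means the kernel reports no NUMA affinity for the device
        return None
    cpulist = sysfs / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    try:
        return parse_cpulist(cpulist.read_text())
    except OSError:
        return None


def pin_and_prioritize(cpus, priority=10):
    """Pin the calling process to `cpus` and try to switch it to SCHED_FIFO."""
    os.sched_setaffinity(0, cpus)  # same effect as taskset on this process
    try:
        # Same effect as chrt -f; the priority value is an arbitrary example.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False  # real-time classes need root or CAP_SYS_NICE
```

A reassembly process would call `nic_numa_cpus()` with the name of its receiving interface and, if a CPU list comes back, pass it to `pin_and_prioritize()`. A return value of `False` means the process was pinned but left in its default scheduling class.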