EJFAT EPSCI Meeting Nov. 16, 2022
Latest revision as of 17:46, 16 November 2022
The meeting time is 2:00pm.
Connection Info:
You can connect using ZoomGov video conferencing (ID: 161 203 8101):
- Meeting URL: https://jlab-org.zoomgov.com/j/1617413961?pwd=QWpXalc0SXFrSUNBNmFrbVZycisrUT09&from=addon
- Meeting ID: 161 741 3961
- Passcode: 124964
- To dial in from a phone, dial one of the following numbers, then enter the meeting ID and passcode followed by #:
- US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
- To connect from a room system, dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode
Agenda:
- Previous meeting
- Announcements:
- Abstract Submission for CHEP 2023 (Nov 17)
- Status:
- Using ESnet FPGA f/w build 28 April
- Specs
- Jumbo Frames
- arp, ping, ICMP filtering
- Port entropy
- Script based LB Control Plane
- EJFAT VLAN Open for Business:
- Hosts running Ubuntu 20.04
- 1 Gbs i/f are: ejfat-1, ejfat-2, ejfat-3, ejfat-4, ejfat-5, ejfat-6, ejfat-fs
- 100 Gbs i/f are: ejfat-1-daq, ejfat-2-daq, ejfat-3-daq, ejfat-4-daq, ejfat-5-daq, ejfat-6-daq, ejfat-fs-daq
- LBs: 172.19.22.241-247, indra-s2
- 172.19.22.241 - Currently reserved for Carl
- 172.19.22.242 - Currently reserved for Stacey - BIOS / Kernel mods pending
- 172.19.22.247 - Currently reserved for Mike
- indra-s2 upgraded to Ubuntu 20.04; LB successfully installed; now on EJFAT VLAN / DAQ lo-speed networks via Indra-Lab switch
- /daq-fs/gyurjyan self-contained ERSAP event processing package
- Spare EJFAT equip loaners:
- (4) DAQ dev machines indra-s[1-3] 129.57.29/109.23[0-2]
- alkaid: 24 Xeon Gold 3.4 GHz cores, 100Gbs
- indra-s1: 24 Xeon Gold 3.0 GHz cores, 100Gbs
- indra-s2: 32 Xeon Gold 3.2 GHz cores, 100Gbs
- indra-s3: 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
- (4) DAQ Farm machines dafarm6[1-4] currently on 129.57.29.17[1-4] - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
- (4) Unbuilt DAQ Farm machines - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
- PR408870 PR408938 (2) 100Gbs Arista switches
- All equipment received
- Arista switch #1 for EJFAT VLAN to be installed (swapped) soonest
- Arista switch #2 - what do we want to use this for?
- FPGA LB Throughput - max sustained 90Gbs with s/w data generation
- RT 2022 Paper - submitted August 22 - up to 8 mos. review process
- Current Activities
- Test plan for all pieces required for integration test
- Plan A: CLAS12 files -> Carl's C++ Packetizer -> LB -> Carl's C++ Reassembler -> ERSAP
- Plan B: CLAS12 files -> ERSAP C++ Packetizer -> LB -> Mike's C++ Reassembler -> (tcp) -> ERSAP
- Plan C: Simulated Packets -> LB -> Mike's C++ Reassembler -> Simulated Host Loading/Feedback
- Control Plane
- Compute Farm Feed Back Monitor
- Have working RL (Q-Learning) Schedule Density Adjuster (to be integrated)
- Need to define Optimality Criterion for Schedule Density
- DP Supervisor
- Demonstrate CP based flexibility/elasticity
- Paper for ACAT 2022 Conf Proc. (TBD)
- Mtg with Data Science Dept for CP AI component - Friday 11/19/2022 11:00
- ESnet Update:
- New toolkit for detailed FPGA packet tracking
- Uses DPDK
- Can send packets via PCIe to FPGA
- Working on direct delivery of data from FPGA to host memory via DMA
- Proposing special packet for CP event_id sync with sender - discuss with DAQ group
- Wants to know EJFAT II POAM - Graham to sched internal mtg
- IPV6 neighbor discovery - in process
- LB F/W Installation Manual - with PCI buffer allocation assurance steps
- In ESnet Legal Review
- Support C libraries for LB Host Control Plane - needs completion of C API doc
- ESnet smartnic open-source GitHub repo - in legal review
- ESnet private, forkable Jlab P4 and simulations GitHub repo - in legal review
- FPGA LB data generation capability
- Back Burner / Downstream:
- FireHose Benchmark
- Network Performance:
- cpusets
- Pktgen-DPDK
- Receive Side Scaling
- Packet Loss Studies / LB Round Robin Studies - migrated to Compute Center / EJFAT / EJFAT VLAN
- DAQ/VTP Data Generation Test Harness - 40 Gbs initially - after FEG ???
- Host NICs
- Host S/W Reassembly - better algorithms, buffering, asynchronicity, etc.
- EJFAT UDP Transmission Performance
- Need better parameters for event reassembly/reconstruction
- DPDK will own NIC bypassing kernel, etc. - ESnet reports can stream 100 Gbps using DPDK; Pktgen-DPDK
- Look at ROCE / NIC
- SPDK for hyper storage performance
- ESnet UDP tuning
- SLURM env for EJFAT VLAN (Hess)
- Vivado Licenses for new machines - AI/ML ? D. Lawrence POC
- Open Nic Shell
- Hall-B FT calorimeter and hodoscope streaming readout test
- May be able to use Abbott's indra-s1 setup
- May be able to use new VTP f/w with Hall-B VTP's
- CODA 3.10 + ERSAP for new VTP f/w
- CODA 2.0 (non-streaming) for old VTP f/w
- Diagram
- Hall-B to start taking data June 8
- Hall B VTPs on .167. subnet
- HOSS
- parallelize writing of raw data files
- distribute raw data across multiple compute nodes for calibration skims
- 1 Gbs at hi-luminosity
- Hall-D comms with DAQ 109 subnet require network customization (EJFAT subnet)
- Hall-D EJFAT use case
- Hall-D EJFAT Network Diagram
- IPV6 testing
- AOT
Notes:
- numa tools on ubuntu:
- sudo apt install hwloc-nox
- sudo apt install numactl
- lstopo
- numactl --hardware
- To control the scheduling class, you can use the chrt command.
- To pin to CPUs, use the taskset command. Or use the underlying syscalls.
- kernel daemon threads
- handle NIC driver interrupts
- set scheduling class of reassembly process to SCHED_FIFO / SCHED_RR
- want to run reassembly on a CPU socket in the same NUMA domain as the NIC
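
The notes above mention pinning with taskset, setting the scheduling class with chrt, "or use the underlying syscalls". A minimal sketch of the syscall route in Python follows, assuming a Linux host and a physical NIC that exposes its NUMA node under /sys/class/net. The function names and the default priority of 50 are hypothetical choices for illustration and are not part of any EJFAT code.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs CPU list such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_cpus(iface):
    """Return the CPUs in the NUMA node that hosts the given NIC.

    Assumes a PCI-attached interface; virtual interfaces such as "lo"
    have no device/numa_node entry and will raise FileNotFoundError.
    """
    node = int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())
    if node < 0:  # -1 means the kernel reports no NUMA locality
        return set(os.sched_getaffinity(0))
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    return parse_cpulist(cpulist)


def pin_and_prioritize(iface, priority=50):
    """Pin this process near the NIC, then switch it to SCHED_FIFO.

    Changing to a real-time policy normally requires root or CAP_SYS_NICE.
    The priority value is an arbitrary illustrative choice.
    """
    os.sched_setaffinity(0, nic_numa_cpus(iface))   # same effect as taskset
    os.sched_setscheduler(0, os.SCHED_FIFO,
                          os.sched_param(priority))  # same effect as chrt -f
```

A reassembly process could call pin_and_prioritize with its 100 Gbs interface name at startup. The command-line equivalent is to launch the process under `taskset -c <cpu list>` and `chrt -f <priority>`, using the CPU list that `numactl --hardware` reports for the NIC's node.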