EJFAT EPSCI Meeting Jun. 22, 2022
Latest revision as of 17:15, 22 June 2022
The meeting time is 2:00pm.
Connection Info:
You can connect using ZoomGov Video conferencing (ID: 161 203 8101).
Meeting URL: https://jlab-org.zoomgov.com/j/1617413961?pwd=QWpXalc0SXFrSUNBNmFrbVZycisrUT09&from=addon
Meeting ID: 161 741 3961
Passcode: 124964
Want to dial in from a phone? Dial one of the following numbers: US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free). Enter the meeting ID and passcode followed by #.
Connecting from a room system? Dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode.
Agenda:
- Previous meeting
- Status:
- Using ESnet FPGA f/w build 28 April
- Specs
- Jumbo Frames
- arp, ping, ICMP filtering
- Port entropy
- Script based LB Control Plane
- In ESnet Legal Review
- Support C libraries for LB Host Control Plane
- ESnet smartnic open-source GitHub repo - in legal review
- ESnet private, forkable Jlab P4 and simulations GitHub repo - in legal review
- ERSAP feed end bottleneck needs investigation; Timmer's blaster may provide relief
- New machines (6) rec'd, installed w/ Ubuntu 20.04 on EJFAT subnet (VLAN 937 172.19.22.0/24)
- Spare EJFAT equip loaners:
- (4) DAQ dev machines indra-s[1-3] 129.57.29/109.23[0-2]
- alkaid: 24 Xeon Gold 3.4 GHz cores, 100Gbs
- indra-s1: 24 Xeon Gold 3.0 GHz cores, 100Gbs
- indra-s2: 32 Xeon Gold 3.2 GHz cores, 100Gbs
- indra-s3: 32 Xeon Gold 2.3 GHz cores, 100Gbs, 750GB ram disk
- (4) DAQ Farm machines dafarm6[1-4] currently on 129.57.29.17[1-4] - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
- (4) Unbuilt DAQ Farm machines - each 32 Xeon 2.0 GHz cores - 1 Gbs NIC + (4) 10Gbs Spare NICs
- PR408549 (6) 100Gbs NICs - Shipped
- PR408870 PR408938 (2) 100Gbs Arista switches, transceivers, cables, etc - ETA 5 October (was 1 July)
- Next Steps:
- EJFAT VLAN Checkout
- Network Performance:
- FPGA LB Throughput - max sustained 90Gbs
- Host NICs
- Host S/W Reassembly - better algorithms, buffering, asynchronicity, etc.
- EJFAT UDP Transmission Performance
- Need better parameters for event reassembly/reconstruction
- NUMA tools on Ubuntu:
- sudo apt install hwloc-nox
- sudo apt install numactl
- lstopo
- numactl --hardware
- To control the scheduling class, use the chrt command.
- To pin to CPUs, use the taskset command or the underlying syscalls (e.g. sched_setaffinity).
- kernel daemon threads
- handle NIC driver interrupts
- set scheduling class of reassembly process to SCHED_FIFO / SCHED_RR
- want the reassembly process on the CPU socket in the same NUMA domain as the NIC
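- DPDK (http://www.dpdk.org) will own the NIC, bypassing the kernel, etc.; ESnet reports it can stream 100 Gbps using DPDK
- Pktgen-DPDK (source: https://github.com/pktgen/Pktgen-DPDK, docs: https://pktgen-dpdk.readthedocs.io/en/latest)
- Look at RoCE / NIC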
- Candidate Test Report
- ESnet UDP tuning
- Control Plane
- Will interact with SLURM / Kubernetes
- Python based (?)
- Control Plane daemon for compute host (?)
- Demonstrate CP based flexibility/elasticity
- SLURM env for EJFAT VLAN (Hess)
- DAQ/VTP Data Generation Test Harness
- Vivado Licenses for new machines (struck through)
- Open Nic Shell
- ACAT 2022 - September/Italy - Abstract (July 14) / Paper
- RT2022 - August 01-05 Conference
- RT 2022 Paper
- Back Burner / Downstream:
- Hall-B FT calorimeter and hodoscope streaming readout test
- May be able to use Abbott's indra-s1 setup
- May be able to use new VTP f/w with Hall-B VTP's
- CODA 3.10 + ERSAP for new VTP f/w
- CODA 2.0 (non-streaming) for old VTP f/w
- Diagram
- Hall-B to start taking data June 8
- Hall B VTPs on .167. subnet
- HOSS
- parallelize writing of raw data files
- distribute raw data across multiple compute nodes for calibration skims
- 1 Gbs at hi-luminosity
- Hall-D comms with DAQ 109 subnet require network customization; (EJFAT subnet)
- Hall-D EJFAT use case
- Hall-D EJFAT Network Diagram
- IPV6 testing
- AOT
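
Sketch for the NUMA / scheduling items under Network Performance: the notes mention pinning the reassembly process with taskset "or the underlying syscalls", setting SCHED_FIFO / SCHED_RR, and keeping the process in the same NUMA domain as the NIC. The Python below is one possible way to do that from inside the process itself, not the group's actual tooling. It assumes a Linux host that exposes the usual sysfs files (/sys/class/net/<iface>/device/numa_node and /sys/devices/system/node/nodeN/cpulist). The function names, the default priority of 10, and any interface name passed in are illustrative assumptions.

```python
import os


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(iface, sysfs="/sys"):
    """Return the NUMA node the NIC is attached to, or None if unknown."""
    path = os.path.join(sysfs, "class", "net", iface, "device", "numa_node")
    try:
        with open(path) as f:
            node = int(f.read().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when no NUMA affinity is known for the device.
    return node if node >= 0 else None


def node_cpus(node, sysfs="/sys"):
    """Return the set of CPU ids that belong to the given NUMA node."""
    path = os.path.join(sysfs, "devices", "system", "node",
                        f"node{node}", "cpulist")
    with open(path) as f:
        return parse_cpulist(f.read())


def place_on_nic_node(iface, priority=10, pid=0):
    """Pin pid (0 = calling process) to the NIC's NUMA node, request SCHED_FIFO.

    Hypothetical helper: returns the CPU set used, or None if the NIC's
    NUMA node could not be determined (nothing is changed in that case).
    """
    node = nic_numa_node(iface)
    if node is None:
        return None
    cpus = node_cpus(node)
    os.sched_setaffinity(pid, cpus)  # same syscall taskset uses
    try:
        # Same syscall chrt uses; normally needs root or CAP_SYS_NICE.
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        pass  # keep the CPU pinning even if the real-time class is refused
    return cpus
```

A reassembly process could call place_on_nic_node with the name of its 100Gbs interface at startup. The result can be checked afterwards with taskset -p and chrt -p on the process id, or compared against the layout shown by lstopo and numactl --hardware.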