Difference between revisions of "EJFAT EPSCI Meeting Mar. 23, 2022"

(Created page with "The meeting time is 2:00pm. === Connection Info: === <div class="toccolours mw-collapsible mw-collapsed"> You can connect using [https://jlab-org.zoomgov.com/j/1612038101?pwd...")
 
Line 31: Line 31:
 
<!-------------------------------------------------------------------------------------------------->
 
=== Agenda: ===
* [[EJFAT EPSCI Meeting Mar.  2, 2022 | Previous meeting]]
+
* [[EJFAT EPSCI Meeting Mar.  9, 2022 | Previous meeting]]
 
*:
 
* Situation:
 
** Testing with End-to-end EJFAT ERSAP solution on FPGA LB
** Jumbo Frames - indra-s2, fpga
+
** Jumbo Frames - indra-s2,s3, fpga
 
** Using script based LB Control Plane
 +
** Control Plane Gratuitous Arp cache updating (Scapy)
 
** Awaiting Compute Equip.- ETA 1 June
 
** Awaiting Networking Equip. - ETA 1 July
** Building Interim Test Environments:
+
** Benchmarks for RT2022 (April 1)
*** 129.57.109.0/24 subnet (100Gbs)
 
**** indra-s[1,2,3], alkaid
 
**** Benchmarks for RT2022 (April 1)
 
*** 129.57.172.0/22 subnet (1Gbs ?) - old/idle Hall-D machines
 
 
* Pending:
** <s>Minor f/w change for 'garbage' packets</s>
 
 
** Support C libraries for LB Host Control Plane
 
** ESnet smartnic open-source GitHub repo (April)
Line 53: Line 49:
 
*** <s>[[Test Plans | Test Plan]]</s>
 
*** Performance Measures (RT2022 - April 01 submission):
*** Interim Test Environments:
+
*** <s>Control Plane ARP poisoning</s>
**** 129.57.109.0/24 subnet (100Gbs)
+
*** '''alkaid''' -> jumbo frames
***** connect FPGA port #2 to switch
+
*** Linux IP stack buffer size
***** connect Mellanox 100 Gbs NIC port #2 to switch
 
***** Jumbo Frames - indra-s[1,3], alkaid, 109 subnet switch?
 
**** 129.57.172.0/22 subnet (10Gbs)
 
***** ERSAP / EJFAT RE
 
*** Control Plane ARP poisoning (spoofing? proxy?)
 
**** arp -s ''ip_addr'' ''hw_addr'' netmask ''nm'' pub
 
**** sudo arp -i enp134s0 -s 129.57.109.254 00:aa:bb:cc:dd:ee netmask 255.255.255.0 pub
 
***** SIOCSARP: Invalid argument
 
**** sudo arp -i enp134s0 -s 129.57.109.254 00:aa:bb:cc:dd:ee  pub
 
***** Address        HWtype HWaddress    Flags Mask Iface
 
***** 129.57.109.254        (incomplete)            enp175s0f1
 
 
** Downstream:
 +
*** P4 enhancements for
 +
**** data_id <-> reassembly port mapping
 +
**** ipv4 ping, arp
 
*** C-based control plane
 
**** Feedback from Compute hosts design
**** Control Plane Arp cache / network good citizen - P4 may do
 
 
*** Control Plane daemon for compute host
 
*** IPV6 testing
Line 76: Line 63:
 
*** Hall-D EJFAT + SLURM use case
 
* Issues:
 +
** O/S / Dev Tools on indra-s1,3, '''alkaid''', etc.
 
** Abbott spare 8 nodes - OBE?
 
** Hall-D spare 10Gbs NICs - OBE?
 
** CentOS 7 install on interim boxes - OBE?
** Use (2) spare/borrowed switches - OBE?
 
 
** [https://jeffersonlab-my.sharepoint.com/:b:/r/personal/goodrich_jlab_org/Documents/EJFAT/EJFAT%20Network%20Setup.pdf?csf=1&web=1&e=hkUo8k Diagram] - OBE?
** ejfat-sw-2022.jlab.org (129.57.29.83) - what is this guy?
 
** Mellanox 40Gbs NIC in indra-s2
 
 
* AOT
 
<hr>

Revision as of 17:07, 23 March 2022

The meeting time is 2:00pm.

Connection Info:

You can connect using ZoomGov video conferencing (ID: 161 203 8101):

Meeting URL
https://jlab-org.zoomgov.com/j/1612038101?pwd=Yk96QUcyT1NDVTRRUGNtOFVSSTdaUT09&from=addon

Meeting ID
161 203 8101

Passcode
378382

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  • Previous meeting
  • Situation:
    • Testing with End-to-end EJFAT ERSAP solution on FPGA LB
    • Jumbo Frames - indra-s2,s3, fpga
    • Using script based LB Control Plane
    • Control Plane gratuitous ARP cache updating (Scapy)
    • Awaiting Compute Equip. - ETA 1 June
    • Awaiting Networking Equip. - ETA 1 July
    • Benchmarks for RT2022 (April 1)
  • Pending:
    • Support C libraries for LB Host Control Plane
    • ESnet smartnic open-source GitHub repo (April)
    • ESnet private, forkable JLab P4 and simulations GitHub repo (April)
  • To Do:
    • Near Term:
      • Test Plan
      • Performance Measures (RT2022 - April 01 submission):
      • Control Plane ARP poisoning
      • alkaid -> jumbo frames
      • Linux IP stack buffer size
    • Downstream:
      • P4 enhancements for
        • data_id <-> reassembly port mapping
        • IPv4 ping, ARP
      • C-based control plane
        • Feedback from Compute hosts design
      • Control Plane daemon for compute host
      • IPv6 testing
      • EJFAT Subnet
      • Hall-D EJFAT + SLURM use case
  • Issues:
    • O/S / Dev Tools on indra-s1,3, alkaid, etc.
    • Abbott spare 8 nodes - OBE?
    • Hall-D spare 10Gbs NICs - OBE?
    • CentOS 7 install on interim boxes - OBE?
    • Diagram - OBE?
  • AOT
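
The agenda item on gratuitous ARP cache updating can be illustrated with a small sketch. The notes mention Scapy; the example below instead uses only the Python standard library to build the raw frame, so that the byte layout is visible. It assumes the ARP Announcement form described in RFC 5227 (an ARP request whose sender and target IP are both the announced address); the notes do not say which form the control plane script actually uses. The function name `build_gratuitous_arp` is hypothetical, and the IP and MAC values are the placeholder ones from the earlier `arp -s` attempts.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800
ARP_HTYPE_ETHERNET = 1
ARP_OP_REQUEST = 1


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP announcement for (ip, mac).

    Hypothetical helper: sender IP and target IP are both `ip`, the target
    hardware address is zero, and the frame is sent to the broadcast MAC.
    """
    sender_hw = mac_to_bytes(mac)
    sender_ip = socket.inet_aton(ip)

    # Ethernet header: destination (broadcast), source, EtherType.
    eth_header = b"\xff" * 6 + sender_hw + struct.pack("!H", ETHERTYPE_ARP)

    # ARP fixed header: hardware type, protocol type, address lengths, opcode.
    arp_header = struct.pack(
        "!HHBBH", ARP_HTYPE_ETHERNET, ETHERTYPE_IPV4, 6, 4, ARP_OP_REQUEST
    )

    # Sender pair, then target pair (target hardware address left as zeros).
    arp_body = sender_hw + sender_ip + b"\x00" * 6 + sender_ip

    return eth_header + arp_header + arp_body


# Placeholder values taken from the notes, not a real host.
frame = build_gratuitous_arp("00:aa:bb:cc:dd:ee", "129.57.109.254")
print(len(frame), frame.hex())
```

Actually transmitting such a frame needs a raw link-layer socket and elevated privileges, which is the part Scapy would handle; that step is deliberately left out here.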