Difference between revisions of "EJFAT EPSCI Meeting Mar. 9, 2022"

From epsciwiki

Revision as of 15:14, 9 March 2022

The meeting time is 2:00pm.

Connection Info:

You can connect using ZoomGov Video conferencing (ID: 161 203 8101):

Meeting URL
https://jlab-org.zoomgov.com/j/1612038101?pwd=Yk96QUcyT1NDVTRRUGNtOFVSSTdaUT09&from=addon

Meeting ID
161 203 8101

Passcode
378382

Want to dial in from a phone?

Dial one of the following numbers:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  • Previous meeting
  • Situation:
    • Testing the end-to-end EJFAT ERSAP solution on the FPGA LB
    • Jumbo Frames - indra-s2, FPGA
    • Using script-based LB Control Plane
    • Awaiting Compute Equip. - ETA 1 June
    • Awaiting Networking Equip. - ETA 1 July
    • Building Interim Test Environments:
      • 129.57.109.0/24 subnet (100 Gb/s)
        • indra-s[1,2,3], alkaid
        • Benchmarks for RT2022 (April 1)
      • 129.57.172.0/22 subnet (10 Gb/s)
  • Pending:
    • Minor f/w change for 'garbage' packets
    • Support C libraries for LB Host Control Plane
    • ESnet smartnic open-source GitHub repo (April)
    • ESnet private, forkable JLab P4 and simulations GitHub repo (April)
  • To Do:
    • Near Term:
      • Test Plan
      • Performance Measures (RT2022 - April 01 submission):
      • Interim Test Environments:
        • 129.57.109.0/24 subnet (100 Gb/s)
          • connect FPGA port #2 to switch
          • connect Mellanox 100 Gb/s NIC port #2 to switch
          • Jumbo Frames - indra-s[1,3], alkaid, 109 subnet switch?
        • 129.57.172.0/22 subnet (10 Gb/s)
          • ERSAP / EJFAT RE
      • Control Plane ARP poisoning
    • Downstream:
      • C-based control plane
        • Feedback from Compute hosts design
        • Control Plane ARP cache / network good citizen - P4 may do
      • Control Plane daemon for compute host
      • IPv6 testing
      • EJFAT Subnet
      • Hall-D EJFAT + SLURM use case
  • Issues:
    • Abbott spare 8 nodes - OBE?
    • Hall-D spare 10 Gb/s NICs - OBE?
    • CentOS 7 install on interim boxes - OBE?
    • Use (2) spare/borrowed switches - OBE?
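    • Diagram (https://jeffersonlab-my.sharepoint.com/:b:/r/personal/goodrich_jlab_org/Documents/EJFAT/EJFAT%20Network%20Setup.pdf?csf=1&web=1&e=hkUo8k) - OBE?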
    • ejfat-sw-2022.jlab.org (129.57.29.83) - what is this guy?
    • Mellanox 40 Gb/s NIC in indra-s2
  • AOT
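
As a quick aid for the open question about ejfat-sw-2022.jlab.org, the two interim test subnets listed in the agenda can be expanded and compared with that host's address. This is only a minimal sketch using Python's standard ipaddress module: the subnet labels and the helper name matching_subnets are made up for illustration, and the address is taken from the agenda as written, not looked up in DNS.

```python
import ipaddress

# Subnets as listed in the agenda; the labels are illustrative only.
SUBNETS = {
    "100 Gb/s test subnet": ipaddress.ip_network("129.57.109.0/24"),
    "10 Gb/s test subnet": ipaddress.ip_network("129.57.172.0/22"),
}


def matching_subnets(address):
    """Return the labels of the listed subnets that contain the address."""
    ip = ipaddress.ip_address(address)
    return [label for label, net in SUBNETS.items() if ip in net]


if __name__ == "__main__":
    # Show the address range each prefix covers.
    for label, net in SUBNETS.items():
        print(f"{label}: {net[0]} to {net[-1]} ({net.num_addresses} addresses)")

    # Address of ejfat-sw-2022.jlab.org as given in the agenda.
    print(matching_subnets("129.57.29.83"))
```

Under these assumptions the address falls in neither interim subnet, which is consistent with it being raised as an open issue. The /22 prefix covers 129.57.172.0 through 129.57.175.255.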