Difference between revisions of "EJFAT"
== Resources ==
* [https://jeffersonlab.sharepoint.com/:b:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/u280_po_Signed_21-M0862%20-%20Avnet.pdf?csf=1&web=1&e=PmJfdu First FPGA PO]
* [https://www.jlab.org TBD]
Revision as of 16:21, 24 February 2022
Welcome to the EJFAT Wiki
(ESnet / JLab FPGA Accelerated Transport)
System Overview:
EJFAT is a collaboration between the Energy Sciences Network (ESnet) and Thomas Jefferson National Accelerator Facility (JLab) on proof-of-concept engineering: programming a Field Programmable Gate Array (FPGA) to route commonly tagged UDP packets from any data source to individually configurable destination endpoints, balancing the compute workload across those endpoints. The packets carry some additional tagging for stream reassembly at the endpoint. The primary purpose of this FPGA-based acceleration is to load balance work across destination compute farm endpoints with low latency and at the full line rate of 100 Gb/s, using feedback from the destination compute farm.
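The routing idea in the overview can be illustrated with a small software model. This is only a sketch under stated assumptions, not the EJFAT FPGA design: the class `TagLoadBalancer`, its methods, the weighted slot table, and the endpoint names are all hypothetical, chosen to show how packets sharing a tag can be steered to one endpoint while farm feedback adjusts each endpoint's share of work.

```python
from dataclasses import dataclass, field


@dataclass
class TagLoadBalancer:
    """Toy model (hypothetical): map a packet tag to one endpoint via a weighted slot table."""

    weights: dict[str, int]  # endpoint name -> relative share of work
    slots: list[str] = field(init=False)

    def __post_init__(self) -> None:
        self._rebuild()

    def _rebuild(self) -> None:
        # Each endpoint occupies as many table slots as its weight.
        self.slots = [ep for ep, w in sorted(self.weights.items()) for _ in range(w)]
        if not self.slots:
            raise ValueError("at least one endpoint needs a positive weight")

    def route(self, tag: int) -> str:
        # Packets with the same tag index the same slot, so they all reach
        # the same endpoint and can be reassembled there.
        return self.slots[tag % len(self.slots)]

    def feedback(self, endpoint: str, weight: int) -> None:
        # Feedback from the compute farm changes an endpoint's share of slots.
        self.weights[endpoint] = max(weight, 0)
        self._rebuild()


lb = TagLoadBalancer({"node-a": 1, "node-b": 3})
first = lb.route(0)          # every packet tagged 0 goes to this endpoint
lb.feedback("node-a", 0)     # node-a reports it cannot take more work
after = lb.route(0)          # new packets tagged 0 now go elsewhere
```

Rebuilding the whole table on feedback is the simplest choice for a sketch, but it can remap tags whose packets are still in flight. A real load balancer would need some way to keep an in-progress tag on its original endpoint.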
Presentations/Papers
Date | Presenter | Event | Links |
---|---|---|---|
2021-03-01 | G. Heyes | EJFAT Proposal | Word |
2021-10-21 | M. S. Goodrich | Div Brief | |
2021-11-05 | M. S. Goodrich | Canisius College | |
2021-12-03 | S. Sheldon | ESnet LB Tutorial | MP4 |
2021-12-10 | Y. Kumar | SRO iX | |
EJFAT Weekly EPSCI Meetings
EJFAT Weekly Collaboration Meetings
Technical Design Overview
EJFAT Technical Design Overview
Edge to Core Test Equipment:
- Price Estimate Spreadsheet
- Networking Diagram
- PR408549: Requisition 1 of 2: Statement of Work for Servers
  - 1/13/2022: EJFAT team decided to solicit two bid responses, one with MLX NIC and one without. Response from Procurement is "I can ask for the two separate quotes. If you are going to purchase both option (with & without add-in cards), once I receive the quotes back, you will have submit a new PR to cover the option (without add-in cards)."
  - 1/18/2022: Question from KOI Computers: "please clarify what the part number for the NVIDIA Dual Port ConnectX-6". Replied with part # MCX623106AN-CDAT.
  - 1/24/2022: Requisition currently open for bid responses from vendors. Due date is COB 1/24/2022.
  - 1/27/2022: PO awarded to Atipa for 6 servers and 1 file-server with FPGA and MLX SmartNIC. Expected delivery date from vendor is 5/31/2022.
- PR408870, PR408938: Requisition 2 of 2: Statement of Work for Switches & Cables
  - 1/14/2022: PRs for the switches, transceivers and fiber have been submitted. I added (4) 2 km 100G transceivers to support dual 100G connections between the switches. We can always upgrade to 400G in the future, if needed.
- PR409850: NVIDIA ARM HPC Developer Kit
  - Hardware Specifications for dev kit
    - Model: GIGABYTE G242-P32, 2U server
    - CPU: 1x Ampere Altra Q80-30 (Arm processor)
    - Memory: 512 GB DDR4 memory
    - Storage: 6 TB SAS/SATA 3.5″
    - GPU: 2x NVIDIA A100 GPU
    - Network: 2x NVIDIA® BlueField®-2 E-Series DPU, 200GbE/HDR single-port QSFP56, PCIe Gen4 x16, secure boot enabled, crypto disabled, 16 GB on-board DDR, 1GbE OOB management