Discussion of: "Moving Compute towards Data in Heterogeneous multi-FPGA Clusters using Partial Reconfiguration and I/O Virtualisation"
Latest revision as of 17:32, 30 September 2021
David:
- Started to get lost around page 3
- PR = Partial Reconfiguration = loading a new accelerator into one region of the FPGA while the rest of the device keeps running, without re-flashing the entire board
- I/O virtualization
- Design requires compute resources to be distributed throughout storage resources. (cost/benefit?)
- "... offers users with an illusion of a single and large FPGA, in which they can develop, deploy, and execute applications at large-scale with ease to achieve energy-efficient HPC"
- Does the benefit only come if the data you need to process happens to be spread out over many nodes?
  - Motivates distributing data as evenly as possible over storage.
  - Energy used to extract data from disk? Not a problem for SSDs, but energy could increase if you need to spin up two HDDs instead of one.
Diana:
- NY Times article on Master/Slave terminology: https://www.nytimes.com/2021/04/13/technology/racist-computer-engineering-terms-ietf.html
Michael:
- The Ideal Environment
  - Single Major Function Kernel Build
  - Plug and Play Deployment
  - Multiple Deployments on a Single FPGA
  - Multiple Deployments on Heterogeneous FPGAs
  - Transparent Performance Scaling
- Partial Reconfiguration (PR)
  - Hot Insertion of an FPGA Region
  - Currently done via the Processor Configuration Access Port (PCAP)
  - Design Modularization
- SOTA Issues
  - Each Deployment Requires a Separate Kernel
  - Tools Do Not Abstract FPGA I/O Heterogeneity
  - All Kernels Require Distinct I/O Configuration
  - Substantial Overhead to Swap Kernels
  - Remote Swaps Have Higher Latency
- Solution
  - I/O Virtualization
  - High-Speed ICAP Dynamic Remote Configuration Service
  - High-Level Synthesis
  - UNIMEM: implements a PGAS (Partitioned Global Address Space)
  - UNILOGIC: Virtual FPGA (VF) PR
    - Remote PR
    - VF task location
    - Visible Memory-Mapped Accelerators (Kernels)
    - Transparent Accelerator Access to RDMA
- FPGA Implementation Stack
  - Decoupled Accelerator Builds
  - Hardware Abstractions
  - Accelerator Interface Libraries:
    - Standardized Register Map
    - Generic Drivers: Streaming Interface, Master/Slave Interface
    - Runtime/Execution API: High-Level Software Integration
    - gRPC: Asynchronous Cluster Job Launch
- Performance/Payoff
  - ICAP vs. PCAP: Table I
  - Inter-PR Communication Latency (with I/O Virtualization): 2-3 ns (Fig. 3)
  - Build Flow: 55 vs. 336 mins (~1/6 the time)
  - Execution: Fig. 4
  - Energy: Fig. 4