Difference between revisions of "Discussion of: Moving Compute towards Data in Heterogeneous multi-FPGA Clusters using Partial Reconfiguration and I/O Virtualisation"

'''Diana:'''
 
* [https://www.nytimes.com/2021/04/13/technology/racist-computer-engineering-terms-ietf.html NY Article on Master/Slave terminology]
 

Revision as of 14:34, 30 September 2021

David:

  • Started to get lost around page 3
  • PR = Partial Reconfiguration = uploading new algorithms to FPGA without re-flashing entire board(?)
  • I/O virtualization
  • Design requires compute resources to be distributed throughout storage resources. (cost/benefit?)
  • "... offers users with an illusion of a single and large FPGA, in which they can develop, deploy, and execute applications at large-scale with ease to achieve energy-efficient HPC"
  • Does the benefit only come if the data you need to process happens to be spread out over many nodes?
    • Motivates distributing data as evenly as possible over storage.
    • Energy used to extract data from disk? SSD not a problem. Could be increased energy though if you need to spin up two HDDs as opposed to one.

Diana:

Michael:

  • The Ideal Environment
    • Single Major Function Kernel Build
    • Plug and Play Deployment
    • Multiple Deployments on Single FPGA
    • Multiple Deployments on Heterogeneous FPGAs
    • Transparent Performance Scaling
  • Partial Reconfiguration (PR)
    • Hot Insertion of FPGA Region
    • Currently done via Processor Config Access Port (PCAP)
    • Design Modularization
  • SOTA Issues
    • Each Deployment Requires Separate Kernel
    • Tools Do Not Abstract FPGA I/O Heterogeneity
    • All Kernels Require Distinct I/O Configuration
    • Substantial overhead to swap kernels
    • Remote Swaps Have Higher Latency
  • Solution
    • I/O Virtualization
    • High Speed ICAP Dynamic Remote Config Service
    • High Level Synthesis
    • UNIMEM: implements PGAS (Partitioned Global Address Space)
    • UNILOGIC: Virtual FPGA (VF) PR
  • UNILOGIC
    • Remote PR
    • VF task location
    • Visible Memory Mapped Accelerators (Kernels)
    • Transparent Accelerator Access to RDMA
  • FPGA Implementation Stack
    • Decoupled Accelerator Builds
    • H/W Abstractions
    • Accelerator I/F Libs:
      • Standardized Register Map
      • Generic Drivers: Streaming I/F, Master/Slave I/F
      • Runtime/Execution API: Hi Level S/W Integration
      • gRPC: async cluster job launch
  • Performance/Payoff
    • ICAP vs. PCAP: Table I
    • Inter PR comm latency (I/O Virt) 2-3 ns (Fig 3)
    • Build Flow: 55 vs. 336 mins (about 1/6)
    • Execution: Fig 4
    • Energy: Fig 4
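
The build-flow figures in the Performance/Payoff notes (55 and 336 minutes, summarized as 1/6) can be checked with a few lines of Python. This is only a sanity check of the arithmetic: the notes do not say which number belongs to which flow, so the variable names below are assumptions (55 minutes taken as the decoupled accelerator build, 336 minutes as the conventional full build).

```python
from fractions import Fraction

# Figures from the "Build Flow" bullet. Which flow each number refers to
# is an assumption; the notes only give "55/336 mins (1/6)".
decoupled_build_min = 55    # assumed: decoupled accelerator build
baseline_build_min = 336    # assumed: conventional full build

ratio = Fraction(decoupled_build_min, baseline_build_min)
speedup = baseline_build_min / decoupled_build_min
saved_min = baseline_build_min - decoupled_build_min

print(f"ratio   = {ratio} = {float(ratio):.3f}")   # compare with 1/6 = 0.167
print(f"speedup = {speedup:.2f}x")
print(f"saved   = {saved_min} min per build")
```

Under that reading, the "1/6" in the notes is a rounding of 55/336, and the more interesting number for iteration speed may be the absolute time saved per build.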