Difference between revisions of "EJFAT EPSCI Meeting Aug. 21, 2024"

From epsciwiki
Latest revision as of 17:58, 21 August 2024

The meeting time is 2:30pm.

Connection Info:


Agenda:

  1. Previous meeting
  2. Announcements:
    1. ejfat-1 LB up and should behave the same as ESnet Stable LB
    2. ejfat-2 LB up and should behave the same as ESnet Stable LB
  3. Topics
    1. IRI Test Development:
      1. LB version = ESnet Stable version
      2. Data Source:
        1. JLab, CLAS12, pre-triggered events - 1 channel
      3. Data Sink:
        1. Perlmutter - 40 nodes
        2. ORNL/ESnet/JLab IRI Testbed / Defiant - 4 nodes allocated
        3. JLab - 7 nodes available
        4. ERSAP
      4. Test Plans - JLab, ESnet, NERSC: https://docs.google.com/document/d/13VvyCMNJW3nIVZMgqOuPn3MBSLmfAl1zLkJAHw8fj04/edit?usp=drivesdk
      5. Prometheus Dashboards
    2. JLab FEG/SRO
      1. will use interim UDP solution for event sync
      2. Sparse Port Map Issue
      3. Network Fabric Entropy
    3. E2SAR
      1. e2sar/ibaldin:0.1.0a6 available
      2. segmentation done, reassembly not completed yet
      3. Port Map Issue
    4. IB
    5. ejfat-3 - shouldn't be used until further notice
      1. needs configuration corrections for networking, routing, mounting /daqfs
      2. Hosts 2 U280 FPGAs
    6. ACAT 2024 Paper - can fold in improvements with reviewer comments
    7. Awaiting TLS cert install across cluster
    8. Turn machines over to sys admin/Puppet - except for ejfat-fs
    9. SC poster submitted - demo in the works
    10. SSD drives on ejfat-fs - 20 TB of 28 TB used - mount for EJFAT farm access, probably ZFS/RAID for data preservation
    11. RAM disks: 1 TB total memory on ejfat-fs, 0.5 TB on the others
    12. Lustre storage for EJFAT - not available for Testbed or Ubuntu 20.04
    13. Experiment Halls - beam returns late January
  4. Resources:
    1. HPDF
    2. EJFAT API
    3. EJFAT Status
  5. AOT