Difference between revisions of "EJFAT EPSCI Meeting Sep. 18, 2024"

 
(8 intermediate revisions by the same user not shown)
Line 30: Line 30:
 
## [https://www.overleaf.com/project/6412236165900e77e3479a46 ACAT 2022 Paper] - accepted
 
 
## '''Scicomp Test Routers installation 09/18/2024'''
 
 +
## U280s are discontinued.
 +
### New LB purchases U55C
 +
### U55C bitfiles available 1 year out
 +
### U280 Supported indefinitely
 
# [https://wiki.jlab.org/epsciwiki/index.php/EJFAT Status]
 
 
# Topics
 
Line 44: Line 48:
 
### Prometheus Dashboards
 
 
### Analyzing data from Aug 29, 2024
 
 +
### The Prometheus dashboard is available on port 1717 of the ejfat-fs node. The test data is under "100g-nersc-ornl / ejfat-nersc-ornl". The test ran from approximately 17:05 to 18:20 UTC on August 29, 2024. To log in to Grafana, use "ejfat" as both the username and the password.
 
## JLab FEG/SRO
 
 
### will use interim UDP solution for event sync
 
 
### '''Special Events Issue'''
 
 
#### [https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-SE-OOB.drawio.png?csf=1&web=1&e=56F0Ep Completely Out-of-band]
 
#### [https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-3LB.drawio.png?csf=1&web=1&e=S5fb29 Three LBs]
+
#### <s>[https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-3LB.drawio.png?csf=1&web=1&e=S5fb29 Three LBs]</s>
 
#### [https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-PT-OOB.drawio.png?csf=1&web=1&e=lMehhF Pass-through / Out-of-band]
 
 
#### In-band - new Req for LB F/W
 
#### [https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-3LB-1.drawio.png?csf=1&web=1&e=0YJCar Pass-through / 2nd LBs]
+
#### [https://jeffersonlab.sharepoint.com/:i:/r/sites/SciComp/Shared%20Documents/EPSCI/EJFAT/EJFAT-3LB-1.drawio.png?csf=1&web=1&e=hJnt4k Pass-through / 3 LBs]
 
#### Delay Line Technique
 
 
#### Extra Metadata in Each Event
 
 
#### [https://docs.google.com/presentation/d/1oVm4TwCEuYz3wNd_Er1E06jUOE50g0uNDIl6HhnZK18/edit?usp=sharing LB-CODA]
 
 +
#### [https://docs.google.com/presentation/d/1egpNE1DO4inMBg5KuNIF7GXNdcteMPNW68CYG86iQlc/edit?usp=sharing LB-CODA-CP]
 
## E2SAR
 
 
### e2sar/ibaldin:0.1.0b1 available - MVP completed
 
 
### '''ejfat-5 reserved for E2SAR'''
 
 +
### E2SAR 0.1.0 .deb packages for Ubuntu 20, 22 and 24 are now available. They contain the e2sar library, headers and executables, plus appropriate versions of the gRPC and Boost dependencies, all installed under /usr/local. The latest Docker image with E2SAR 0.1.0, the 'officially' released MVP, is also available.
 
## IB
 
 
## ejfat-3 - ready for FPGA cluster LB install
 
## ejfat-6 - Ubuntu 24.04 being installed - no docker
+
## ejfat-6 - Ubuntu 24.04 <s>being</s> installed - no docker
 
## [https://jeffersonlab-my.sharepoint.com/:b:/g/personal/baldin_jlab_org/EaFbLl6TB9BBkA7bG7SwxAwB1_V5HB2rcPNH9KKY846NkQ?e=T1m0q5 SC poster submitted - demo in works]
 
## SSD drives on ejfat-fs - 20TB used of 28TB - mount for EJFAT farm - '''pending'''
+
## SSD drives on ejfat-fs - 20TB used of 28TB - mount for EJFAT farm - <s>pending</s>
## Ram Disks: 1TB Total Mem on ejfat-fs, 0.5 TB others - '''pending'''
+
## Ram Disks: 1TB Total Mem on ejfat-fs, 0.5 TB others - <s>pending</s>
 +
## Disk sdb - user storage or HA OS fail-over mirror?
 
## Experiment Halls - beam returns late January/February 2025
 
 
## Ubuntu 20.04 LTS - support ends in 2025
 
## LAGing U280 to switch on ejfat-1 - '''pending'''
+
## LAGing U280 to switch on ejfat-1 - <s>pending</s>
 
# Resources:
 
 
## [https://jeffersonlab.sharepoint.com/sites/HPDF HPDF]
 

Latest revision as of 14:42, 26 September 2024

The meeting time is 2:30pm.

Connection Info:


Agenda:

  1. Previous meeting
  2. Announcements:
    1. ACAT 2024 Paper - In Review
    2. ACAT 2022 Paper - accepted
    3. Scicomp Test Routers installation 09/18/2024
    4. U280s are discontinued.
      1. New LB purchases U55C
      2. U55C bitfiles available 1 year out
      3. U280 Supported indefinitely
  3. Status
  4. Topics
    1. IRI Test Development:
      1. LB version = ESnet Stable version
      2. Data Source:
        1. JLAB, CLAS12, pre-triggered events - 1 channel
      3. Data Sink:
        1. Perlmutter - 40 nodes
        2. ORNL/ESnet/JLab IRI Testbed / Defiant - 4 nodes allocated
        3. JLab - 7 nodes available
        4. ERSAP
      4. Test Plans - JLab, ESnet, NERSC:
      5. Prometheus Dashboards
      6. Analyzing data from Aug 29, 2024
      7. The Prometheus dashboard is available on port 1717 of the ejfat-fs node. The test data is under "100g-nersc-ornl / ejfat-nersc-ornl". The test ran from approximately 17:05 to 18:20 UTC on August 29, 2024. To log in to Grafana, use "ejfat" as both the username and the password.
    2. JLab FEG/SRO
      1. will use interim UDP solution for event sync
      2. Special Events Issue
        1. Completely Out-of-band
        2. <s>Three LBs</s>
        3. Pass-through / Out-of-band
        4. In-band - new Req for LB F/W
        5. Pass-through / 3 LBs
        6. Delay Line Technique
        7. Extra Metadata in Each Event
        8. LB-CODA
        9. LB-CODA-CP
    3. E2SAR
      1. e2sar/ibaldin:0.1.0b1 available - MVP completed
      2. ejfat-5 reserved for E2SAR
      3. E2SAR 0.1.0 .deb packages for Ubuntu 20, 22 and 24 are now available. They contain the e2sar library, headers and executables, plus appropriate versions of the gRPC and Boost dependencies, all installed under /usr/local. The latest Docker image with E2SAR 0.1.0, the 'officially' released MVP, is also available.
    4. IB
    5. ejfat-3 - ready for FPGA cluster LB install
    6. ejfat-6 - Ubuntu 24.04 <s>being</s> installed - no docker
    7. SC poster submitted - demo in works
    8. SSD drives on ejfat-fs - 20TB used of 28TB - mount for EJFAT farm - <s>pending</s>
    9. Ram Disks: 1TB Total Mem on ejfat-fs, 0.5 TB others - <s>pending</s>
    10. Disk sdb - user storage or HA OS fail-over mirror?
    11. Experiment Halls - beam returns late January/February 2025
    12. Ubuntu 20.04 LTS - support ends in 2025
    13. LAGing U280 to switch on ejfat-1 - <s>pending</s>
  5. Resources:
    1. HPDF
    2. EJFAT API
    3. EJFAT Status
  6. AOT
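
For the Aug 29, 2024 data analysis item under IRI Test Development, the quoted test window (around 17:05 to 18:20 UTC) can be turned into the epoch-millisecond values that Grafana accepts in its `from` and `to` URL time-range parameters. The snippet below is only a sketch: the helper names `to_epoch_ms` and `time_range_query` are hypothetical, the window bounds are the approximate times from the agenda, and no dashboard path is assumed, so the resulting query string would still need to be appended to whatever dashboard URL is in use on port 1717 of ejfat-fs.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Approximate test window quoted in the agenda: 2024-08-29, 17:05 to 18:20 UTC.
WINDOW_START = datetime(2024, 8, 29, 17, 5, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 8, 29, 18, 20, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


def time_range_query(start: datetime, end: datetime) -> str:
    """Build a 'from=...&to=...' query string (hypothetical helper).

    Only the time-range part is produced; the dashboard path itself is
    not given in the notes and is deliberately left out.
    """
    return urlencode({"from": to_epoch_ms(start), "to": to_epoch_ms(end)})


if __name__ == "__main__":
    print(time_range_query(WINDOW_START, WINDOW_END))
```

Keeping the bounds as timezone-aware UTC datetimes avoids any dependence on the local timezone of the machine where the conversion is run.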