JIRIAF Meeting Mar. 21 2024

Latest revision as of 14:49, 21 March 2024


Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304


Agenda:

  • Announcements
    • Poster presentation at the ACAT conference (https://indico.cern.ch/event/1330797/abstracts/)
    • Poster "HPC, HTC, and Cloud: converging towards seamless computing federation with InterLink" that also uses VK (https://github.com/CARV-ICS-FORTH/knoc)
  • JFE
    • Code base (https://github.com/JeffersonLab/jiriaf-0.1/tree/main/JFE)
      • Current status
        • CILogon authentication, database, etc.
      • Deployment on jiriaf2301
      • Visualization
        • Job request queue
        • List of pending and active JRMs
  • JRM
    • Tables and their physical location.
    • Metric server
    • Horizontal autoscaling support
  • JCS and JMS
    • DB API (Chris)
    • Resource Acquisition
      • Time, CPU, and memory requests to steer deployment.
      • Allocate a node suitable for running the specified job.
        • What would be the time request for the JRM within the SLURM request?
      • Check the job request queue and decide if we need to run more JRMs
      • Remove JRM if no suitable jobs can run on it.
      • If the wall time for the JRM is about to run out, move the pending pod request to another JRM to continue processing.
  • ML and digital twin
    • ML model trained on the historical data to help with the resource acquisition
    • Bayesian network-based agent model for a site/workflow.
      • Queueing theory-based mathematical model for predicting wait time for a streaming event in a queue before processing.
  • Documentation and code
    • Centralize the code base in GitHub (https://github.com/JeffersonLab/jiriaf-0.1).
  • Q2 Milestone
    • Install FE, databases, and k8s (API server, controller manager, scheduler, and etcd) on jiriaf2301
    • Log in to JIRIAF
    • Submit CLAS12 reconstruction request (streaming workflow) and related sync workflow
    • Acquire resources at NERSC and run JRM on all requested servers (for processor workflow)
    • Acquire the persistency resource (ejfat-fs) and run JRM (for sync workflow)
  • Start preparing a paper (e.g., for NIM)
  • Start working on CHEP24 abstract and presentation
  • AOT
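
As a discussion aid for the "Resource Acquisition" items above, the decision steps (check the job request queue, decide whether more JRMs are needed, remove a JRM that no queued job can use, and move pending pod requests off a JRM whose wall time is about to run out) could be sketched roughly as follows. This is only a minimal illustration, not the JIRIAF implementation: the `Job` and `JRM` classes, the `fits` and `plan` functions, and the `drain_minutes` threshold are all hypothetical names introduced here, and the real JCS/JMS interfaces may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job request: the time, CPU, and memory it asks for."""
    name: str
    cpu: int
    memory_gb: int
    minutes: int


@dataclass
class JRM:
    """Hypothetical view of a running JRM and its remaining wall time."""
    name: str
    cpu: int
    memory_gb: int
    minutes_left: int
    pending: list = field(default_factory=list)  # pod requests not yet finished


def fits(job, jrm):
    """True if the JRM has enough CPU, memory, and wall time left for the job."""
    return (job.cpu <= jrm.cpu
            and job.memory_gb <= jrm.memory_gb
            and job.minutes <= jrm.minutes_left)


def plan(queue, jrms, drain_minutes=10):
    """Return proposed actions; nothing is launched or removed here."""
    actions = {"launch_for": [], "remove": [], "migrate": []}

    # Queued jobs that no current JRM can host: more JRMs are needed.
    for job in queue:
        if not any(fits(job, j) for j in jrms):
            actions["launch_for"].append(job.name)

    for jrm in jrms:
        if not jrm.pending:
            # Idle JRM that no queued job can run on: candidate for removal.
            if not any(fits(job, jrm) for job in queue):
                actions["remove"].append(jrm.name)
        elif jrm.minutes_left <= drain_minutes:
            # Wall time is about to run out: look for another JRM for each pod.
            for pod in jrm.pending:
                target = next((t.name for t in jrms
                               if t is not jrm and fits(pod, t)), None)
                actions["migrate"].append((pod.name, jrm.name, target))

    return actions
```

A `target` of `None` in a `migrate` entry means no existing JRM can take the pod, which would itself be a reason to request another node. What wall time to request for the JRM within the SLURM request remains the open question listed above; the sketch simply takes `minutes_left` as given.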

Useful References



Minutes: