Difference between revisions of "JIRIAF Meeting Feb. 8 2024"

Line 36:
=== Agenda: ===
* '''Announcements'''
** Patrick's JLAB account.
** ORNL accounts are ready.
** Abstract accepted at the [https://indico.cern.ch/event/1330797/abstracts/ ACAT conference]
** Presentation at [https://www.nersc.gov/users/training/events/2024/nersc-data-day-feb-21-22-2024/ NERSC Data Day]
* '''JFE'''
** The OIDC identity layer on top of the OAuth 2.0 protocol is necessary for JIRIAF users to join the CILogon federated identity management ecosystem, allowing users to access services using their existing institutional credentials without needing separate usernames and passwords. JIRIAF application OIDC registration.
** CILogon token structure.
Line 49:
*** Argo: workflow and visualization engine
**** Argo workflow: If the wall time for the JRM is about to run out, move the pending pod request to another JRM to continue processing.
* '''JCS and JMS'''
** Time, CPU, and memory requests to steer deployment.
*** Allocate a node suitable for running the specified job.
Line 57:
*** Check the list of not yet scheduled jobs and decide if we need to run more JRMs
*** Remove JRM if no suitable jobs can run on it.
*** Bayesian network-based agent model for a site/workflow.
**** Queueing theory-based mathematical model for predicting wait time for a streaming event in a queue before processing.
* '''JRM'''
** Implement a function using ConfigMap configuration to write files in pods.
** Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
* '''Documentation and code'''
** Centralize the code base in [https://github.com/JeffersonLab/jiriaf-0.1 GitHub].
*** Repository for all scripts and k8s YAML configuration files
** HowTo manual/instructions for setting up JIRIAF on jiriaf2301-02
* '''Start preparing a paper''' (e.g., for NIM)
** ''Objectives as a groundwork for IRI''
** ''Introduction to streaming workflows for HEP and NP''
*** Advantages in terms of workflow deployment, migration, and orchestration
*** Mentioning the use of EJFAT as a transport mechanism that might be a key component for the future IRI
*** Description of two JLAB data processing workflows: CLAS12 and GlueX
** ''K8s as workflow orchestration and monitoring tool''
*** Novelty: Building a dynamic and elastic k8s cluster without having fixed computing resources.
*** Running k8s nodes in user space without additional configuration and/or setup requirements from resource providers.
*** Deploying pods through shell commands.
** ''Concept validation experiment: JLAB-ESnet-NERSC data-stream processing''
** ''Conclusions''
* AOT
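
The JCS and JMS items above (check the list of not yet scheduled jobs, decide whether more JRMs are needed, remove a JRM that no job can use) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Job` and `Jrm` records, their field names, and the `plan` function are hypothetical and are not taken from the JIRIAF code base. The check is deliberately simple and treats each job independently, so it does not account for several jobs competing for the same JRM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job request carrying time, CPU, and memory requests."""
    name: str
    cpu: int            # requested cores
    memory_gb: float    # requested memory
    minutes: int        # requested run time


@dataclass(frozen=True)
class Jrm:
    """Hypothetical view of a pending or active JRM."""
    name: str
    free_cpu: int
    free_memory_gb: float
    minutes_left: int   # wall time remaining in the batch allocation


def fits(job: Job, jrm: Jrm) -> bool:
    """A job fits if its CPU, memory, and time requests are all satisfied."""
    return (job.cpu <= jrm.free_cpu
            and job.memory_gb <= jrm.free_memory_gb
            and job.minutes <= jrm.minutes_left)


def plan(unscheduled: list[Job], jrms: list[Jrm]) -> tuple[list[str], list[str]]:
    """Return (jobs that need a new JRM, JRMs that no unscheduled job can use)."""
    needs_new = [j.name for j in unscheduled
                 if not any(fits(j, r) for r in jrms)]
    removable = [r.name for r in jrms
                 if not any(fits(j, r) for j in unscheduled)]
    return needs_new, removable


jobs = [Job("reco", 4, 8, 60), Job("big", 64, 256, 120)]
jrms = [Jrm("jrm-a", 8, 16, 90), Jrm("jrm-b", 2, 4, 10)]
needs_new, removable = plan(jobs, jrms)
print(needs_new, removable)
```

In this toy input, the `big` job fits on no JRM and `jrm-b` can host neither job, so the first would trigger a new JRM request and the second would be a candidate for removal.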
 
==== Useful References ====

Latest revision as of 18:47, 8 February 2024


Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304


Agenda:

  • Announcements
  • JFE
    • The OIDC identity layer on top of the OAuth 2.0 protocol is necessary for JIRIAF users to join the CILogon federated identity management ecosystem, allowing users to access services using their existing institutional credentials without needing separate usernames and passwords. JIRIAF application OIDC registration.
    • CILogon token structure.
    • Workflow description metadata: Processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, Resource requirements (core type, core count, memory, disk, time, data provisioning details).
    • pod.yaml, metric-server.yaml and VK/JRM startup scripts
    • Database tables and visualization.
      • Argo: workflow and visualization engine
        • Argo workflow: If the wall time for the JRM is about to run out, move the pending pod request to another JRM to continue processing.
  • JCS and JMS
    • Time, CPU, and memory requests to steer deployment.
      • Allocate a node suitable for running the specified job.
        • What would be the time request for the JRM within the SLURM request?
      • Job request queue
      • List of pending and active JRMs
      • Check the list of not yet scheduled jobs and decide if we need to run more JRMs
      • Remove JRM if no suitable jobs can run on it.
      • Bayesian network-based agent model for a site/workflow.
        • Queueing theory-based mathematical model for predicting wait time for a streaming event in a queue before processing.
  • JRM
    • Implement a function using ConfigMap configuration to write files in pods.
    • Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
  • Documentation and code
    • Centralize the code base in GitHub.
      • Repository for all scripts and k8s YAML configuration files
    • HowTo manual/instructions for setting up JIRIAF on jiriaf2301-02
  • Start preparing a paper (e.g., for NIM)
    • Objectives as a groundwork for IRI
    • Introduction to streaming workflows for HEP and NP
      • Advantages in terms of workflow deployment, migration, and orchestration
      • Mentioning the use of EJFAT as a transport mechanism that might be a key component for the future IRI
      • Description of two JLAB data processing workflows: CLAS12 and GlueX
    • K8s as workflow orchestration and monitoring tool
      • Novelty: Building a dynamic and elastic k8s cluster without having fixed computing resources.
      • Running k8s nodes in user space without additional configuration and/or setup requirements from resource providers.
      • Deploying pods through shell commands.
    • Concept validation experiment: JLAB-ESnet-NERSC data-stream processing
    • Conclusions
  • AOT
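
As a starting point for the queueing theory item under JCS and JMS, the mean wait of a streaming event before processing can be sketched with the simplest textbook model. The agenda does not say which queueing model is intended, so the M/M/1 queue used here (Poisson arrivals, exponential service times, a single server) is an assumption, and the function name and parameters are hypothetical. For arrival rate λ and service rate μ with utilization ρ = λ/μ < 1, the mean time spent waiting in the queue is Wq = ρ / (μ − λ).

```python
import math


def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time an event waits in an M/M/1 queue before service starts.

    Both rates are in events per unit time; the result is in the same
    time unit. An overloaded queue (arrival_rate >= service_rate) has no
    steady state, so the wait is reported as infinite.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        return math.inf
    utilization = arrival_rate / service_rate          # rho
    return utilization / (service_rate - arrival_rate)  # Wq = rho / (mu - lambda)


# Example: 8 events/s arriving at a processor that handles 10 events/s.
wait_seconds = mm1_mean_wait(8.0, 10.0)
print(f"mean wait before processing: {wait_seconds:.3f} s")
```

A more realistic model for a site would likely need several servers or non-exponential service times; the single-server form is only meant to show the shape of the prediction.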

Useful References



Minutes: