Difference between revisions of "JIRIAF Meeting Feb. 8 2024"
Revision as of 16:23, 7 February 2024
Connection Info:
You can connect using the following link (Meeting ID: 160 126 6529):
Agenda:
- Announcements
- Patrick's JLAB account.
- ORNL accounts are ready.
- Abstract accepted at the ACAT conference
- Presentation at NERSC Data Day
- Start preparing a paper for NIM
- Objectives as a groundwork for IRI
- Introduction to streaming workflows for HEP and NM
- Advantages in terms of workflow deployment, migration, and orchestration
- Mentioning the use of EJFAT as a transport mechanism that might be a key component for the future IRI
- Description of two JLAB data processing workflows: CLAS12 and GlueX
- K8s as a workflow orchestration and monitoring tool
- Novelty: Building a dynamic and elastic k8s cluster without having fixed computing resources.
- Running k8s nodes in user space without additional configuration and/or setup requirements from resource providers.
- Deploying pods through shell commands.
- Concept validation experiment: JLAB-ESnet-NERSC data-stream processing
- Conclusions
- JFE
- The OIDC identity layer on top of the OAuth 2.0 protocol is required for JIRIAF users to join the CILogon federated identity management ecosystem, which lets users access services with their existing institutional credentials instead of separate usernames and passwords. JIRIAF application OIDC registration.
- CILogon token structure.
- Workflow description metadata: processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, and resource requirements (core type, core count, memory, disk, time, data provisioning details).
- pod.yaml, metric-server.yaml and VK/JRM startup scripts
- Database tables and visualization.
- Argo: workflow and visualization engine
- Argo workflow: If the wall time for the JRM is about to run out, move the pending pod request to another JRM to continue processing.
- JCS and JMS
- Time, CPU, and memory requests to steer deployment.
- Allocate a node suitable for running the specified job.
- What would be the time request for the JRM within the SLURM request?
- Job request queue
- List of pending and active JRMs
- Check the list of not-yet-scheduled jobs and decide if we need to run more JRMs.
- Remove a JRM if no suitable jobs can run on it.
- Bayesian network-based agent model for a site/workflow.
- Queueing theory-based mathematical model for predicting wait time for a streaming event in a queue before processing.
- JRM
- Implement a function using ConfigMap configuration to write files in pods.
- Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
- Documentation and code
- Centralize the code base in GitHub.
- Repository for all scripts and k8s YAML configuration files
- How-to manual/instructions for setting up JIRIAF on jiriaf2301-02
- AOT
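As a possible starting point for the queueing theory item under JCS and JMS, the sketch below computes the mean time an event waits in a queue before processing. The agenda does not name a specific model, so this assumes the simplest textbook case, an M/M/1 queue (Poisson arrivals, exponential service times, one processing unit). The function name and the example rates are illustrative only and are not taken from JIRIAF code.

```python
def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time an event waits in the queue before processing (M/M/1).

    Assumption: M/M/1 queue. arrival_rate (lambda) and service_rate (mu)
    are in events per second; the result is in seconds.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        # Utilization >= 1: the queue grows without bound, no steady state.
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    utilization = arrival_rate / service_rate  # rho = lambda / mu
    # Standard M/M/1 result: W_q = rho / (mu - lambda)
    return utilization / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical rates: 8 events/s arriving, 10 events/s processed.
    print(f"mean wait: {mm1_mean_wait(8.0, 10.0):.3f} s")
```

A real streaming workflow would likely need a richer model (multiple processing units, bursty arrivals), so this is only a baseline to compare measurements against.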
Useful References