JIRIAF Meeting Mar. 7 2024



Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304


Agenda:

  • Announcements
  • JFE
    • JIRIAF application OIDC registration. The OIDC identity layer on top of the OAuth 2.0 protocol is needed for JIRIAF users to join the CILogon federated identity management ecosystem, so that they can access services with their existing institutional credentials instead of separate usernames and passwords.
    • CILogon token structure.
    • Workflow description metadata: processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, and resource requirements (core type, core count, memory, disk, time, data provisioning details).
    • pod.yaml, metric-server.yaml and VK/JRM startup scripts
    • Database tables and visualization.
      • Workflow and visualization engine (Argo alternative)
  • JCS and JMS
    • DB API (Chris)
    • Resource acquisition (Vardan, Chris, Jeng)
      • Time, CPU, and memory requests to steer deployment.
      • Allocate a node suitable for running the specified job.
        • What would be the time request for the JRM within the SLURM request?
        • ML model trained on the historical data to help with the resource acquisition (Jeng)
    • Job request queue (Vardan, Jeng)
      • List of pending and active JRMs. Check the list of not-yet-scheduled jobs and decide whether more JRMs need to be started.
      • Remove JRM if no suitable jobs can run on it.
      • If the wall time for the JRM is about to run out, move the pending pod request to another JRM to continue processing.
    • Digital twin (Vardan)
      • Bayesian network-based agent model for a site/workflow.
        • Queueing theory-based mathematical model for predicting wait time for a streaming event in a queue before processing.
  • JRM
    • Current status
    • Metrics server
  • Documentation and code
    • Centralize the code base in GitHub.
      • JRM GitHub: is it a submodule or subtree?
        • Repository for all scripts and k8s YAML configuration files
    • HowTo manual/instructions for setting up JIRIAF on jiriaf2301-02
      • Documentation on the EJFAT wiki (Jeng, great job!)
  • Start preparing a paper (e.g., for NIM)
    • Objectives as a groundwork for IRI
    • Introduction to streaming workflows for HEP and NP
      • Advantages in terms of workflow deployment, migration, and orchestration
      • Mentioning the use of EJFAT as a transport mechanism that might be a key component for the future IRI
      • Description of two JLab data processing workflows: CLAS12 and GlueX
    • K8s as workflow orchestration and monitoring tool
      • Novelty: Building a dynamic and elastic k8s cluster without having fixed computing resources.
      • Running k8s nodes in user space without additional configuration and/or setup requirements from resource providers.
      • Deploying pods through shell commands.
    • Concept validation experiment: JLab-ESnet-NERSC data-stream processing
    • Conclusions
  • AOT
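
As a starting point for the JFE discussion of workflow description metadata, the fields listed in the agenda could be captured in a small record like the one below. This is only a sketch: the class names, field names, units (GB, minutes), and the closed set of processing types are assumptions made for illustration, and the image location in the usage note is a placeholder, not an actual JIRIAF image.

```python
import json
from dataclasses import dataclass, asdict

# Assumption: the agenda lists these three types followed by "etc.",
# so the real set is likely larger.
PROCESSING_TYPES = {"batch", "streaming", "opportunistic-streaming"}


@dataclass
class ResourceRequirements:
    """Resource fields named in the agenda; units are assumed."""
    core_type: str
    core_count: int
    memory_gb: float
    disk_gb: float
    time_minutes: int
    data_provisioning: str


@dataclass
class WorkflowDescription:
    """Hypothetical workflow description record."""
    processing_type: str
    docker_image: str
    resources: ResourceRequirements

    def __post_init__(self):
        # Reject values outside the assumed set of processing types.
        if self.processing_type not in PROCESSING_TYPES:
            raise ValueError(f"unknown processing type: {self.processing_type}")
        if self.resources.core_count < 1:
            raise ValueError("core_count must be at least 1")


def to_json(workflow: WorkflowDescription) -> str:
    """Serialize the description, e.g. for storage in a database table."""
    return json.dumps(asdict(workflow), indent=2, sort_keys=True)
```

A record could then be built with, for example, `WorkflowDescription("streaming", "registry.example.org/demo/workflow:latest", ResourceRequirements("x86_64", 4, 8.0, 20.0, 120, "stream"))` and passed to `to_json`.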

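For the digital twin item, the simplest queueing-theory estimate of how long a streaming event waits before processing is the M/M/1 result W_q = λ / (μ (μ − λ)), where λ is the mean arrival rate and μ the mean service rate. The agenda does not say which queueing model is intended, so the choice of M/M/1 (Poisson arrivals, exponential service times, one server) and the function name below are assumptions for illustration only.

```python
import math


def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time an event waits in an M/M/1 queue before service starts.

    Both rates must use the same time unit (e.g. events per second);
    the result is in that time unit.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        # The queue is unstable: the backlog grows without bound.
        return math.inf
    return arrival_rate / (service_rate * (service_rate - arrival_rate))
```

For instance, `mm1_mean_wait(8.0, 10.0)` evaluates 8 / (10 × 2), i.e. a mean wait of 0.4 time units at 80% utilization.
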
Useful References



Minutes: