JIRIAF Meeting Jan. 25 2024

Latest revision as of 18:41, 25 January 2024


Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1601266529# or +16468287666,,1601266529#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 126 6529
Passcode: 292304


Agenda:

  • Announcements
  • New site for deployments: ORNL.
    • Biweekly meetings with ORNL (EJFAT collaboration) start tomorrow (01.26.24)
  • "Direct" access to Perlmutter cluster login nodes from jiriaf2301-02
  • Access granted to the ESnet Prometheus server from jiriaf2301-02.
    • curl -s -G --data-urlencode 'match[]={__name__=~".+"}' https://prometheus-ejfat.es.net/federate
  • NERSC allocation
    • 250-hour allocation on Perlmutter: m4636 project (JIRIAF)
    • 300-hour allocation on Perlmutter: m3792 project (EJFAT-ESnet).
  • Abstract submitted to ACAT
  • Invitation from Nick to give a talk on JIRIAF at the NERSC Data Day
  • Summary of the project's undertakings
    • JFE
      • Forms to submit user workflow requests
        • Login and authentication
        • Workflow description: Processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, Resource requirements (core type, core count, memory, disk, time, data provisioning details).
          • Research whether k8s provides facilities for this (e.g., k8s dashboard)
    • Visualize JIRIAF database tables.
      • Dynamic updates.
    • JCS and JMS
      • Starting VKs (JIRIAF nodes) through the k8s API management system
      • JIRIAF node naming convention and labeling
      • Relationship between JCS and the JIRIAF database, with tables such as available resources, user requests, and user workflow status.
        • Examine the site resources database table (constantly updated by SWIF2) and submit SWIF2 requests to launch nodes and allocate/lease resources.
        • JIRIAF k8s cluster autoscaling (with possible AI support)
          • Estimate wait time in the queue before and during the processing (Queueing theory)
      • Defining workflows/pods in the cluster that are unschedulable
      • Communicate with the k8s App server to ensure submitted jobs are running and to update JIRIAF's available-resource DB table.
      • Develop a resource-request matching algorithm that compares user requests with the available resources.
      • Define and suggest metadata structure for requests for accurate matching.
    • JRM
      • Implement a function using ConfigMap configuration to write files in pods.
      • Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
      • VK hardware monitor server
      • Node/VK launch script.
      • HowTo manual/instructions for setting up JIRIAF on jiriaf2301-02
        • Required env variables (KUBECONFIG, VKUBELET_POD_IP)
        • Repository for all such scripts and k8s YAML configuration files
  • AOT
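
Sketch for the JCS item on resource-request matching. This is only an illustration of one possible approach (best fit over a list of available nodes), not the project's algorithm. The field names (core_type, cores, memory_gb, disk_gb, walltime_h), the node names, and the example values are all hypothetical; they loosely follow the resource requirements listed under the JFE workflow description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical row of an "available resources" table.
    name: str
    core_type: str
    cores: int
    memory_gb: float
    disk_gb: float
    walltime_h: float


@dataclass
class Request:
    # Hypothetical row of a "user requests" table.
    user: str
    core_type: str
    cores: int
    memory_gb: float
    disk_gb: float
    walltime_h: float


def fits(node: Node, req: Request) -> bool:
    """True if the node satisfies every requirement of the request."""
    return (
        node.core_type == req.core_type
        and node.cores >= req.cores
        and node.memory_gb >= req.memory_gb
        and node.disk_gb >= req.disk_gb
        and node.walltime_h >= req.walltime_h
    )


def match(req: Request, nodes: List[Node]) -> Optional[Node]:
    """Best fit: among nodes that satisfy the request, pick the one
    that leaves the fewest spare cores (ties broken by spare memory)."""
    candidates = [n for n in nodes if fits(n, req)]
    if not candidates:
        return None  # nothing fits; the request stays unscheduled for now
    return min(
        candidates,
        key=lambda n: (n.cores - req.cores, n.memory_gb - req.memory_gb),
    )


# Hypothetical example data.
nodes = [
    Node("vk-a", "cpu", 64, 256.0, 500.0, 12.0),
    Node("vk-b", "cpu", 16, 64.0, 100.0, 6.0),
    Node("vk-c", "gpu", 4, 128.0, 200.0, 4.0),
]
req = Request("alice", "cpu", 8, 32.0, 50.0, 2.0)

chosen = match(req, nodes)
print(chosen.name if chosen else "no matching node")
```

Best fit is chosen here only because it keeps large nodes free for large requests; the metadata structure under discussion would decide which fields actually take part in the comparison.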
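
Sketch for the item on estimating queue wait time with queueing theory. It assumes the simplest textbook model, an M/M/c queue (Poisson arrivals, exponentially distributed service times, c identical slots), and uses the Erlang C formula. Whether that model fits the real batch queues is an open question; the function names and the example rates below are hypothetical.

```python
import math


def erlang_c(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Probability that an arriving job has to wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    utilization = offered_load / servers
    if utilization >= 1.0:
        raise ValueError("unstable queue: arrival rate exceeds total capacity")
    # Sum over states in which at least one server is still free.
    free_states = sum(
        offered_load ** k / math.factorial(k) for k in range(servers)
    )
    # Contribution of states in which all servers are busy.
    busy_states = offered_load ** servers / (
        math.factorial(servers) * (1.0 - utilization)
    )
    return busy_states / (free_states + busy_states)


def mean_wait(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Mean time a job spends waiting before service starts (M/M/c)."""
    p_wait = erlang_c(servers, arrival_rate, service_rate)
    return p_wait / (servers * service_rate - arrival_rate)


# Hypothetical rates: 3 jobs/hour arriving, each slot finishes 1 job/hour, 4 slots.
print(f"estimated mean wait: {mean_wait(4, 3.0, 1.0):.2f} hours")
```

With one server the formula reduces to the familiar M/M/1 result, which is a quick sanity check on the implementation.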
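
Sketch for the HowTo item on required environment variables. The two variable names (KUBECONFIG, VKUBELET_POD_IP) come from the agenda; the helper name missing_vars and the idea of checking them before the node/VK launch script runs are assumptions, not part of the existing setup.

```python
import os
from typing import List, Mapping

# Variables named in the agenda as required for setting up JIRIAF.
REQUIRED_VARS = ("KUBECONFIG", "VKUBELET_POD_IP")


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars(os.environ)
if missing:
    print("missing environment variables: " + ", ".join(missing))
else:
    print("all required environment variables are set")
```

A check like this could run at the top of a launch script so that a missing variable is reported before any node is started.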

Useful References



Minutes: