JIRIAF Meeting Jan. 25 2024
Latest revision as of 18:41, 25 January 2024
Connection Info:
You can connect using the Zoom link below (Meeting ID: 160 126 6529):
One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304
Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529
International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304
Agenda:
- Announcements
- New site for deployments: ORNL.
- Biweekly meetings with ORNL (EJFAT collaboration) start tomorrow (01.26.24)
- "Direct" access to Perlmutter cluster login nodes from jiriaf2301-02
- Access granted to the ESnet Perlmutter server from jiriaf2301-02.
- curl -s -G --data-urlencode 'match[]={__name__=~".+"}' https://prometheus-ejfat.es.net/federate
- NERSC allocation
- 250-hour allocation on Perlmutter: m4636 project (JIRIAF)
- 300-hour allocation on Perlmutter: m3792 project (EJFAT-EsNet).
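- Abstract submitted to ACAT (https://indico.cern.ch/event/1330797/)
- Invitation from Nick to give a talk on JIRIAF at the NERSC Data Day (https://www.nersc.gov/users/training/events/2024/nersc-data-day-feb-21-22-2024/)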
- Summary of the project's undertakings
- JFE
- Forms to submit user workflow requests
- Login and authentication
- Workflow description: processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, and resource requirements (core type, core count, memory, disk, time, data provisioning details).
- Research whether k8s provides facilities for this (e.g., the k8s dashboard)
- Visualize JIRIAF database tables.
- Dynamic updates.
- JCS and JMS
- Starting VKs (JIRIAF nodes) through the k8s API management system
- JIRIAF node naming convention and labeling
- JCS and JIRIAF database relationship: tables such as available resources, user requests, and user workflow status.
- Examine the site resources database table (constantly updated by SWIF2) and submit SWIF2 requests to launch nodes and allocate/lease resources.
- JIRIAF k8s cluster autoscaling (with possible AI support)
- Estimate wait time in the queue before and during the processing (Queueing theory)
- Defining workflows/pods in the cluster that are unschedulable
- Communicate with the k8s App server to ensure that submitted jobs are running and to update JIRIAF's available resource DB table.
- Develop a resource-request matching algorithm that compares user requests with the available resources.
- Define and suggest metadata structure for requests for accurate matching.
- JRM
- Implement a function using ConfigMap configuration to write files in pods.
- Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
- VK hardware monitor server
- Node/VK launch script.
- HowTo manual/instructions for setting up JIRIAF on jiriaf2301-02
- Required env variables (KUBECONFIG, VKUBELET_POD_IP)
- Repository for all such scripts and k8s YAML configuration files
- AOT
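Illustration for the workflow description and resource-request matching items above: a minimal Python sketch of one way the request fields listed in the agenda (processing type, Docker image location, core type, core count, memory, disk, time) could be compared with an available-resources table. Every class, field, and function name here is hypothetical, the node names and image location are made up, and the first-fit policy is only an assumption for discussion, not the JIRIAF design.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WorkflowRequest:
    """Hypothetical user request; fields mirror the agenda's workflow description."""
    processing_type: str    # e.g. "batch", "streaming", "opportunistic-streaming"
    docker_image: str       # Docker image location
    core_type: str          # assumed values such as "cpu" or "gpu"
    core_count: int
    memory_gb: float
    disk_gb: float
    walltime_hours: float


@dataclass(frozen=True)
class NodeResources:
    """Hypothetical row of an available-resources table."""
    name: str
    core_type: str
    free_cores: int
    free_memory_gb: float
    free_disk_gb: float
    lease_hours_left: float


def fits(request: WorkflowRequest, node: NodeResources) -> bool:
    """Return True if the node's free resources cover every requested quantity."""
    return (
        node.core_type == request.core_type
        and node.free_cores >= request.core_count
        and node.free_memory_gb >= request.memory_gb
        and node.free_disk_gb >= request.disk_gb
        and node.lease_hours_left >= request.walltime_hours
    )


def match_request(
    request: WorkflowRequest, nodes: Iterable[NodeResources]
) -> Optional[NodeResources]:
    """First-fit matching (an assumed policy): return the first node that fits, else None."""
    for node in nodes:
        if fits(request, node):
            return node
    return None


# Example with made-up values.
nodes = [
    NodeResources("vk-a", "cpu", 8, 128.0, 500.0, 10.0),
    NodeResources("vk-b", "cpu", 64, 256.0, 500.0, 10.0),
]
request = WorkflowRequest("batch", "example/image:tag", "cpu", 16, 64.0, 100.0, 2.0)
chosen = match_request(request, nodes)
print(chosen.name if chosen else "no matching node")
```

In this sketch a result of None would mark the request as one that cannot currently be placed, which is one possible input to the unschedulable-workflow and autoscaling items; data provisioning details are left out because the agenda does not specify them.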
Useful References