JIRIAF Meeting Jan. 11 2024

From epsciwiki

Latest revision as of 15:21, 11 January 2024


Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304


Agenda:

  • Announcements
    • Welcome Patrick on board.
    • Problem obtaining an NSLS-II data-intensive workflow for migration.
    • New sites for deployments: ORNL and ANL.
    • Ticket (INC0114103) requesting access for Jiriaf nodes to DOE computing facilities such as NERSC and ORNL.
      • The list of IP addresses and ports to present to the security and networking team.
    • NERSC allocation
      • Jiriaf's project request has been approved (as of Jan. 2).
      • Nick Taylor allocated more time to the m3792 project (EJFAT-ESnet).
  • Summary of the project's undertakings
    • JFE
      • Forms to submit user workflow requests
        • Login and authentication, Processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, Resource requirements (core type, core count, memory, disk, time, data provisioning details).
          • Research whether k8s provides facilities for this (e.g., the Kubernetes Dashboard).
    • Visualize Jiriaf database tables.
      • Dynamic updates.
    • JCS and JMS
      • Starting VKs (Jiriaf nodes) through the k8s API management system
      • Jiriaf node naming convention and labeling
      • JCS and Jiriaf database relationship, with tables such as:
        • available resource, user requests, and user workflow status.
        • Examine the site resources database table (constantly updated by SWIF2) and submit SWIF2 requests to launch nodes and allocate/lease resources.
        • Jiriaf k8s cluster autoscaling (with possible AI support)
      • Defining workflows/pods in the cluster that are unschedulable
      • Communicate with the k8s app server to ensure submitted jobs are running and to update JIRIAF's available resource DB table.
      • Develop a resource-request matching algorithm that compares user requests with the available resources.
      • Define and propose a metadata structure for requests to enable accurate matching.
    • JRM
      • Implement a function that uses a ConfigMap to write files into pods.
      • Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
      • VK hardware monitor server
      • Anatomy of the node/vk launch script.
  • Preparation for the publication.
  • AOT
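The JCS items on a request metadata structure and a resource-request matching algorithm could start from something like the minimal sketch below. It is only an illustration under stated assumptions: the `UserRequest` and `AvailableResource` classes, their field names, the "cpu"/"gpu" core-type vocabulary, and the best-fit rule are all hypothetical, loosely modeled on the JFE form fields listed above (processing type, Docker image location, core type, core count, memory, disk, time). Only hard resource limits are matched; processing type and image are carried along but not used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UserRequest:
    """Hypothetical request metadata, modeled on the JFE form fields."""
    name: str
    processing_type: str  # e.g. "batch", "streaming", "opportunistic-streaming"
    image: str            # Docker image location
    core_type: str        # assumed vocabulary, e.g. "cpu" or "gpu"
    cores: int
    memory_gb: float
    disk_gb: float
    walltime_h: float


@dataclass(frozen=True)
class AvailableResource:
    """Hypothetical row of the 'available resource' table."""
    node: str
    core_type: str
    cores: int
    memory_gb: float
    disk_gb: float
    remaining_h: float    # time left on the node's lease


def fits(req: UserRequest, res: AvailableResource) -> bool:
    """True if the resource satisfies every hard requirement of the request."""
    return (
        req.core_type == res.core_type
        and req.cores <= res.cores
        and req.memory_gb <= res.memory_gb
        and req.disk_gb <= res.disk_gb
        and req.walltime_h <= res.remaining_h
    )


def match(req: UserRequest, resources: List[AvailableResource]) -> Optional[AvailableResource]:
    """Best-fit match: among feasible resources, pick the one that leaves
    the fewest idle cores, breaking ties by leftover memory."""
    candidates = [r for r in resources if fits(req, r)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.cores - req.cores, r.memory_gb - req.memory_gb))


# Made-up example data; node names are placeholders, not a JIRIAF naming convention.
resources = [
    AvailableResource("node-a", "cpu", 128, 512, 1000, 4),   # lease too short
    AvailableResource("node-b", "cpu", 32, 128, 200, 12),    # tightest feasible fit
    AvailableResource("node-c", "gpu", 64, 256, 500, 8),     # wrong core type
    AvailableResource("node-d", "cpu", 64, 256, 500, 24),    # feasible, more idle cores
]
req = UserRequest("demo", "batch", "docker.io/example/app:latest", "cpu", 16, 64, 50, 6)
best = match(req, resources)
print(best.node if best else "no match")  # prints "node-b"
```

Best fit is just one reasonable default because it keeps large nodes free for large requests; a real matcher might instead weight site, data locality, or the processing type.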
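For the item on defining workflows/pods in the cluster that are unschedulable, one possible starting point is the condition Kubernetes already reports: when the scheduler cannot place a pod, the pod stays in phase `Pending` with a `PodScheduled` condition whose status is `False` and reason is `Unschedulable`. The sketch below scans pod JSON for that condition. The function name `unschedulable_pods` and the sample data are hypothetical; the input is assumed to have the shape of `kubectl get pods -o json` output.

```python
import json
from typing import List


def unschedulable_pods(pod_list: dict) -> List[str]:
    """Return 'namespace/name' for pending pods the scheduler could not place."""
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            # Failed placement shows up as PodScheduled=False with reason Unschedulable.
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                meta = pod["metadata"]
                found.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
                break
    return found


# Trimmed, made-up sample in the shape of `kubectl get pods -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "wf-a", "namespace": "default"},
   "status": {"phase": "Running",
              "conditions": [{"type": "PodScheduled", "status": "True"}]}},
  {"metadata": {"name": "wf-b", "namespace": "default"},
   "status": {"phase": "Pending",
              "conditions": [{"type": "PodScheduled", "status": "False",
                              "reason": "Unschedulable",
                              "message": "0/3 nodes are available: 3 Insufficient cpu."}]}}
]}
""")

print(unschedulable_pods(SAMPLE))  # ['default/wf-b']
```

In a live cluster the same dictionary could come from `kubectl get pods --all-namespaces -o json`; how the JCS would act on the result (for example, by requesting new VKs through SWIF2) is left open here.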

Useful References



Minutes: