Difference between revisions of "JIRIAF Meeting Jan. 11 2024"


Revision as of 14:43, 11 January 2024


Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

International numbers
Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304


Agenda:

  1. Announcements
    1. Welcome Patrick on board.
    2. Difficulty obtaining a data-intensive NSLS-II workflow to migrate.
    3. New sites for deployments: ORNL and ANL.
      1. Ticket (INC0114103) requesting Jiriaf nodes to access DOE computing facilities like NERSC and ORNL.
        1. The list of IP addresses and ports to present to the security and networking team.
    4. NERSC allocation
      1. Jiriaf's project request has been approved (as of 01.02).
      2. Nick Taylor assigned more time for the m3792 project (EJFAT-EsNet).
  2. Summary of the project's undertakings
    1. Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
    2. JCS
      1. Starting VKs (Jiriaf nodes) through the k8s API management system
      2. Jiriaf node naming convention and labeling
      3. Jiriaf k8s cluster autoscaling (with possible AI support)
      4. Defining workflows/pods in the cluster that are unschedulable
      5. JCS and Jiriaf database relationship. Tables such as available resource, user requests, and user workflow status.
      6. Examine the site resources database table (constantly updated by SWIF2) and submit SWIF2 requests to launch nodes and allocate/lease resources.
      7. Communicate with the k8s App server to ensure submitted jobs are running, and update JIRIAF's available resource DB table.
      8. Develop a resource-request matching algorithm that compares user requests with the available resources.
      9. Define and suggest a metadata structure for requests to enable accurate matching.
    3. JRM
      1. Implement a function using ConfigMap configuration to write files in pods.
      2. Anatomy of the node/vk launch script.
      3. VK hardware monitor server
    4. JFE
      1. Forms to submit user workflow requests
        1. Login and authentication, Processing type (batch, streaming, opportunistic-streaming, etc.), Docker image location, Resource requirements (core type, core count, memory, disk, time, data provisioning details).
          1. Research if k8s provides facilities for this (e.g., k8s dashboard)
      2. Visualize Jiriaf database tables.
        1. Dynamic updates.
  3. Preparation for the publication.
  4. AOT
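
Illustrative sketch for the JCS matching item above: one possible shape for request metadata and a resource-request matcher. The field names follow the resource requirements listed under JFE (core type, core count, memory, time). The class names, the function names `fits` and `match`, the node names, and the best-fit policy are all hypothetical and chosen only for illustration. They are not the JIRIAF schema or the project's algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Resource:
    """One row of a hypothetical 'available resource' table."""
    node: str
    core_type: str
    cores: int
    memory_gb: int
    walltime_h: float


@dataclass(frozen=True)
class Request:
    """One row of a hypothetical 'user requests' table."""
    user: str
    processing_type: str  # e.g. "batch", "streaming", "opportunistic-streaming"
    core_type: str
    cores: int
    memory_gb: int
    walltime_h: float


def fits(req: Request, res: Resource) -> bool:
    """True if the resource satisfies every requirement of the request."""
    return (
        res.core_type == req.core_type
        and res.cores >= req.cores
        and res.memory_gb >= req.memory_gb
        and res.walltime_h >= req.walltime_h
    )


def match(req: Request, resources: Iterable[Resource]) -> Optional[Resource]:
    """Best fit (an assumed policy): the smallest resource that satisfies the request."""
    candidates = [r for r in resources if fits(req, r)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.cores, r.memory_gb, r.walltime_h))


# Made-up example data; node names and sizes are placeholders.
available = [
    Resource("node-a", "cpu", 128, 512, 12.0),
    Resource("node-b", "cpu", 32, 128, 4.0),
    Resource("node-c", "gpu", 64, 256, 6.0),
]
req = Request("alice", "streaming", "cpu", 16, 64, 2.0)

best = match(req, available)
print(best.node if best else "no match")
```

Best fit keeps larger nodes free for larger requests. A real matcher would likely also weigh site, lease time remaining, and data provisioning details, which this sketch ignores.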

Useful References



Minutes: