Difference between revisions of "JIRIAF Meeting Feb. 16 2023"

From epsciwiki

Revision as of 18:39, 16 February 2023


Connection Info:

You can connect using https://jlab-org.zoomgov.com/j/1608518798 (Meeting ID: 160 851 8798).


Agenda:

  1. Previous Meeting
  2. Announcements
  3. Prepare to move a time-critical workflow to NERSC (Vardan).
    1. Modernize the CLAS12 reconstruction application so that it can run under the ERSAP streaming framework (Vardan). ☺
    2. Modify the CLAS12 reconstruction process so that it can operate in streaming mode (Vardan). ☺
    3. Perform a test run of the CLAS12 reconstruction procedure at NERSC, which is currently running on a Perlmutter node (Nick and Deby). ☺
    4. Update ERSAP to support ET source actors. This generic source component reads events from the FIFO associated with the CODA Event Transfer (ET) system (Vardan).


  4. Prototype JIRIAF Front End (JFE) (Horio)
    1. Describe the hardware architecture of the user-facing JIRIAF computing node, assuming the following (Horio, Amitoj):
      1. We may stage Docker images of workflows, and possibly other kinds of containers.
      2. We may stage some job-related data sets.
      3. For time-critical or streaming operations, hosting a data lake (an in-memory data grid with the option to store data on disk) is recommended.
    2. The web front end is an essential component. We should also consider installing a web server in addition to the RUCIO storage element and the CVMFS file system.
      1. Set up the web server and design the UI forms.
      2. Form.io is one option we might consider.
    3. User authentication and access authorization
      1. Note that the RUCIO design provides a layer for authentication and authorization.
    4. Install the JIRIAF job queue database (Horio, Chris)
    5. Create and populate the JIRIAF task queue database table (Horio, Chris)
      1. Define table structure
      2. Job ID
      3. Memory count
      4. Number of cores
      5. Disk
      6. Expected time of completion
      7. The kind of workflow
      8. Priority
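
As a starting point for discussion, the table fields listed above could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module: the table name `job_queue`, the column names, types and units (GB, seconds), and the helper functions are all assumptions, not a decided JIRIAF design.

```python
import sqlite3

# Hypothetical schema: names, types and units are assumptions derived from
# the agenda's field list (Job ID, memory, cores, disk, expected time of
# completion, kind of workflow, priority).
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_queue (
    job_id          INTEGER PRIMARY KEY,
    memory_gb       REAL    NOT NULL,
    cores           INTEGER NOT NULL,
    disk_gb         REAL    NOT NULL,
    expected_time_s INTEGER NOT NULL,
    workflow_kind   TEXT    NOT NULL,
    priority        INTEGER NOT NULL DEFAULT 0
)
"""


def create_job_queue(conn):
    """Create the (hypothetical) job queue table if it does not exist."""
    conn.execute(SCHEMA)
    conn.commit()


def add_job(conn, memory_gb, cores, disk_gb, expected_time_s,
            workflow_kind, priority=0):
    """Insert one job and return the job ID assigned by the database."""
    cur = conn.execute(
        "INSERT INTO job_queue "
        "(memory_gb, cores, disk_gb, expected_time_s, workflow_kind, priority) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (memory_gb, cores, disk_gb, expected_time_s, workflow_kind, priority),
    )
    conn.commit()
    return cur.lastrowid


def jobs_by_priority(conn):
    """Return (job_id, workflow_kind, priority) rows, highest priority first."""
    return conn.execute(
        "SELECT job_id, workflow_kind, priority FROM job_queue "
        "ORDER BY priority DESC, job_id ASC"
    ).fetchall()
```

A production deployment would more likely use a server database; sqlite3 is used here only so that the sketch is self-contained.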


  5. Prototype JIRIAF Central Service (JCS) (Vardan, Chris)
    1. When designing the JCR, take into account the following pub/sub technologies:
      1. XMsg
      2. Kafka
    2. Specify the communication protocol between JCA and JRM
    3. Define the JIRIAF resource-pool table structure
    4. Specify and develop JRM, a software agent that can:
      1. Accept and carry out local tasks
      2. Report job-specific metrics
      3. Cancel jobs and perform local cleanup
      4. Submit JRM tasks using the Superfacility API and SWIF2


  6. Prototype JIRIAF Workflow Resource Matching Service (JWRMS) (Vardan)
    1. Develop a working model of the JWRMS.
    2. Examine the JIRIAF resource pool and jobs queue database tables to locate a workflow that is compatible with the available resources.
    3. Combine tasks to make better use of the available resources.
    4. Ensure that workflow priorities are supported.
    5. Remove finished tasks from the workflow queue.
    6. If a task is only partially finished, update the workflow queue.
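
One possible shape for the matching step is sketched below, purely as an illustration. It uses a greedy, priority-ordered first-fit, which is an assumption and not an agreed JWRMS algorithm; the `Job` and `Resource` classes, their fields, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; the fields mirror the job queue entries.
    job_id: int
    cores: int
    memory_gb: float
    disk_gb: float
    priority: int = 0


@dataclass
class Resource:
    # Hypothetical resource-pool entry describing what is currently free.
    cores: int
    memory_gb: float
    disk_gb: float


def fits(job, res):
    """True if the job's requirements fit within the free resources."""
    return (job.cores <= res.cores
            and job.memory_gb <= res.memory_gb
            and job.disk_gb <= res.disk_gb)


def match_jobs(queue, res):
    """Greedy first-fit: visit jobs from highest to lowest priority and
    pack every job that still fits, so several jobs can share one resource."""
    free = Resource(res.cores, res.memory_gb, res.disk_gb)  # work on a copy
    selected = []
    for job in sorted(queue, key=lambda j: (-j.priority, j.job_id)):
        if fits(job, free):
            selected.append(job)
            free.cores -= job.cores
            free.memory_gb -= job.memory_gb
            free.disk_gb -= job.disk_gb
    return selected


def remove_finished(queue, finished_ids):
    """Return the queue without the jobs whose IDs are in finished_ids."""
    return [job for job in queue if job.job_id not in finished_ids]
```

Greedy first-fit is simple to reason about but does not guarantee the best packing; it is shown only to make the "combine tasks" and "priorities" items concrete. Handling partially finished tasks is not covered by this sketch.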


  7. Prototype JIRIAF Facility Manager (JFM) (Vardan, Chris)
    1. Monitor the resources available at the remote computing facility, using:
      1. Superfacility API
      2. SWIF2
      3. Prometheus
      4. etc.
    2. Populate the JIRIAF resource-pool database table.
    3. Keep the JIRIAF resource-pool database table updated with the most recent information from the facility.
  8. AOT

Useful References



Minutes: