JIRIAF Meeting Feb. 16 2023

From epsciwiki

Latest revision as of 20:31, 16 February 2023


Connection Info:

You can connect using https://jlab-org.zoomgov.com/j/1608518798 (Meeting ID: 160 851 8798).


Agenda:

  1. Previous Meeting
  2. Announcements
  3. Make the necessary preparations to move a time-critical workflow to NERSC (Vardan).
    1. Modernize the CLAS12 reconstruction application so that it can function under the ERSAP streaming framework (Vardan). ☺
    2. Modify the process for the CLAS12 reconstruction so that it may operate in a streaming manner (Vardan). ☺
    3. Perform a test run of the CLAS12 reconstruction procedure at NERSC, where it is currently running on a Perlmutter node (Nick and Deby). ☺
    4. ERSAP should be updated to support ET source actors. This generic source component reads events from the FIFO associated with the CODA Event Transfer (Vardan).
  4. Prototype JIRIAF Front End (JFE). (Horio)
    1. Describe the hardware architecture of the user-facing JIRIAF computing node while assuming the following (Horio, Amitoj):
      1. We might stage Docker images of workflows, and possibly other kinds of containers.
      2. We might stage certain job-related data sets.
      3. Hosting a data lake, an in-memory data grid with the option to store data on the disk, is recommended for time-critical or streaming operations.
    2. The front end of the web application is an essential component. We should also consider installing a web server in addition to the RUCIO storage element, the CVMFS file system, Globus, and XRootD.
      1. Set up the web server and design the UI forms.
      2. Form.io is one option that we might consider.
    3. User authentication and access permissions
      1. Adopt OSG mechanisms
      2. Note that the RUCIO design provides a layer for authentication and permission.
    4. Install the JIRIAF job queue database (Horio, Chris)
    5. Create and populate the JIRIAF task queue database table (Horio, Chris)
      1. Define table structure
      2. Job ID
      3. Memory count
      4. Number of cores
      5. Disk
      6. Expected time of completion
      7. The kind of workflow
      8. Priority
  5. Prototype JIRIAF Central Service (JCS). (Vardan, Chris)
    1. When designing the JCR, take the following pub/sub technologies into account:
      1. XMsg
      2. Kafka
    2. Specify the communication protocol that will be used between JCA and JRM
    3. Define JIRIAF resource-pool table structure
    4. Specify and develop JRM, a software agent that is able to
      1. Accept and carry out local tasks
      2. Report job-specific metrics
      3. Cancel jobs and perform local cleaning activities
      4. Submit JRM tasks using the Superfacility API and SWIF2
  6. Prototype JIRIAF Workflow Resource matching Service (JWRMS) (Vardan)
    1. Develop a working model of the JWRMS.
    2. Examine the JIRIAF resource pool and jobs queue database tables in order to locate a workflow that is compatible with the available resources.
    3. Support combining tasks in order to make better use of the available resources.
    4. Ensure that workflow priorities are supported.
    5. Remove any tasks that have been finished from the workflow queue.
    6. If a task is only partially finished, update the workflow queue accordingly.
  7. Prototype JIRIAF Facility Manager (JFM) (Vardan, Chris)
    1. Keep an eye on the resources that are available at the remote computing facility.
      1. Superfacility API
      2. SWIF2
      3. Prometheus
      4. etc.
    2. Populate the JIRIAF resource-pool database table
    3. The database table for the JIRIAF resource pool should be updated with the most recent information from the facility.
  8. AOT
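
Illustrative sketch for items 4.5 and 6.2 (not an agreed design): the job queue fields listed in the agenda, and the JWRMS step of locating a workflow compatible with the available resources, could be prototyped roughly as below. All class, field, and function names are hypothetical. The units (GB, hours), the reading of "expected time of completion" as a run-time estimate, and the rule that a larger priority value means more urgent are assumptions, not decisions recorded in these notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """One row of a hypothetical job queue table (fields from agenda item 4.5)."""
    job_id: str
    memory_gb: float       # "Memory count"; unit is an assumption
    cores: int             # "Number of cores"
    disk_gb: float         # "Disk"; unit is an assumption
    expected_hours: float  # "Expected time of completion", read as a duration
    workflow_kind: str     # "The kind of workflow"
    priority: int          # assumption: larger value means more urgent


@dataclass
class Resource:
    """One row of a hypothetical resource-pool table (agenda item 5.3)."""
    facility: str
    memory_gb: float
    cores: int
    disk_gb: float
    hours_available: float


def fits(job: Job, res: Resource) -> bool:
    """True if every requirement of the job is within the resource's limits."""
    return (job.memory_gb <= res.memory_gb
            and job.cores <= res.cores
            and job.disk_gb <= res.disk_gb
            and job.expected_hours <= res.hours_available)


def match(queue: List[Job], res: Resource) -> Optional[Job]:
    """Return the highest-priority queued job that fits the resource, or None."""
    candidates = [job for job in queue if fits(job, res)]
    if not candidates:
        return None
    return max(candidates, key=lambda job: job.priority)


# Made-up example data, for illustration only.
queue = [
    Job("j1", 64, 32, 100, 2.0, "streaming", priority=1),
    Job("j2", 512, 256, 500, 4.0, "batch", priority=5),
    Job("j3", 128, 64, 200, 1.0, "streaming", priority=3),
]
node = Resource("example-node", memory_gb=256, cores=128,
                disk_gb=1000, hours_available=3.0)

chosen = match(queue, node)
if chosen is not None:
    queue.remove(chosen)  # the matched job leaves the queue once dispatched
```

In this sketch j2 is skipped because it needs more memory and cores than the example node offers, so the highest-priority job that fits is selected instead. Combining several small tasks onto one resource (item 6.3) and handling partially finished tasks (item 6.6) are not covered here.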

Useful References



Minutes: