Difference between revisions of "JIRIAF Meeting Feb. 16 2023"
Latest revision as of 20:31, 16 February 2023
Connection Info:
You can connect using https://jlab-org.zoomgov.com/j/1608518798 (Meeting ID: 160 851 8798).
Agenda:
- Previous Meeting
- Announcements
- Make the necessary preparations to move a time-critical workflow to NERSC (Vardan).
- Modernize the CLAS12 reconstruction application so that it can function under the ERSAP streaming framework (Vardan). ☺
- Modify the CLAS12 reconstruction process so that it can operate in streaming mode (Vardan). ☺
- Test-run the CLAS12 reconstruction at NERSC, where it is currently running on a Perlmutter node (Nick and Deby). ☺
- Update ERSAP to support an ET source actor, a generic source component that reads events from the FIFO of the CODA Event Transfer (ET) system (Vardan).
- Prototype JIRIAF Front End (JFE). (Horio)
- Describe the hardware architecture of the user-facing JIRIAF computing node while assuming the following (Horio, Amitoj):
- We may stage Docker images of workflows, and possibly other kinds of containers.
- We may stage certain job-related data sets.
- Hosting a data lake (an in-memory data grid with the option to persist data to disk) is recommended for time-critical or streaming operations.
- A web front end is an essential component. We should also consider installing a web server in addition to the RUCIO storage element, CVMFS file system, Globus, and XRootD.
- Set up the web server and design UI forms
- Form.io is one option we might consider.
- User authentication and access authorization
- Adopt OSG mechanisms
- Note that the RUCIO design provides a layer for authentication and authorization.
- Install the JIRIAF job queue database (Horio, Chris)
- Create and populate the JIRIAF task queue database table (Horio, Chris)
- Define table structure
- Job ID
- Memory count
- Number of cores
- Disk
- Expected time of completion
- Workflow type
- Priority
- Prototype JIRIAF Central Service (JCS). (Vardan, Chris)
- When designing the JCR, consider the following pub/sub technologies:
- XMsg
- Kafka
- Specify the communication protocol that will be used between JCA and JRM
- Define JIRIAF resource-pool table structure
- Specify and develop JRM, a software agent that can:
- Accept and carry out local tasks
- Report job-specific metrics
- Cancel jobs and perform local cleanup
- Submit JRM tasks using the Superfacility API and SWIF2
- Prototype JIRIAF Workflow Resource matching Service (JWRMS) (Vardan)
- Develop a working model of the JWRMS.
- Examine the JIRIAF resource pool and job queue database tables to find a workflow that is compatible with the available resources.
- Support combining tasks to make better use of the available resources.
- Ensure that workflow priorities are supported.
- Remove finished tasks from the workflow queue.
- If a task is only partially finished, update the workflow queue.
- Prototype JIRIAF Facility Manager (JFM) (Vardan, Chris)
- Monitor the resources available at the remote computing facility.
- Superfacility API
- SWIF2
- Prometheus
- etc.
- Populate the JIRIAF resource-pool database table
- Update the JIRIAF resource-pool database table with the most recent information from the facility.
- AOT
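
As a discussion aid for the JFE and JWRMS items above, the job queue fields and the resource matching step could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the JIRIAF design: the class names `Job` and `Resource`, the function `match_jobs`, the units (GB, seconds), the resource-pool columns, and the convention that a larger priority value means more urgent are all hypothetical. Only the job field list itself comes from the agenda.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One row of a hypothetical job queue table (field list from the agenda)."""
    job_id: str
    memory_gb: float         # "Memory count"; the unit is an assumption
    cores: int               # number of cores
    disk_gb: float           # disk; the unit is an assumption
    expected_seconds: float  # expected time of completion
    workflow_type: str       # the kind of workflow
    priority: int            # assumption: larger value = more urgent


@dataclass(frozen=True)
class Resource:
    """One row of a hypothetical resource-pool table (columns are assumed)."""
    name: str
    memory_gb: float
    cores: int
    disk_gb: float


def match_jobs(queue, resource):
    """Greedily pick jobs for one resource, highest priority first.

    Several jobs may be combined on the same resource as long as their
    total memory, cores, and disk fit within what the resource offers.
    """
    free_mem = resource.memory_gb
    free_cores = resource.cores
    free_disk = resource.disk_gb
    selected = []
    # sorted() is stable, so jobs with equal priority keep their queue order.
    for job in sorted(queue, key=lambda j: -j.priority):
        fits = (job.memory_gb <= free_mem
                and job.cores <= free_cores
                and job.disk_gb <= free_disk)
        if fits:
            selected.append(job)
            free_mem -= job.memory_gb
            free_cores -= job.cores
            free_disk -= job.disk_gb
    return selected


# Made-up example data, for illustration only.
queue = [
    Job("j1", 64, 32, 100, 3600, "batch", 1),
    Job("j2", 32, 16, 50, 600, "streaming", 5),
    Job("j3", 200, 64, 500, 7200, "batch", 3),
]
node = Resource("example-node", 128, 64, 1000)
picked = [job.job_id for job in match_jobs(queue, node)]
print(picked)
```

In this made-up example, `j3` is skipped because it needs more memory than the node offers, while `j2` and `j1` are combined on the same node. A greedy pass is just one simple option; a real JWRMS would also need to handle partially finished tasks and the removal of finished ones from the queue.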
Useful References