Difference between revisions of "JIRIAF Meeting Feb. 16 2023"
Revision as of 18:39, 16 February 2023
Connection Info:
You can connect using https://jlab-org.zoomgov.com/j/1608518798 (Meeting ID: 160 851 8798).
Agenda:
- Previous Meeting
- Announcements
- Make the necessary preparations to move a time-critical workflow to NERSC (Vardan).
- Modernize the CLAS12 reconstruction application so that it can function under the ERSAP streaming framework (Vardan).☺
- Modify the process for the CLAS12 reconstruction so that it may operate in a streaming manner (Vardan). ☺
- Test-drive the CLAS12 reconstruction at NERSC; it is currently running on a Perlmutter node (Nick and Deby). ☺
- ERSAP should be updated to support ET source actors. This generic source component reads events from the FIFO associated with the CODA Event Transfer (Vardan).
- Prototype JIRIAF Front End (JFE). (Horio)
- Describe the hardware architecture of the user-facing JIRIAF computing node while assuming the following (Horio, Amitoj):
- We may stage Docker images of workflows, and possibly other kinds of containers.
- We may also stage certain job-related data sets.
- Hosting a data lake, an in-memory data grid with the option to store data on the disk, is recommended for time-critical or streaming operations.
- The front end of the web application is an essential component. We should also consider installing a web server in addition to the RUCIO storage element and the CVMFS file system.
- Set up the web server
- Design UI forms
- Form.io is one of the recommendations that we might take into account.
- Authentication of users and permission to access
- It should be noted that the RUCIO design provides a layer for authentication and permission.
- Install the JIRIAF job queue database (Horio, Chris)
- Create and populate the JIRIAF job queue database table (Horio, Chris)
- Define table structure
- Job ID
- Memory count
- Number of cores
- Disk
- Expected time of completion
- The kind of workflow
- Priority
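The table fields listed above could be prototyped as follows. This is only a minimal sketch using SQLite from the Python standard library; the table name, column names, types, and units (`job_queue`, `memory_gb`, `disk_gb`, and so on) are assumptions made for illustration, since the agenda names the fields but not their format or the database engine.

```python
import sqlite3

# Hypothetical schema: the agenda lists only the fields, not names, types, or units.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_queue (
    job_id           INTEGER PRIMARY KEY,        -- Job ID
    memory_gb        REAL    NOT NULL,           -- "Memory count" (unit assumed)
    cores            INTEGER NOT NULL,           -- Number of cores
    disk_gb          REAL    NOT NULL,           -- Disk (unit assumed)
    expected_done_at TEXT,                       -- Expected time of completion (ISO 8601 assumed)
    workflow_kind    TEXT    NOT NULL,           -- The kind of workflow
    priority         INTEGER NOT NULL DEFAULT 0  -- Priority (larger = more urgent, assumed)
)
"""


def create_job_queue(conn):
    """Create the job queue table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def add_job(conn, memory_gb, cores, disk_gb, expected_done_at, workflow_kind, priority=0):
    """Insert one job request and return its generated job ID."""
    cur = conn.execute(
        "INSERT INTO job_queue "
        "(memory_gb, cores, disk_gb, expected_done_at, workflow_kind, priority) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (memory_gb, cores, disk_gb, expected_done_at, workflow_kind, priority),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_job_queue(conn)
    print(add_job(conn, 4.0, 8, 20.0, "2023-02-17T12:00:00", "streaming", priority=5))
```

An in-memory database keeps the sketch self-contained; a real deployment would presumably use whichever database the group selects.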
- Prototype JIRIAF Central Service (JCS). (Vardan, Chris)
- When designing the JCR, be sure to take into account the following pub/sub-technologies:
- XMsg
- Kafka
- Specify the communication protocol that will be used between JCA and JRM
- Define JIRIAF resource-pool table structure
- Specify and develop JRM, a software agent that is able to
- Accept and carry out local tasks
- Report job-specific metrics
- Cancel jobs and perform local cleaning activities
- Submit JRM tasks using the Superfacility API and SWIF2
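The three local JRM responsibilities listed above (accept and run tasks, report job metrics, cancel and clean up) might map onto an interface like the one below. This is a hypothetical sketch, not the JRM design: the class name `JRMAgent`, its method names, and the reported metric fields are all invented for illustration, and submission through the Superfacility API or SWIF2 is deliberately left out.

```python
import subprocess
import time


class JRMAgent:
    """Hypothetical local agent; names and behavior are illustrative assumptions."""

    def __init__(self):
        # job_id -> (process handle, start time in seconds)
        self._jobs = {}

    def accept(self, job_id, argv):
        """Accept a task and start it as a local subprocess."""
        proc = subprocess.Popen(argv)
        self._jobs[job_id] = (proc, time.monotonic())

    def metrics(self, job_id):
        """Report simple job-specific metrics (fields are assumed, not specified)."""
        proc, started = self._jobs[job_id]
        running = proc.poll() is None
        return {
            "job_id": job_id,
            "running": running,
            "returncode": proc.returncode,
            "elapsed_s": time.monotonic() - started,
        }

    def cancel(self, job_id):
        """Stop the job if it is still running and drop it from local bookkeeping."""
        proc, _ = self._jobs.pop(job_id)
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
        # Any local cleanup (scratch files, containers) would go here.
        return proc.returncode
```

A real agent would also need the JCA/JRM communication protocol that the agenda lists as still to be specified.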
- Prototype JIRIAF Workflow Resource matching Service (JWRMS) (Vardan)
- Develop a working model of the JWRMS.
- Examine the JIRIAF resource pool and job queue database tables to locate a workflow that is compatible with the available resources.
- Support combining tasks to make better use of the available resources.
- Ensure that workflow priorities are supported.
- Remove any tasks that have been finished from the workflow queue.
- If a task is only partially finished, update the workflow queue accordingly.
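One way to picture the matching and task-combining goals above is a greedy pass over the queue: take jobs in priority order and pack every job that still fits into the remaining resources. The sketch below is an assumption for discussion, not the JWRMS algorithm; the `Job` and `Resource` fields and the rule that a larger number means higher priority are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    cores: int
    memory_gb: float
    disk_gb: float
    priority: int  # larger = more urgent (assumed convention)


@dataclass
class Resource:
    cores: int
    memory_gb: float
    disk_gb: float


def match(jobs, resource):
    """Greedily pick jobs, highest priority first, that fit the free resources.

    Returns the IDs of the selected jobs in the order they were chosen.
    """
    free = Resource(resource.cores, resource.memory_gb, resource.disk_gb)
    chosen = []
    for job in sorted(jobs, key=lambda j: (-j.priority, j.job_id)):
        fits = (
            job.cores <= free.cores
            and job.memory_gb <= free.memory_gb
            and job.disk_gb <= free.disk_gb
        )
        if fits:
            chosen.append(job.job_id)
            free.cores -= job.cores
            free.memory_gb -= job.memory_gb
            free.disk_gb -= job.disk_gb
    return chosen


if __name__ == "__main__":
    queue = [
        Job(1, 4, 8, 10, 1),
        Job(2, 8, 16, 10, 5),
        Job(3, 6, 8, 10, 3),
        Job(4, 2, 4, 5, 0),
    ]
    print(match(queue, Resource(cores=12, memory_gb=32, disk_gb=100)))  # [2, 1]
```

Greedy packing is simple but not optimal; it can skip a high-priority job that does not fit while admitting lower-priority ones, which may or may not be the desired policy.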
- Prototype JIRIAF Facility Manager (JFM) (Vardan, Chris)
- Monitor the resources available at the remote computing facility.
- Superfacility API
- SWIF2
- Prometheus
- etc.
- Populate the JIRIAF resource-pool database table.
- Keep the JIRIAF resource-pool database table updated with the most recent information from the facility.
- AOT
Useful References