JIRIAF Meeting Feb. 16 2023
Connection Info:
You can connect using https://jlab-org.zoomgov.com/j/1608518798 (Meeting ID: 160 851 8798).
Agenda:
- Previous Meeting
- Announcements
- Prepare to move a time-critical workflow to NERSC (Vardan).
- Modernize the CLAS12 reconstruction application so that it can run under the ERSAP streaming framework (Vardan). ☺
- Modify the CLAS12 reconstruction process so that it can operate in streaming mode (Vardan). ☺
- Test-run the CLAS12 reconstruction at NERSC, where it is currently running on a Perlmutter node (Nick and Deby). ☺
- Update ERSAP to support ET source actors, a generic source component that reads events from the FIFO of the CODA Event Transfer (ET) system (Vardan).
- Prototype JIRIAF Front End (JFE). (Horio)
- Describe the hardware architecture of the user-facing JIRIAF computing node under the following assumptions (Horio, Amitoj):
- We may stage Docker images of workflows, and possibly other kinds of containers.
- We may stage certain data sets related to jobs.
- Hosting a data lake, an in-memory data grid with the option to store data on disk, is recommended for time-critical or streaming operations.
- The web application front end is an essential component. We should also consider installing a web server in addition to the RUCIO storage element and the CVMFS file system.
- Set up web server
- Design UI forms
- Form.io is one option we might consider.
- User authentication and access authorization
- Note that the RUCIO design provides a layer for authentication and authorization.
- Install the JIRIAF job queue database (Horio, Chris)
- Create and populate the JIRIAF job queue database table (Horio, Chris)
- Define table structure
- Job ID
- Memory count
- Number of cores
- Disk
- Expected time of completion
- Workflow type
- Priority
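As a starting point for discussion, the table structure listed above could be sketched as follows. This is only a minimal illustration using SQLite from the Python standard library; the table name, column names, units (GB), and the timestamp format are all assumptions, since the agenda only names the fields.

```python
import sqlite3

# Hypothetical schema: the agenda lists the fields but not their names,
# types, or units, so everything below is an assumption for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_queue (
    job_id              INTEGER PRIMARY KEY,         -- Job ID
    memory_gb           REAL    NOT NULL,            -- Memory (assumed GB)
    cores               INTEGER NOT NULL,            -- Number of cores
    disk_gb             REAL    NOT NULL,            -- Disk (assumed GB)
    expected_completion TEXT,                        -- assumed ISO 8601 timestamp
    workflow_type       TEXT    NOT NULL,            -- Kind of workflow
    priority            INTEGER NOT NULL DEFAULT 0   -- larger = more urgent (assumed)
);
"""


def create_job_queue(conn):
    """Create the job queue table if it does not exist yet."""
    conn.executescript(SCHEMA)


def add_job(conn, memory_gb, cores, disk_gb, expected_completion,
            workflow_type, priority=0):
    """Insert one job request and return the job ID assigned by the database."""
    cur = conn.execute(
        "INSERT INTO job_queue (memory_gb, cores, disk_gb, "
        "expected_completion, workflow_type, priority) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (memory_gb, cores, disk_gb, expected_completion, workflow_type, priority),
    )
    conn.commit()
    return cur.lastrowid


# Example with an in-memory database and made-up values.
conn = sqlite3.connect(":memory:")
create_job_queue(conn)
job_id = add_job(conn, 16.0, 8, 100.0, "2023-02-17T12:00:00",
                 "clas12-reconstruction", priority=5)
```

The production database may well be something other than SQLite; the sketch is only meant to make the field list concrete enough to review.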
- Prototype JIRIAF Central Service (JCS). (Vardan, Chris)
- When designing the JCR, consider the following pub/sub technologies:
- XMsg
- Kafka
- Specify the communication protocol that will be used between JCA and JRM
- Define JIRIAF resource-pool table structure
- Specify and develop JRM, a software agent that can:
- Accept and carry out local tasks
- Report job-specific metrics
- Cancel jobs and perform local cleanup
- Submit JRM tasks using the Superfacility API and SWIF2
- Prototype JIRIAF Workflow Resource matching Service (JWRMS) (Vardan)
- Develop a working model of the JWRMS.
- Examine the JIRIAF resource pool and job queue database tables to locate a workflow that is compatible with the available resources.
- Combine tasks to make better use of the available resources.
- Ensure that workflow priorities are supported.
- Remove finished tasks from the workflow queue.
- If a task is only partially completed, update the workflow queue.
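The matching behavior described above could be prototyped along these lines. This is a rough sketch, not the JWRMS design: the `Job` and `Resource` classes, their fields, and the greedy highest-priority-first packing rule are all assumptions chosen to illustrate resource matching, task combining, priority support, and removal of finished tasks. Handling of partially completed tasks is left out.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical subset of the job queue fields.
    job_id: int
    cores: int
    memory_gb: float
    priority: int  # larger = more urgent (assumed convention)


@dataclass
class Resource:
    # Hypothetical resource-pool entry.
    name: str
    free_cores: int
    free_memory_gb: float


def match_jobs(resource, queue):
    """Greedily pack jobs onto one resource, highest priority first.

    Several jobs may be combined on the same resource as long as the
    remaining cores and memory allow it.
    """
    cores = resource.free_cores
    memory = resource.free_memory_gb
    selected = []
    for job in sorted(queue, key=lambda j: -j.priority):
        if job.cores <= cores and job.memory_gb <= memory:
            selected.append(job)
            cores -= job.cores
            memory -= job.memory_gb
    return selected


def remove_finished(queue, finished_ids):
    """Return the queue without the jobs whose IDs are in finished_ids."""
    return [job for job in queue if job.job_id not in finished_ids]


# Example with made-up numbers.
node = Resource("example-node", free_cores=16, free_memory_gb=64.0)
queue = [
    Job(1, cores=8, memory_gb=32.0, priority=1),
    Job(2, cores=12, memory_gb=16.0, priority=5),
    Job(3, cores=4, memory_gb=8.0, priority=3),
]
selected = match_jobs(node, queue)  # jobs 2 and 3 fit together; job 1 waits
```

Greedy packing is only one option; a real matching service might also weigh disk, expected completion time, or workflow type from the job queue table.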
- Prototype JIRIAF Facility Manager (JFM) (Vardan, Chris)
- Monitor the resources available at the remote computing facility using:
- Superfacility API
- SWIF2
- Prometheus
- etc.
- Populate the JIRIAF resource-pool database table
- Update the JIRIAF resource-pool database table with the most recent information from the facility.
- AOT
Useful References