JIRIAF Meeting Feb. 16 2023

From epsciwiki
Revision as of 20:31, 16 February 2023 by Gurjyan


Connection Info:

You can connect using https://jlab-org.zoomgov.com/j/1608518798 (Meeting ID: 160 851 8798).

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1608518798?pwd=NnU3cW1ZZFhTTUQ2Y0hIRU5JTWg0UT09&from=addon
Meeting ID: 160 851 8798
Passcode: 205601

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 851 8798

International numbers
Join by SIP
1608518798@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 205601


Agenda:

  1. Previous Meeting
  2. Announcements
  3. Prepare to move a time-critical workflow to NERSC (Vardan).
    1. Modernize the CLAS12 reconstruction application so that it can run under the ERSAP streaming framework (Vardan). ☺
    2. Modify the CLAS12 reconstruction process so that it can operate in streaming mode (Vardan). ☺
    3. Perform a test run of the CLAS12 reconstruction at NERSC; it is currently running on a Perlmutter node (Nick and Deby). ☺
    4. Update ERSAP to support ET source actors. This generic source component reads events from the FIFO associated with the CODA Event Transfer (Vardan).
  4. Prototype JIRIAF Front End (JFE). (Horio)
    1. Describe the hardware architecture of the user-facing JIRIAF computing node, assuming the following (Horio, Amitoj):
      1. We may stage Docker images of workflows, and possibly other kinds of containers.
      2. We may stage some job-related data sets.
      3. Hosting a data lake (an in-memory data grid with optional on-disk storage) is recommended for time-critical or streaming operations.
    2. The web front end is an essential component. We should consider installing a web server in addition to the RUCIO storage element, CVMFS file system, Globus, and XRootD.
      1. Set up the web server and design the UI forms.
      2. Form.io is one option we might consider.
    3. User authentication and access permissions
      1. Adopt OSG mechanisms
      2. Note that the RUCIO design provides a layer for authentication and permissions.
    4. Install the JIRIAF job queue database (Horio, Chris)
    5. Create and populate the JIRIAF task queue database table (Horio, Chris)
      1. Define table structure
      2. Job ID
      3. Memory count
      4. Number of cores
      5. Disk
      6. Expected time of completion
      7. The kind of workflow
      8. Priority
  5. Prototype JIRIAF Central Service (JCS). (Vardan, Chris)
    1. When designing the JCS, take the following pub/sub technologies into account:
      1. XMsg
      2. Kafka
    2. Specify the communication protocol that will be used between JCA and JRM
    3. Define JIRIAF resource-pool table structure
    4. Specify and develop JRM, a software agent that is able to:
      1. Accept and carry out local tasks
      2. Report job-specific metrics
      3. Cancel jobs and perform local cleanup
      4. Submit JRM tasks using the Superfacility API and SWIF2
  6. Prototype JIRIAF Workflow Resource matching Service (JWRMS) (Vardan)
    1. Develop a working model of the JWRMS.
    2. Examine the JIRIAF resource pool and jobs queue database tables to find a workflow that fits the available resources.
    3. Combine tasks to make better use of the available resources.
    4. Ensure that workflow priorities are supported.
    5. Remove finished tasks from the workflow queue.
    6. Update the workflow queue when a task is only partially completed.
  7. Prototype JIRIAF Facility Manager (JFM) (Vardan, Chris)
    1. Monitor the resources available at the remote computing facility using:
      1. Superfacility API
      2. SWIF2
      3. Prometheus
      4. etc.
    2. Populate the JIRIAF resource-pool database table.
    3. Keep the JIRIAF resource-pool database table updated with the most recent information from the facility.
  8. AOT
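
As a starting point for discussion of items 4.5 and 6, the task queue table and a priority-aware match against free resources could be sketched as below. This is only a minimal illustration using SQLite, not a proposed design: the table name, column names, units, the "larger priority value is more urgent" convention, and the functions `create_queue`, `submit`, `match`, and `complete` are all assumptions made for the example, derived only from the field list in the agenda.

```python
import sqlite3

# Hypothetical schema. Columns mirror the agenda fields (job ID, memory,
# cores, disk, expected time of completion, kind of workflow, priority);
# names and units are assumptions.
SCHEMA = """
CREATE TABLE job_queue (
    job_id         INTEGER PRIMARY KEY,
    memory_gb      REAL    NOT NULL,
    cores          INTEGER NOT NULL,
    disk_gb        REAL    NOT NULL,
    expected_hours REAL    NOT NULL,
    workflow_type  TEXT    NOT NULL,
    priority       INTEGER NOT NULL  -- assumption: larger value = more urgent
);
"""


def create_queue():
    """Create an in-memory database holding an empty job queue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def submit(conn, memory_gb, cores, disk_gb, expected_hours, workflow_type, priority):
    """Insert a job into the queue and return its generated job ID."""
    cur = conn.execute(
        "INSERT INTO job_queue "
        "(memory_gb, cores, disk_gb, expected_hours, workflow_type, priority) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (memory_gb, cores, disk_gb, expected_hours, workflow_type, priority),
    )
    conn.commit()
    return cur.lastrowid


def match(conn, free_memory_gb, free_cores, free_disk_gb):
    """Return the ID of the highest-priority job that fits, or None."""
    row = conn.execute(
        "SELECT job_id FROM job_queue "
        "WHERE memory_gb <= ? AND cores <= ? AND disk_gb <= ? "
        "ORDER BY priority DESC, job_id ASC LIMIT 1",
        (free_memory_gb, free_cores, free_disk_gb),
    ).fetchone()
    return row[0] if row else None


def complete(conn, job_id):
    """Remove a finished job from the queue."""
    conn.execute("DELETE FROM job_queue WHERE job_id = ?", (job_id,))
    conn.commit()


if __name__ == "__main__":
    conn = create_queue()
    # Workflow type labels below are placeholders.
    batch = submit(conn, 64, 32, 100, 2.0, "reconstruction", priority=1)
    stream = submit(conn, 16, 8, 10, 0.5, "streaming", priority=5)
    chosen = match(conn, free_memory_gb=128, free_cores=64, free_disk_gb=500)
    print("matched job:", chosen)
    complete(conn, chosen)
```

The sketch covers only single-job matching and removal of finished jobs. Combining several tasks onto one resource and handling partially completed tasks would need additional state (for example a status column) that is left open here.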

Useful References



Minutes: