JIRIAF Meeting Jan. 11 2024

From epsciwiki
Revision as of 15:16, 10 January 2024 by Gurjyan (talk | contribs)


Connection Info:

You can connect using the following link (Meeting ID: 160 126 6529):


Agenda:

  1. Announcements
    1. Welcome Patrick on board.
    2. Difficulty obtaining an NSLS-II data-intensive workflow to migrate.
    3. New sites for deployments: ORNL and ANL.
      1. Ticket (INC0114103) requesting access for JIRIAF nodes to DOE computing facilities such as NERSC and ORNL.
        1. The list of IP addresses and ports to present to the security and networking team.
    4. NERSC allocation
      1. JIRIAF's project request has been approved (as of 01.02).
      2. Nick Taylor assigned more time for the m3792 project (EJFAT-ESnet).
  2. Summary of the project's undertakings and key achievements
    1. M3
      1. Define mechanisms to act on user workflows, such as reducing previously allocated resources to the user workflow/application.
    2. M4
      1. JCS design and development
        1. Starting VKs (JIRIAF nodes) through the k8s API management system
          1. JIRIAF node naming convention and labeling
        2. JIRIAF k8s cluster autoscaling (with possible AI support)
          1. Defining workflows/pods in the cluster that are unschedulable
        3. JCS and JIRIAF database relationship
          1. Tables such as available resources, user requests, and user workflow status.
          2. Examine the site resources database table (constantly updated by SWIF2) and submit SWIF2 requests to launch nodes and allocate/lease resources.
          3. Communicate with the k8s App server to ensure submitted jobs are running, and update JIRIAF's available resource DB table.
          4. Develop a resource-request matching algorithm that compares user requests with the available resources.
          5. Define and suggest metadata structure for requests for accurate matching.
    3. M5
      1. JIRIAF k8s node/vk_cmd: Implement a function that can use ConfigMap configuration to write files in pods.
        1. Anatomy of the node/vk launch script.
      2. VK hardware monitor server
    4. Future milestones
      1. Accepting opportunistic workflows designed primarily for streaming.
      2. Mathematical model for simulating the abstract processor/actor within the JIRIAF ecosystem.
      3. Definition of the parameters and functionalities of the distributed workflow agent model and initiation of its design.
  3. Slides for upcoming presentations in preparation for the publications
    1. Slide describing JIRIAF virtual k8s cluster creation, emphasizing its dynamic nature.
    2. Slide showing Prometheus integration to monitor JIRIAF k8s cluster and pods.
    3. Start working on a paper describing JIRIAF resource acquisition and workflow deployment within a dynamic k8s cluster.
  4. AOT
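
Note on the M4 resource-request matching item: as a starting point for discussion, the comparison of user requests with available resources could be sketched as below. This is only an illustration under stated assumptions, not the JCS design. The record fields (cpus, memory_gb, walltime_min, site), the node names, and the best-fit rule are all hypothetical; the actual table schema and request metadata are still to be defined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """One row of a hypothetical 'available resources' table."""
    node: str
    site: str
    cpus: int
    memory_gb: int
    walltime_min: int


@dataclass
class Request:
    """One row of a hypothetical 'user requests' table."""
    user: str
    cpus: int
    memory_gb: int
    walltime_min: int
    site: Optional[str] = None  # None means any site is acceptable


def fits(req: Request, res: Resource) -> bool:
    """Return True if the resource satisfies every constraint of the request."""
    return (
        res.cpus >= req.cpus
        and res.memory_gb >= req.memory_gb
        and res.walltime_min >= req.walltime_min
        and (req.site is None or req.site == res.site)
    )


def match(req: Request, available: List[Resource]) -> Optional[Resource]:
    """Best fit (assumed rule): among fitting resources, pick the least spare CPU, then memory."""
    candidates = [r for r in available if fits(req, r)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (r.cpus - req.cpus, r.memory_gb - req.memory_gb),
    )


# Hypothetical table contents, for illustration only.
available = [
    Resource("vk-a", "nersc", cpus=64, memory_gb=256, walltime_min=120),
    Resource("vk-b", "nersc", cpus=16, memory_gb=64, walltime_min=60),
    Resource("vk-c", "ornl", cpus=32, memory_gb=128, walltime_min=240),
]
req = Request("alice", cpus=8, memory_gb=32, walltime_min=30)
chosen = match(req, available)
print(chosen.node if chosen else "no match")
```

Best fit is used here only because it leaves larger nodes free for larger requests; a request that matches nothing returns None, which could be the trigger for submitting a SWIF2 request to launch more nodes.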

Useful References



Minutes: