Difference between revisions of "JIRIAF Meeting Apr. 18 2024"

 

Latest revision as of 17:37, 2 May 2024


Connection Info:

You can connect using the meeting URL below (Meeting ID: 160 126 6529):

One tap mobile: US: +16692545252,,1608518798# or +16468287666,,1608518798#
Meeting URL: https://jlab-org.zoomgov.com/j/1601266529?pwd=ZkZKL0tjeWFpbmxDeWZob0VmbzNOUT09&from=addon
Meeting ID: 160 126 6529
Passcode: 292304

Join by Telephone
For higher quality, dial a number based on your current location.
Dial:
US: +1 669 254 5252 or +1 646 828 7666 or +1 551 285 1373 or +1 669 216 1590 or 833 568 8864 (Toll Free)
Meeting ID: 160 126 6529

Join by SIP
1616903130@sip.zoomgov.com
Join by H.323
161.199.138.10 (US West)
161.199.136.10 (US East)
Meeting ID: 160 851 8798
Passcode: 292304


Agenda:

  • Announcements
  • JFE
      • Current status
        • CILogon authentication, database, etc.
      • Deployment on jiriaf2301
        • Job request queue
          • Resource management (request, deploy, start)
          • Support for static and dynamic JRMs
          • User workflow (pod) management
          • Monitoring
          • List of pending and active JRMs
      • Public-facing website
        • Grafana deployment metrics visualization
        • Kubernetes (K8s) visualization?
  • JRM
    • Tables in MongoDB? (https://wiki.jlab.org/epsciwiki/index.php/Tables-for-JIRIAF)
    • Metric server
    • Role of horizontal autoscaling (https://wiki.jlab.org/epsciwiki/index.php/Autoscaling-jiriaf)
    • Workflow management system.
  • JCS and JMS
    • No proactive support: start JRMs according to the workflow request.
      • Remove the JRM once the job is completed.
  • Digital twin
    • Bayesian network-based agent model prototype.
      • Queueing theory-based mathematical model.
  • Large-scale deployment at NERSC
    • New metrics for the EJFAT data-stream pipeline.
      • Request and deploy 38 nodes/JREs
      • Run ERSAP pipeline
      • Confirm 100 Gbps data-stream processing with zero packet loss
      • Reduce resources, i.e., stop JREs individually and measure data processing rate and packet loss.
        • Time-dependent Grafana plots.
  • Deployment at ORNL
  • Documentation and code
    • Centralize the code base in GitHub (https://github.com/JeffersonLab/jiriaf-0.1).
  • Demo and presentations.
    • Start preparing our second paper.
    • Start working on CHEP24 abstract and presentation
  • AOT

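Useful References

  • JIRIAF Meetings
  • JIRIAF Channel: https://teams.microsoft.com/_#/FileBrowserTabApp/JIRIAF?threadId=19:9a90d2e6643f40af9491c50faf8143d2@thread.skype&ctx=channel
  • Code base: https://github.com/JeffersonLab/jiriaf-0.1/tree/main/JFE and diagram: https://indico.jlab.org/event/845/contributions/14342/attachments/10967/16703/jiriaf-q2-24.pdf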


Minutes: