Difference between revisions of "JIRIAF"
Line 1: | Line 1: | ||
− | |||
− | |||
− | |||
− | |||
== JLAB Integrated Research Infrastructure Across Facilities == | == JLAB Integrated Research Infrastructure Across Facilities == | ||
Line 40: | Line 36: | ||
!Event | !Event | ||
!Link | !Link | ||
+ | |- | ||
+ | |2024-03-13 | ||
+ | |Jeng. Tsai | ||
+ | | [https://indico.cern.ch/event/1330797/contributions/5796638/ ACAT 2024] | ||
+ | | [https://misportal.jlab.org/ul/publications/downloadFile.cfm?pub_id=20864 pptx] | ||
+ | |- | ||
+ | |2024-02-22 | ||
+ | |Jeng. Tsai | ||
+ | | [https://www.nersc.gov/users/training/past-training-events/2024/nersc-data-day-feb-21-22-2024/#:~:text=Table%20of%20Contents&text=NERSC%20is%20hosting%20Data%20Day,for%20scientific%20computing%20on%20Perlmutter. NERSC DATA DAY 2024] | ||
+ | | [https://misportal.jlab.org/ul/publications/downloadFile.cfm?pub_id=20743 pptx] | ||
+ | |- | ||
+ | |2024-01-11 | ||
+ | |V. Gyurjyan | ||
+ | | [https://userweb.jlab.org/~gurjyan/jiriaf/jiriaf.pptx JLAB presentation] | ||
+ | | [https://userweb.jlab.org/~gurjyan/jiriaf/jiriaf.jpg logo] | ||
|- | |- | ||
|2023-04-08 | |2023-04-08 | ||
Line 65: | Line 76: | ||
! Title | ! Title | ||
|- | |- | ||
+ | |2024-07-31 | ||
+ | | ACAT 2024 Proceedings | ||
+ | | Optimizing Resource Provisioning Across Diverse Computing Facilities with Virtual Kubelet Integration | ||
|} | |} | ||
− | === Notes == | + | |
+ | == How To == | ||
+ | === Install these to set up a Kubernetes cluster === | ||
+ | [[Install Kubernetes in Docker (KinD)]] | ||
+ | |||
+ | [[Install Metrics Server in Kubernetes]] | ||
+ | |||
+ | === How to deploy JRMs at the compute site of NERSC, ORNL, or local EJ-FAT nodes === | ||
+ | [[Deploy JRMs on local EJFAT nodes|EJ-FAT nodes]] | ||
+ | |||
+ | [[Deploy JRMs on NERSC and ORNL via Fireworks|NERSC, ORNL, or FABRIC]] | ||
+ | |||
+ | === How to Deploy ERSAP data pipelines at the sites === | ||
+ | |||
+ | ==== Prerequisite ==== | ||
+ | [[Deploy Prometheus Monitoring with Prometheus Operator]] (Install this first) | ||
+ | |||
+ | ==== Deployment ==== | ||
+ | [[Deploy ERSAP data pipelines on EJFAT nodes via JIRIAF|EJ-FAT nodes]] | ||
+ | |||
+ | [[Deploy ERSAP data pipelines at NERSC and ORNL via JIRIAF|NERSC, ORNL, or FABRIC]] | ||
+ | |||
+ | == Manuscripts == | ||
+ | [https://www.overleaf.com/read/ckgvzsgrkyjk#f0cfc5 JIRIAF] | ||
+ | |||
+ | [https://www.overleaf.com/read/dwhtxddwjrpb#47ce7c Digital twin for queue system] | ||
+ | |||
+ | == Notes == | ||
+ | [[virtual-kubelet-cmd|Details of Virtual-kubelet-cmd]] | ||
+ | |||
+ | [[VK-CMD|Deploying JRM with Virtual-kubelet-cmd Docker Image]] | ||
+ | |||
+ | [[Tables-for-JIRIAF|Tables for JIRIAF]] | ||
+ | |||
+ | [[job-scripts-jiriaf|Job Scripts for JIRIAF]] | ||
+ | |||
+ | [[autoscaling-jiriaf| JRM Supports Autoscaling in Kubernetes]] | ||
+ | |||
+ | [[jiriaf-fw| JRM Deployment Using FireWorks]] | ||
== Useful Links == | == Useful Links == | ||
+ | |||
+ | === Repositories === | ||
+ | Refer to the [https://www.overleaf.com/read/ckgvzsgrkyjk#f0cfc5 JIRIAF manuscript] for more details on the following repositories. | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-virtual-kubelet-cmd Main repo of JRM] | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-virtual-kubelet-cmd-docker Build Docker image of JRM] | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-fireworks Launch JRMs using FireWorks] | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-test-platform Test of stream processing with JIRIAF] | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-digital-twin Digital twin for queue systems] | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-horizontal-scaling Test of pod-autoscaling of JIRIAF] | ||
+ | |||
+ | [https://github.com/JeffersonLab/jiriaf-process-exporter Customized process exporter of Prometheus] | ||
+ | |||
+ | == Challenge == | ||
+ | [[job-assign-delete-jiriaf|Managing Job Assignments and Deletion in Kubernetes]] | ||
+ | |||
+ | [[Pod IPs]] |
Latest revision as of 13:29, 3 October 2024
JLAB Integrated Research Infrastructure Across Facilities
Project Description
The JIRIAF (JLAB Integrated Research Infrastructure Across Facilities) project aims to appraise the capabilities for combining geographically diverse computing facilities into an integrated science infrastructure. This involves evaluating an infrastructure that dynamically integrates temporarily unallocated or idle compute resources from various providers. Because the participating facilities have diverse resources and their own locally running workflows, it is essential to study the challenges of heterogeneous, distributed, and opportunistic compute resource provisioning across several participating data centers, which are presented to the end user as a single, unified computing infrastructure. Policies and requirements for computational workflows that can effectively use volatile resources are critical for this integrated scientific environment. The primary objective of the JIRIAF project is to test two kinds of workflow relocation: from resources close to the experiment to a geographically remote data center, in cases when near real-time data quality checks, online calibration and alignment, or similar tasks are required; and between two geographically remote data centers, in cases when local resources are insufficient for data processing. We need to understand what types of solutions work, where future investment is required, the operational and sociological aspects of collaboration across sites, and which science workflows benefit most from distributed infrastructure.
This project is well-positioned to show the feasibility of workload rollovers across DOE computing facilities. This in turn will provide operational resilience and load balancing during peak times and bring science-oriented computing facilities together, mandating uniform data movement, data processing API unification, and resource sharing. In the end, the science rate will increase. Static resource provisioning, which carves resources from the local farms and dedicates them to guest tasks, is straightforward. In addition, DOE has dedicated resources (such as NERSC) that can be requested and allocated for specific tasks, not to mention OSG. Beyond such dedicated resource provisioning, the novelty of this project is to satisfy occasional, unscheduled tasks that need timely processing, such as workflows that are slowed down or stopped due to unforeseen computer center maintenance periods, quick data QAs during data acquisition (including streaming and triggered DAQs), fast analysis trains to check physics, etc. In other words, the goal is to integrate DOE compute facilities so that a user sees them as one facility (no resource request proposals, approvals, special memberships, etc.).
Projects | Meetings and Collaboration |
Date | Presenter | Event | Link |
---|---|---|---|
2024-03-13 | Jeng. Tsai | ACAT 2024 | pptx |
2024-02-22 | Jeng. Tsai | NERSC DATA DAY 2024 | pptx |
2024-01-11 | V. Gyurjyan | JLAB presentation | logo |
2023-04-08 | V. Gyurjyan | 26th International Conference on Computing in High Energy & Nuclear Physics | pptx |
2022-07-22 | V. Gyurjyan | [JLAB LDRD Defense] | |
2023-02-09 | V. Gyurjyan | [Conceptual Design] | |
Publications
Date | Journal | Title |
---|---|---|
2024-07-31 | ACAT 2024 Proceedings | Optimizing Resource Provisioning Across Diverse Computing Facilities with Virtual Kubelet Integration |
How To
Install these to set up a Kubernetes cluster
Install Kubernetes in Docker (KinD)
Install Metrics Server in Kubernetes
How to deploy JRMs at the compute site of NERSC, ORNL, or local EJ-FAT nodes
How to Deploy ERSAP data pipelines at the sites
Prerequisite
Deploy Prometheus Monitoring with Prometheus Operator (Install this first)
Deployment
Manuscripts
Notes
Details of Virtual-kubelet-cmd
Deploying JRM with Virtual-kubelet-cmd Docker Image
JRM Supports Autoscaling in Kubernetes
JRM Deployment Using FireWorks
Useful Links
Repositories
Refer to the JIRIAF manuscript for more details on the following repositories.
Test of stream processing with JIRIAF
Digital twin for queue systems
Test of pod-autoscaling of JIRIAF
Customized process exporter of Prometheus