JIRIAF
JLAB Integrated Research Infrastructure Across Facilities
Project Description
The JIRIAF (JLAB Integrated Research Infrastructure Across Facilities) project aims to assess the capability to combine geographically distributed computing facilities into an integrated science infrastructure. This involves evaluating an infrastructure that dynamically integrates temporarily unallocated or idle compute resources from various providers. Because the participating facilities have diverse resources and run their own local workflows, it is essential to study the challenges of heterogeneous, distributed, and opportunistic resource provisioning across several participating data centers, presented to the end user as a single, unified computing infrastructure. Policies and requirements for computational workflows that can effectively use such volatile resources are critical for this integrated scientific environment. The primary objectives of the JIRIAF project are to test (1) relocating a computing workflow from resources close to the experiment to a geographically remote data center when near-real-time data quality checks, online calibration and alignment, or similar tasks are required, and (2) relocating a workflow between two geographically remote data centers when local resources are insufficient for data processing. We need to understand which types of solutions work, where future investment is required, the operational and sociological aspects of collaboration across sites, and which science workflows benefit most from a distributed infrastructure.
This project is well positioned to demonstrate the feasibility of workload rollover across DOE computing facilities. Such integration would provide operational resilience and load balancing during peak times and would bring science-oriented computing facilities together, requiring uniform data movement, unified data-processing APIs, and resource sharing. Ultimately, this would increase the rate of scientific output. Static resource provisioning, in which resources are carved out of local farms and dedicated to guest tasks, is straightforward. DOE also has dedicated resources (such as NERSC) that can be requested and allocated for specific tasks, not to mention OSG. Beyond such dedicated provisioning, the novelty of this project is to serve occasional, unscheduled tasks that need timely processing: for example, workflows that are slowed down or stopped by unforeseen maintenance at a computing center, quick data quality assurance (QA) during data acquisition (including streaming and triggered DAQ), and fast analysis trains to check physics. In other words, the goal is to integrate DOE compute facilities so that a user sees them as a single facility, with no resource request proposals, approvals, or special memberships.
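The rollover idea described above can be illustrated with a toy placement rule: run a workflow locally when enough idle cores are available, otherwise hand it to a remote site that advertises enough idle capacity. This is a minimal sketch under stated assumptions, not the JIRIAF scheduler: the `Site` model, the site names and idle-core counts, and the "pick the remote site with the most idle cores" policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """A compute facility advertising temporarily idle cores (hypothetical model)."""
    name: str
    idle_cores: int


def place_workflow(required_cores: int, local: Site, remotes: List[Site]) -> Optional[str]:
    """Pick a site for a workflow, preferring local resources.

    Falls back ("rolls over") to the remote site with the most idle cores
    when the local site cannot satisfy the request; returns None if no
    site has enough idle capacity.
    """
    if local.idle_cores >= required_cores:
        return local.name
    candidates = [site for site in remotes if site.idle_cores >= required_cores]
    if not candidates:
        return None
    return max(candidates, key=lambda site: site.idle_cores).name


if __name__ == "__main__":
    # Hypothetical capacities, for illustration only.
    jlab = Site("jlab", idle_cores=16)
    remotes = [Site("nersc", idle_cores=256), Site("ornl", idle_cores=64)]
    print(place_workflow(8, jlab, remotes))    # jlab
    print(place_workflow(128, jlab, remotes))  # nersc
    print(place_workflow(512, jlab, remotes))  # None
```

Choosing the remote site with the most idle cores is only one possible policy; a real system would also have to weigh data-transfer cost, network latency, and how long the opportunistic resources are expected to remain available.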
Meetings and Collaboration
Date | Presenter | Event | Link |
---|---|---|---|
2024-03-13 | J. Tsai | ACAT 2024 | pptx |
2024-02-22 | J. Tsai | NERSC Data Day 2024 | pptx |
2024-01-11 | V. Gyurjyan | JLAB presentation | logo |
2023-04-08 | V. Gyurjyan | 26th International Conference on Computing in High Energy & Nuclear Physics | pptx |
2023-02-09 | V. Gyurjyan | Conceptual Design | |
2022-07-22 | V. Gyurjyan | JLAB LDRD Defense | |
Publications
Date | Journal | Title |
---|---|---|
2024-07-31 | ACAT 2024 Proceedings | Optimizing Resource Provisioning Across Diverse Computing Facilities with Virtual Kubelet Integration |
How To
Install the following to set up a Kubernetes cluster:
Install Kubernetes in Docker (KinD)
Install Metrics Server in Kubernetes
How to deploy JRMs at the compute sites of NERSC and ORNL, or on local EJ-FAT nodes
How to deploy ERSAP data pipelines at the sites
Prerequisite
Deploy Prometheus Monitoring with Prometheus Operator (Install this first)
Deployment
Manuscripts
Notes
Details of Virtual-kubelet-cmd
Deploying JRM with Virtual-kubelet-cmd Docker Image
JRM Supports Autoscaling in Kubernetes
JRM Deployment Using FireWorks
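Kubernetes autoscaling of the kind listed above is commonly driven by a HorizontalPodAutoscaler, which reads resource metrics such as those served by the Metrics Server. Assuming pods running on JRM nodes are scaled by the standard HorizontalPodAutoscaler (these notes do not say which mechanism JRM uses), the replica count follows the documented rule desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces only that rule, its default 0.1 tolerance, and the min/max clamp; it deliberately omits the real controller's handling of missing metrics, unready pods, and scale-down stabilization. The `min_replicas`/`max_replicas` defaults are example values.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count from the Kubernetes HPA scaling rule (simplified).

    desired = ceil(current_replicas * current_metric / target_metric),
    left unchanged when the ratio is within `tolerance` of 1.0, and then
    clamped to [min_replicas, max_replicas] (example bounds).
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(2, 90.0, 50.0))   # 4: 2 * 1.8 = 3.6, rounded up
    print(desired_replicas(4, 52.0, 50.0))   # 4: ratio 1.04 is within tolerance
    print(desired_replicas(4, 10.0, 50.0))   # 1: 4 * 0.2 = 0.8, rounded up
    print(desired_replicas(5, 500.0, 50.0))  # 10: 50 clamped to max_replicas
```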
Useful Links
Repositories
Refer to the JIRIAF project for more details on the following repositories.
Test of stream processing with JIRIAF
Digital twin for queue systems
Test of pod-autoscaling of JIRIAF
Customized process exporter of Prometheus