Deploy ERSAP data pipelines on EJFAT nodes via JIRIAF

Deploy ERSAP Data Pipeline Workflows on EJFAT Nodes

This guide explains how to use the Helm charts in the main/local-ejfat/job folder to deploy ERSAP data pipeline workflows on EJFAT nodes.

Prerequisites

  • Helm 3 installed
  • Kubernetes cluster access
  • kubectl configured
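  • To confirm these prerequisites, a quick check like the following works (a minimal sketch; version output will vary with your installation):
helm version --short     # should report v3.x
kubectl version --client # confirms kubectl is installed
kubectl cluster-info     # confirms access to the cluster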

Chart Structure

The main chart is located in the job/ directory. Key files include:

  • Chart.yaml: Metadata for the chart
  • values.yaml: Default configuration values
  • Templates in the templates/ directory
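Assuming the standard Helm chart layout and the template files referenced later in this guide, the job/ directory looks roughly like this:

job/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── job-job.yaml
    ├── job-configmap.yaml
    ├── job-service.yaml
    └── prom-servicemonitor.yaml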

Step-by-Step Usage

[Figure: Local EJFAT workflow flow chart]

This diagram provides a visual representation of the steps involved in setting up and deploying ERSAP workflows using the Helm charts in this repository.

1. Setup Environment

  • Ensure you have Helm 3 installed, access to a Kubernetes cluster, and kubectl configured.
  • Clone the repository and navigate to the local-ejfat folder:
git clone https://github.com/JeffersonLab/jiriaf-test-platform.git
cd jiriaf-test-platform/main/local-ejfat

2. Set Project ID

  • Export your project ID as an environment variable:
export ID=your-project-id # e.g. jlab-100g-nersc-ornl
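  • Later steps assume ID is set; a guard like the following (a minimal sketch) makes scripts fail fast if it is not:
: "${ID:?Set ID first, e.g. export ID=jlab-100g-nersc-ornl}"
echo "Using project ID: $ID"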

3. Customize Deployment (Optional)

  • Edit job/values.yaml to customize your deployment (a validation sketch follows this list). Key settings include:
Deployment:
  name: this-name-is-changing
  namespace: default
  replicas: 1
  serviceMonitorLabel: ersap-test4
  cpuUsage: "128"
  ejfatNode: "2"
ersapSettings:
  image: gurjyan/ersap:v0.1
  cmd: /ersap/run-pipeline.sh
  file: /x.ersap
  • For more advanced customization, consider modifying the template files in the job/templates/ directory:
    • job-job.yaml: Customize the Kubernetes Job specification
    • job-configmap.yaml: Adjust the scripts for job containers
    • job-service.yaml: Modify the Kubernetes Service configuration
    • prom-servicemonitor.yaml: Fine-tune Prometheus monitoring settings
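  • After customizing, you can validate the chart and preview the rendered manifests before deploying (standard Helm commands; the release name here is only an example):
helm lint job/
helm template $ID-job-ejfat-2 job/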

4. Deploy Prometheus (if not already running)

  • Check whether a Prometheus instance already exists for your project ID:
helm ls | grep "$ID-prom"
  • If the command returns no results, there is no Prometheus instance for your project ID. Install one:
cd main/prom
helm install $ID-prom prom/ --set Deployment.name=$ID
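  • To verify that the Prometheus release came up, check its status and its pods (the grep pattern is an assumption; adjust it to match your naming):
helm status $ID-prom
kubectl get pods | grep "$ID-prom"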

5. Deploy EJFAT Workflow

  • Use the launch_job.sh script to deploy the EJFAT workflow (a sketch of what the script roughly does follows this list):
cd main/local-ejfat
ID=jlab-100g-nersc-ornl
INDEX=2
./launch_job.sh $ID $INDEX
  • The script uses the following variables:
    • ID: The project identifier (e.g., jlab-100g-nersc-ornl)
    • INDEX: A unique index for the job (e.g., 2)
  • This script creates two containers on the target EJFAT node (see job-configmap.yaml):
    • jlabtsai/process-exporter:pgid-go
    • gurjyan/ersap:v0.1
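  • As a rough sketch, a launcher like launch_job.sh typically wraps helm install and passes ID and INDEX through to the chart values. The --set keys below are assumptions based on values.yaml, not the script's actual contents:
#!/bin/bash
# Hypothetical sketch of a launch script; see the real launch_job.sh in the repo.
ID=$1      # project identifier, e.g. jlab-100g-nersc-ornl
INDEX=$2   # unique job index, e.g. 2

# Assumed mapping of arguments onto chart values; the real script may differ.
helm install "$ID-job-ejfat-$INDEX" job/ \
  --set Deployment.name="$ID" \
  --set Deployment.ejfatNode="$INDEX"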

6. Monitor Deployment

  • Check Helm release: helm ls | grep $ID-job-ejfat-$INDEX
  • Check pod status: kubectl get pods
  • Describe a pod: kubectl describe pod <pod-name>
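  • Follow the pipeline output with kubectl logs; since the pod runs two containers, name one explicitly with -c if needed:
kubectl logs -f <pod-name>                 # follow the default container's logs
kubectl logs -f <pod-name> -c <container>  # select one of the two containers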

7. Manage Deployed Jobs and Clean Up Workflows

  • List deployed jobs: helm ls
  • Delete a deployed job:
helm uninstall $ID-job-ejfat-<INDEX>
  • After uninstalling the Helm release, manually delete containers on EJFAT nodes:
    1. Log in to each EJFAT node used in your deployment.
    2. List all containers: docker ps -a
    3. Identify containers related to your job.
    4. Remove these containers: docker rm -f <container-id>
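  • If the containers follow a recognizable naming pattern, steps 2-4 can be combined into a single command (a sketch; the name filter is an assumption, so confirm with docker ps -a first):
docker ps -aq --filter "name=$ID" | xargs -r docker rm -f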