JRM Launcher: How To and Usage Guide

Setup

Prerequisites

Python 3.9
FireWorks library
Required Python packages (install via pip install -r requirements.txt)

Configuration

Create a site-specific configuration file based on the template in fw-lpad/FireWorks/jrm_launcher/site_config_template.yaml
Save your configuration file with a meaningful name, e.g., perlmutter_config.yaml

Setting up the site_config_file

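The site configuration file tells the launcher how to submit Slurm jobs, how to configure each JRM, and how to reach the remote site over SSH. The steps below show how to set it up, using the ORNL and Perlmutter configurations as examples: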

Create a new YAML file for your site configuration (e.g., ornl_config.yaml or perlmutter_config.yaml)
Structure the file with the following sections: slurm, jrm, and ssh
Fill in the values for each section based on your site's requirements

Example: ORNL Configuration

slurm:
  nodes: 1
  constraint: ejfat
  walltime: 00:30:00
  qos: normal
  account: csc266
  reservation:

jrm:
  nodename: jrm-ornl
  site: ornl
  control_plane_ip: jiriaf2302
  apiserver_port: 38687
  kubeconfig: /ccsopen/home/jlabtsai/run-vk/kubeconfig/jiriaf2302
  image: docker:jlabtsai/vk-cmd:main
  vkubelet_pod_ips:
    - 172.17.0.1
  custom_metrics_ports: [2221, 1776, 8088, 2222]
  config_class:
  
ssh:
  remote_proxy:
  remote: < this is the IP address of the remote machine where the fw-agent is running >
  ssh_key:
  password: < this is a password encoded in base64 >
  build_script: ./build-ssh-ornl.sh

Example: Perlmutter Configuration

slurm:
  nodes: 2
  constraint: cpu
  walltime: 00:05:00
  qos: debug
  account: m3792

jrm:
  nodename: jrm-perlmutter
  site: perlmutter
  control_plane_ip: jiriaf2302
  apiserver_port: 38687
  kubeconfig: /global/homes/j/jlabtsai/run-vk/kubeconfig/jiriaf2302
  image: docker:jlabtsai/vk-cmd:main
  vkubelet_pod_ips: [172.17.0.1]
  custom_metrics_ports: [2221, 1776, 8088, 2222]
  config_class:

ssh:
  remote_proxy: perlmutter
  remote: < this is the IP address of the remote machine where the fw-agent is running >
  ssh_key: < this is the ssh key to access the remote machine >
  password:
  build_script:

Key Configuration Points

slurm section: Configure SLURM-specific parameters such as number of nodes, constraints, walltime, QoS, and account.
jrm section: Set JRM-specific details including nodename, site, control plane IP, API server port, kubeconfig path, Docker image, and any custom configurations.
ssh section: Specify SSH-related information for remote access, including proxy settings, remote address, SSH key, and optional build script.

Remember to replace placeholder values (indicated by < >) with actual values specific to your environment.
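
A quick pre-flight check can catch a missing section or a forgotten placeholder before a workflow is submitted. The sketch below is only an illustration: the section names come from this guide, but the list of required keys is an assumption drawn from the two example files, not the launcher's own schema, and `check_site_config` is a hypothetical helper. It expects the configuration as an already parsed dictionary (for instance the result of PyYAML's `yaml.safe_load`).

```python
import re

# Assumption: required keys are inferred from the ORNL and Perlmutter examples,
# not taken from the launcher's own validation logic.
REQUIRED_KEYS = {
    "slurm": ["nodes", "constraint", "walltime", "qos", "account"],
    "jrm": ["nodename", "site", "control_plane_ip", "apiserver_port",
            "kubeconfig", "image"],
    "ssh": ["remote"],
}

# Matches values that are still written as "< ... >" placeholders.
PLACEHOLDER = re.compile(r"^\s*<.*>\s*$")


def check_site_config(config):
    """Return a list of human-readable problems found in a parsed site config."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        # Report required keys that are absent or left blank.
        for key in keys:
            if values.get(key) in (None, ""):
                problems.append(f"{section}.{key} is empty")
        # Report any value that was never replaced with a real one.
        for key, value in values.items():
            if isinstance(value, str) and PLACEHOLDER.match(value):
                problems.append(f"{section}.{key} still holds a placeholder")
    return problems
```

An empty list means the file passed these particular checks; it does not guarantee that the launcher will accept the values.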

Save the file with a descriptive name (e.g., ornl_config.yaml or perlmutter_config.yaml) in the appropriate directory.
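
The ORNL example expects the ssh password field to hold a base64-encoded value. One way to produce such a value is with Python's standard `base64` module, as sketched below. This assumes the launcher expects standard base64 of the UTF-8 password text, which this guide does not state explicitly; `encode_password` and `decode_password` are hypothetical helper names. Note that base64 is an encoding, not encryption, so the configuration file should still be protected with restrictive file permissions.

```python
import base64


def encode_password(plain):
    """Return the base64 text form of a password (assumed format for ssh.password)."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_password(encoded):
    """Reverse encode_password; handy for checking a stored value."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Replace the sample text with the real password before copying the output.
    print(encode_password("example-password"))
```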

Basic Usage

The main entry point for the JRM Launcher is the main.sh script:

./main.sh <action> [options]

Available Actions

add_wf: Add a new workflow
delete_wf: Delete an existing workflow
delete_ports: Delete ports in a specified range
connect: Establish various connections (db, apiserver, metrics, custom_metrics)

Usage Examples

Add a new workflow

./main.sh add_wf --site_config_file /path/to/perlmutter_config.yaml

Delete a workflow

./main.sh delete_wf --fw_id 12345

Delete ports in a range

./main.sh delete_ports --start 10000 --end 20000

Connect to the database

./main.sh connect --connect_type db --site_config_file /path/to/perlmutter_config.yaml

Connect to the API server

./main.sh connect --connect_type apiserver --port 35679 --site_config_file /path/to/perlmutter_config.yaml

Connect to the metrics server

./main.sh connect --connect_type metrics --port 10001 --nodename vk-node-1 --site_config_file /path/to/perlmutter_config.yaml

Connect to custom metrics

./main.sh connect --connect_type custom_metrics --mapped_port 20001 --custom_metrics_port 8080 --nodename vk-node-1 --site_config_file /path/to/perlmutter_config.yaml
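
When several connections have to be opened in a row, the connect calls above can be scripted. The following is a minimal sketch, assuming it runs from the directory that contains main.sh; the flag names are copied from the examples above, while `build_connect_command` and `run_connect` are hypothetical helpers and not part of the launcher.

```python
import subprocess


def build_connect_command(connect_type, site_config_file, **options):
    """Build the argument list for `./main.sh connect`.

    Extra keyword arguments become `--name value` flags, so port=35679
    turns into `--port 35679`.
    """
    command = ["./main.sh", "connect", "--connect_type", connect_type]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    command += ["--site_config_file", site_config_file]
    return command


def run_connect(connect_type, site_config_file, **options):
    """Run the command and raise CalledProcessError if main.sh exits non-zero."""
    command = build_connect_command(connect_type, site_config_file, **options)
    return subprocess.run(command, check=True)
```

For example, `run_connect("metrics", "/path/to/perlmutter_config.yaml", port=10001, nodename="vk-node-1")` issues the same command as the metrics example above. Passing the arguments as a list avoids shell quoting problems in paths.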

Starting the Container

To start the JRM Launcher container:

docker run --name=jrm-fw-lpad -itd --rm --net=host \
  -v ./test-config.yaml:/fw/test-config.yaml \
  -v $logs:/fw/logs \
  -v ./perl-config.yaml:/fw/perl-config.yaml \
  -v ./ornl-config.yaml:/fw/ornl-config.yaml \
  -v `pwd`/port_table.yaml:/fw/port_table.yaml \
  -v $HOME/.ssh/nersc:/root/.ssh/nersc \
  jlabtsai/jrm-fw-lpad:main

Replace $logs with the actual path to your logs directory.

After the container starts, log in to it and run main.sh from inside. Use a single container to manage all JRM launches rather than starting one container per launch.

FireWorks Agent Setup

Setup Instructions

Create a new directory for your FireWorks agent:

mkdir fw-agent
cd fw-agent

Copy the requirements.txt file into this directory
Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install required packages:

pip install -r requirements.txt

Copy the fw_config directory and its contents into your fw-agent directory
Configure the FireWorks files for your specific site (e.g., ORNL)

Running FireWorks Agent

SSH into the remote compute site
Navigate to your fw-agent directory
Activate the virtual environment:

source venv/bin/activate

Run the FireWorks qlaunch command:

qlaunch -r rapidfire

Troubleshooting

Check logs in the LOG_PATH directory for detailed information about executed commands and their results
Ensure the configuration file is correctly formatted and contains all required fields
Verify that necessary ports are available and not blocked by firewalls
For SSH connection issues, check the logs in the LOG_PATH directory
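
To check whether a local port is already taken before passing it to main.sh, a small probe such as the one below can help. It is a sketch using only the standard library, and `port_is_free` is a hypothetical helper. It only tests whether the port can be bound on the local machine; it cannot tell whether a firewall between sites blocks the port.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is bound to host:port at the moment of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


if __name__ == "__main__":
    # Ports taken from the usage examples in this guide.
    for candidate in (10001, 20001, 35679):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

The result is a snapshot: another process can still claim the port between the check and the actual connect call.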

For more detailed information about each component, refer to the inline documentation in the respective Python files.

Deploy JRMs on NERSC and ORNL via FireWorks

Contents