Deploy JRMs on NERSC and ORNL via FireWorks
JRM Launcher: How To and Usage Guide
Setup
Prerequisites
- Python 3.9
- FireWorks library
- Required Python packages (install via pip install -r requirements.txt)
Configuration
- Create a site-specific configuration file based on the template in fw-lpad/FireWorks/jrm_launcher/site_config_template.yaml
- Save your configuration file with a meaningful name, e.g., perlmutter_config.yaml
Setting up the site_config_file
The site configuration file is crucial for proper operation. Here's how to set it up, using ORNL and Perlmutter configurations as examples:
- Create a new YAML file for your site configuration (e.g., ornl_config.yaml or perlmutter_config.yaml)
- Structure the file with the following sections: slurm, jrm, and ssh
- Fill in the values for each section based on your site's requirements
Example: ORNL Configuration
slurm:
  nodes: 1
  constraint: ejfat
  walltime: 00:30:00
  qos: normal
  account: csc266
  reservation:
jrm:
  nodename: jrm-ornl
  site: ornl
  control_plane_ip: jiriaf2302
  apiserver_port: 38687
  kubeconfig: /ccsopen/home/jlabtsai/run-vk/kubeconfig/jiriaf2302
  image: docker:jlabtsai/vk-cmd:main
  vkubelet_pod_ips:
    - 172.17.0.1
  custom_metrics_ports: [2221, 1776, 8088, 2222]
  config_class:
ssh:
  remote_proxy:
  remote: < this is the IP address of the remote machine where the fw-agent is running >
  ssh_key:
  password: < this is a password encoded in base64 >
  build_script: ./build-ssh-ornl.sh
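The ssh password field in the ORNL example expects a base64-encoded value. The following is a minimal sketch of producing one with Python's standard base64 module. It assumes the launcher expects standard base64 over the UTF-8 bytes of the password; the function names and the sample password are hypothetical. Note that base64 is an encoding, not encryption, so the configuration file should still be protected like any other secret.

```python
import base64


def encode_password(plaintext: str) -> str:
    """Return the standard base64 encoding of a UTF-8 password string."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_password(encoded: str) -> str:
    """Reverse encode_password; handy for checking a stored value."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # "example-password" is a hypothetical placeholder, not a real credential.
    print(encode_password("example-password"))
```

The printed string is what would be pasted into the password field in place of the < > placeholder.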
Example: Perlmutter Configuration
slurm:
  nodes: 2
  constraint: cpu
  walltime: 00:05:00
  qos: debug
  account: m3792
jrm:
  nodename: jrm-perlmutter
  site: perlmutter
  control_plane_ip: jiriaf2302
  apiserver_port: 38687
  kubeconfig: /global/homes/j/jlabtsai/run-vk/kubeconfig/jiriaf2302
  image: docker:jlabtsai/vk-cmd:main
  vkubelet_pod_ips: [172.17.0.1]
  custom_metrics_ports: [2221, 1776, 8088, 2222]
  config_class:
ssh:
  remote_proxy: perlmutter
  remote: < this is the IP address of the remote machine where the fw-agent is running >
  ssh_key: < this is the ssh key to access the remote machine >
  password:
  build_script:
Key Configuration Points
- slurm section: Configure SLURM-specific parameters such as number of nodes, constraints, walltime, QoS, and account.
- jrm section: Set JRM-specific details including nodename, site, control plane IP, API server port, kubeconfig path, Docker image, and any custom configurations.
- ssh section: Specify SSH-related information for remote access, including proxy settings, remote address, SSH key, and optional build script.
Remember to replace placeholder values (indicated by < >) with actual values specific to your environment.
Save the file with a descriptive name (e.g., ornl_config.yaml or perlmutter_config.yaml) in the appropriate directory.
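Before launching, it can help to check that a site configuration has the three sections and that no < > placeholders remain. The sketch below is an illustration only and is not part of the JRM Launcher: the function name check_site_config and the choice of which keys count as required are assumptions drawn from the two examples above.

```python
# Keys treated as required here are an assumption based on the example configs.
REQUIRED_KEYS = {
    "slurm": ["nodes", "constraint", "walltime", "qos", "account"],
    "jrm": ["nodename", "site", "control_plane_ip", "apiserver_port",
            "kubeconfig", "image"],
    "ssh": ["remote"],
}


def check_site_config(config: dict) -> list:
    """Return a list of human-readable problems found in a loaded site config."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if values.get(key) in (None, ""):
                problems.append(f"missing value: {section}.{key}")
        for key, value in values.items():
            # Unreplaced template placeholders look like "< ... >".
            if isinstance(value, str):
                text = value.strip()
                if text.startswith("<") and text.endswith(">"):
                    problems.append(f"placeholder not replaced: {section}.{key}")
    return problems
```

The function takes an already-parsed dictionary. If PyYAML is installed, one way to obtain it is yaml.safe_load on the opened configuration file; an empty result list means none of these checks failed.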
Basic Usage
The main entry point for the JRM Launcher is the main.sh script:
./main.sh <action> [options]
Available Actions
- add_wf: Add a new workflow
- delete_wf: Delete an existing workflow
- delete_ports: Delete ports in a specified range
- connect: Establish various connections (db, apiserver, metrics, custom_metrics)
Usage Examples
Add a new workflow
./main.sh add_wf --site_config_file /path/to/perlmutter_config.yaml
Delete a workflow
./main.sh delete_wf --fw_id 12345
Delete ports in a range
./main.sh delete_ports --start 10000 --end 20000
Connect to the database
./main.sh connect --connect_type db --site_config_file /path/to/perlmutter_config.yaml
Connect to the API server
./main.sh connect --connect_type apiserver --port 35679 --site_config_file /path/to/perlmutter_config.yaml
Connect to the metrics server
./main.sh connect --connect_type metrics --port 10001 --nodename vk-node-1 --site_config_file /path/to/perlmutter_config.yaml
Connect to custom metrics
./main.sh connect --connect_type custom_metrics --mapped_port 20001 --custom_metrics_port 8080 --nodename vk-node-1 --site_config_file /path/to/perlmutter_config.yaml
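When several connections have to be opened in sequence, the connect invocations above can be assembled programmatically. The following is a hedged sketch, not part of the launcher: build_connect_command is a hypothetical helper, and it only reuses the flags shown in the examples above.

```python
import shlex


def build_connect_command(connect_type, site_config_file, **options):
    """Build the argv list for `./main.sh connect` from keyword options."""
    argv = ["./main.sh", "connect", "--connect_type", connect_type]
    for name, value in options.items():
        # Each keyword becomes a "--name value" pair, in the order given.
        argv += [f"--{name}", str(value)]
    argv += ["--site_config_file", site_config_file]
    return argv


if __name__ == "__main__":
    cmd = build_connect_command(
        "metrics", "/path/to/perlmutter_config.yaml",
        port=10001, nodename="vk-node-1",
    )
    # Print the command; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Building an argument list rather than a single string avoids shell quoting problems if a path contains spaces.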
Starting the Container
To start the JRM Launcher container:
docker run --name=jrm-fw-lpad -itd --rm --net=host \
-v ./test-config.yaml:/fw/test-config.yaml \
-v $logs:/fw/logs \
-v ./perl-config.yaml:/fw/per-config.yaml \
-v ./ornl-config.yaml:/fw/ornl-config.yaml \
-v `pwd`/port_table.yaml:/fw/port_table.yaml \
-v $HOME/.ssh/nersc:/root/.ssh/nersc \
jlabtsai/jrm-fw-lpad:main
Replace $logs with the actual path to your logs directory.
After creating the container, log in to it and use main.sh to operate the launcher. It's recommended to use a single container to manage multiple JRM launches.
FireWorks Agent Setup
Setup Instructions
- Create a new directory for your FireWorks agent:
mkdir fw-agent
cd fw-agent
- Copy the requirements.txt file into this directory
- Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install required packages:
pip install -r requirements.txt
- Copy the fw_config directory and its contents into your fw-agent directory
- Configure the FireWorks files for your specific site (e.g., ORNL)
Running FireWorks Agent
- SSH into the remote compute site
- Navigate to your fw-agent directory
- Activate the virtual environment:
source venv/bin/activate
- Run the FireWorks qlaunch command:
qlaunch -r rapidfire
Troubleshooting
- Check logs in the LOG_PATH directory for detailed information about executed commands and their results
- Ensure the configuration file is correctly formatted and contains all required fields
- Verify that necessary ports are available and not blocked by firewalls
- For SSH connection issues, check the logs in the LOG_PATH directory
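To check whether a port is available locally, as suggested in the list above, a quick test is to try binding a TCP socket to it. This is a minimal sketch using only the standard library; is_port_free is a hypothetical helper, and it only reports whether the local bind succeeds. It cannot tell whether a firewall between sites blocks the port.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Bind failed: the port is in use or not permitted for this user.
            return False
        return True


if __name__ == "__main__":
    # 10001 is the metrics port used in the usage examples above.
    print(is_port_free(10001))
```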
For more detailed information about each component, refer to the inline documentation in the respective Python files.