= JRM Launcher: How To and Usage Guide =

== Launching JRMs: Detailed Step-by-Step Guide ==

=== Part 1: Setting up JRM Launcher (fw-lpad) ===

1. Install prerequisites:
* MongoDB (for storing the workflows of JRM launches)
* Kubernetes API server
* Valid kubeconfig file for the Kubernetes cluster
* Docker
* Python 3.9 (for developers)

2. Set up MongoDB for storing FireWorks workflows:
* Create and start a MongoDB container:
<pre>
docker run -d -p 27017:27017 --name mongodb-container \
  -v $HOME/JIRIAF/mongodb/data:/data/db mongo:latest
</pre>
* Wait for MongoDB to start (about 10 seconds), then create a new database and user:
<pre>
docker exec -it mongodb-container mongosh --eval '
  db.getSiblingDB("jiriaf").createUser({
    user: "jiriaf",
    pwd: "jiriaf",
    roles: [{role: "readWrite", db: "jiriaf"}]
  })
'
</pre>

3. Prepare the site configuration file:
* Use the template in <code>fw-lpad/FireWorks/jrm_launcher/site_config_template.yaml</code>
* Create a configuration file for your specific site (e.g., <code>perlmutter_config.yaml</code> or <code>ornl_config.yaml</code>)
* The file has three sections: <code>slurm</code> (batch parameters such as nodes, constraint, walltime, QoS, and account), <code>jrm</code> (JRM details such as nodename, site, control plane IP, API server port, kubeconfig path, and image), and <code>ssh</code> (remote access: proxy, remote address, SSH key, and optional build script). Replace placeholder values (indicated by &lt; &gt;) with values specific to your environment.
* Examples of site_config files:
** Perlmutter configuration example (<code>perlmutter_config.yaml</code>):
<pre>
slurm:
  nodes: 1
  constraint: cpu
  walltime: 00:10:00
  qos: debug
  account: m3792  #m3792 #m4637
  reservation: # 100G

jrm:
  nodename: jrm-perlmutter
  site: perlmutter
  control_plane_ip: jiriaf2302
  apiserver_port: 38687
  kubeconfig: /global/homes/j/jlabtsai/run-vk/kubeconfig/jiriaf2302
  image: docker:jlabtsai/vk-cmd:main
  vkubelet_pod_ips:
    - 172.17.0.1
  custom_metrics_ports: [2221, 1776, 8088, 2222]
  config_class:

ssh:
  remote_proxy: jlabtsai@perlmutter.nersc.gov
  remote: jlabtsai@128.55.64.13
  ssh_key: /root/.ssh/nersc
  password:
  build_script:
</pre>
** ORNL configuration example (<code>ornl_config.yaml</code>):
<pre>
slurm:
  nodes: 1
  constraint: ejfat
  walltime: 00:10:00
  qos: normal
  account: csc266
  reservation: #ejfat_demo

jrm:
  nodename: jrm-ornl
  site: ornl
  control_plane_ip: jiriaf2302
  apiserver_port: 38687
  kubeconfig: /ccsopen/home/jlabtsai/run-vk/kubeconfig/jiriaf2302
  image: docker:jlabtsai/vk-cmd:main
  vkubelet_pod_ips:
    - 172.17.0.1
  custom_metrics_ports: [2221, 1776, 8088, 2222]
  config_class:

ssh:
  remote_proxy:
  remote: 172.30.161.5
  ssh_key:
  password: &lt;user password in base64&gt;
  build_script: /root/build-ssh-ornl.sh
</pre>
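Before using a site config, it can help to confirm that it contains all three required top-level sections. A minimal shell sketch (the filename <code>your_site_config.yaml</code> is a placeholder):
<pre>
# Report any missing top-level section (slurm, jrm, ssh) in the site config.
for section in slurm jrm ssh; do
  grep -q "^${section}:" your_site_config.yaml || echo "missing section: ${section}"
done
</pre>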
 

4. Prepare necessary files and directories:
* Create a directory for logs
* Create a <code>port_table.yaml</code> file
* Ensure you have the necessary SSH key (e.g., for NERSC access)
* Create a <code>my_launchpad.yaml</code> file with the MongoDB connection details:
<pre>
host: localhost
logdir: &lt;path to logs&gt;
mongoclient_kwargs: {}
name: jiriaf
password: jiriaf
port: 27017
strm_lvl: INFO
uri_mode: false
user_indices: []
username: jiriaf
wf_user_indices: []
</pre>
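If FireWorks is installed on the machine where you created <code>my_launchpad.yaml</code>, a quick, non-destructive way to confirm the connection details is to list workflows (a sketch; on a fresh database the list is simply empty):
<pre>
lpad -l my_launchpad.yaml get_wflows
</pre>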
 

5. Copy the kubeconfig file to the remote site:
<pre>scp /path/to/local/kubeconfig user@remote:/path/to/remote/kubeconfig</pre>

6. Start the JRM Launcher container (run this from the directory containing your site config, <code>port_table.yaml</code>, and <code>my_launchpad.yaml</code>, and replace the paths with your own):
<pre>
export logs=/path/to/your/logs/directory
docker run --name=jrm-fw-lpad -itd --rm --net=host \
  -v ./your_site_config.yaml:/fw/your_site_config.yaml \
  -v $logs:/fw/logs \
  -v `pwd`/port_table.yaml:/fw/port_table.yaml \
  -v $HOME/.ssh/nersc:/root/.ssh/nersc \
  -v `pwd`/my_launchpad.yaml:/fw/util/my_launchpad.yaml \
  jlabtsai/jrm-fw-lpad:main
</pre>

7. Verify the container is running:
<pre>docker ps</pre>

8. Log into the container:
<pre>docker exec -it jrm-fw-lpad /bin/bash</pre>

9. Add a workflow:
<pre>./main.sh add_wf /fw/your_site_config.yaml</pre>

10. Note the workflow ID provided for future reference.
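The workflow ID can later be used to inspect the launch from inside the container, for example (a sketch, assuming the container ships the standard FireWorks <code>lpad</code> CLI alongside the launchpad file mounted above):
<pre>
# Show the state of the workflow created by add_wf (replace 1 with your workflow ID).
lpad -l /fw/util/my_launchpad.yaml get_wflows -i 1 -d more
</pre>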

=== Part 2: Setting up FireWorks Agent (fw-agent) on Remote Compute Site ===

1. SSH into the remote compute site

2. Create a new directory for your FireWorks agent:
<pre>
mkdir fw-agent
cd fw-agent
</pre>

3. Copy the <code>requirements.txt</code> file to this directory (you may need to transfer it from your local machine; an example follows)
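For example, from your local machine (the hostname and paths here are placeholders):
<pre>
scp /path/to/local/requirements.txt user@remote:~/fw-agent/
</pre>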

4. Create a Python virtual environment and activate it:
<pre>
python3.9 -m venv jrm_launcher
source jrm_launcher/bin/activate
</pre>

5. Install the required packages:
<pre>pip install -r requirements.txt</pre>

6. Create the <code>fw_config</code> directory and necessary configuration files:
<pre>
mkdir fw_config
cd fw_config
</pre>

7. Create and configure the following files in the <code>fw_config</code> directory (a quick layout check follows this list):
* <code>my_fworker.yaml</code>:
** For Perlmutter:
<pre>
category: perlmutter
name: perlmutter
query: '{}'
</pre>
** For ORNL:
<pre>
category: ornl
name: ornl
query: '{}'
</pre>
* <code>my_qadapter.yaml</code>:
<pre>
_fw_name: CommonAdapter
_fw_q_type: SLURM
_fw_template_file: &lt;path to queue_template.yaml&gt;
rocket_launch: rlaunch -c &lt;path to fw_config&gt; singleshot
nodes:
walltime:
constraint:
account:
job_name:
logdir: &lt;path to logs&gt;
pre_rocket:
post_rocket:
</pre>
* <code>my_launchpad.yaml</code> (the same MongoDB connection details as on the fw-lpad side; <code>host: localhost</code> works once the database connection from the fw-lpad container is established, as described under "Managing Workflows and Connections" below):
<pre>
host: localhost
logdir: &lt;path to logs&gt;
mongoclient_kwargs: {}
name: jiriaf
password: jiriaf
port: 27017
strm_lvl: INFO
uri_mode: false
user_indices: []
username: jiriaf
wf_user_indices: []
</pre>
* <code>queue_template.yaml</code>:
<pre>
#!/bin/bash -l

#SBATCH --nodes=$${nodes}
#SBATCH --ntasks=$${ntasks}
#SBATCH --ntasks-per-node=$${ntasks_per_node}
#SBATCH --cpus-per-task=$${cpus_per_task}
#SBATCH --mem=$${mem}
#SBATCH --gres=$${gres}
#SBATCH --qos=$${qos}
#SBATCH --time=$${walltime}
#SBATCH --partition=$${queue}
#SBATCH --account=$${account}
#SBATCH --job-name=$${job_name}
#SBATCH --license=$${license}
#SBATCH --output=$${job_name}-%j.out
#SBATCH --error=$${job_name}-%j.error
#SBATCH --constraint=$${constraint}
#SBATCH --reservation=$${reservation}

$${pre_rocket}
cd $${launch_dir}
$${rocket_launch}
$${post_rocket}
</pre>
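After this step the agent directory should contain the four files above; a quick check (a sketch, assuming <code>fw-agent</code> was created in your home directory):
<pre>
ls ~/fw-agent/fw_config
# expected: my_fworker.yaml  my_launchpad.yaml  my_qadapter.yaml  queue_template.yaml
</pre>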
 

8. Test the connection to the LaunchPad database:
<pre>lpad -c &lt;path to fw_config&gt; reset</pre>
If prompted "Are you sure? This will RESET your LaunchPad. (Y/N)", type 'N' to cancel; the prompt itself confirms that the database is reachable.
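A non-destructive alternative that exercises the same connection is to list workflows; the workflow added in Part 1 should appear (a sketch using the standard FireWorks CLI):
<pre>
lpad -c &lt;path to fw_config&gt; get_wflows
</pre>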

9. Run the FireWorks agent:
<pre>qlaunch -c &lt;path to fw_config&gt; -r rapidfire</pre>
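Since rapidfire mode keeps pulling and submitting jobs, the agent is often left running in the background or inside a terminal multiplexer; one possible pattern (a sketch, not required by the setup):
<pre>
nohup qlaunch -c &lt;path to fw_config&gt; -r rapidfire > qlaunch.log 2>&1 &
</pre>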

=== Managing Workflows and Connections ===

Use the following commands on the fw-lpad machine to manage workflows and connections (a combined example follows the list):
* Delete a workflow: <pre>./main.sh delete_wf &lt;workflow_id&gt;</pre>
* Delete ports: <pre>./main.sh delete_ports &lt;start_port&gt; &lt;end_port&gt;</pre>
* Connect to database: <pre>./main.sh connect db /fw/your_site_config.yaml</pre>
* Connect to API server: <pre>./main.sh connect apiserver 35679 /fw/your_site_config.yaml</pre>
* Connect to metrics server: <pre>./main.sh connect metrics 10001 vk-node-1 /fw/your_site_config.yaml</pre>
* Connect to custom metrics: <pre>./main.sh connect custom_metrics 20001 8080 vk-node-1 /fw/your_site_config.yaml</pre>
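A typical session inside the fw-lpad container might look like the following sketch (the workflow ID 42 and the config filename are placeholders):
<pre>
./main.sh add_wf /fw/your_site_config.yaml         # note the workflow ID it prints
./main.sh connect db /fw/your_site_config.yaml     # open the database connection
./main.sh connect apiserver 35679 /fw/your_site_config.yaml
./main.sh delete_wf 42                             # clean up when the launch is finished
</pre>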

=== Troubleshooting ===

* Check the logs in the <code>LOG_PATH</code> directory for SSH connection issues
* Ensure all configuration files are correctly formatted and contain the required fields
* Verify that the necessary ports are available and not blocked by firewalls (see the sketch below)
* For fw-agent issues:
** Ensure the FireWorks LaunchPad is accessible from the remote compute site
** Verify that the Python environment has all necessary dependencies installed
* Consult the FireWorks documentation for more detailed configuration and usage information
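For the port check above, a minimal sketch from the fw-lpad host using <code>nc</code>, if available (the host <code>jiriaf2302</code> and the ports are the example values from this guide; substitute your own):
<pre>
nc -zv localhost 27017    # MongoDB
nc -zv jiriaf2302 38687   # Kubernetes API server port from the site config
</pre>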