Create apptainer+GPU interactive farm job (can be used w/ Jupyter)

From epsciwiki
Revision as of 15:29, 8 December 2023 by Davidl

Creating an interactive job in an `apptainer` container on a SciComp node with a GPU is not hard to do. It does, however, require a long command with a number of options, some of which must appear in a specific order. Things are easier if this is written into a single script that can then be invoked with a simple command with no arguments.

Interactive shell

The script below runs such a command. Copy it into a file called "farm-ubuntu-22.04p1-gpu" (or whatever you want to call it), set the execution bit (chmod +x farm-ubuntu-22.04p1-gpu), and then just run it with ./farm-ubuntu-22.04p1-gpu.

#!/bin/bash

# This will allocate a slurm job on a node with GPUs
# and then drop you into an interactive bash shell on
# that node. When you exit the shell, the job will be
# released and the resources deallocated automatically.
#
# The "--nv" option to apptainer will automatically map
# the host cuda installation into the container so that
# the gpu and nvidia command line tools (e.g. nvidia-smi)
# are available.
#
# To change any of the parameters you'll need to edit 
# this script.

exec salloc --nodes=1 \
  --ntasks-per-node=1 \
  --cpus-per-task=4 \
  --mem-per-cpu=2G \
  --time=2:00:00 \
  --partition=gpu \
  --gres=gpu:T4:1 \
  srun --pty \
  apptainer shell \
  -B /u,/group,/w,/work,/run \
  --nv \
  /scigroup/spack/mirror/singularity/images/epsci-ubuntu-22.04p1.img
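Once the job is allocated and you are dropped into the container shell, a quick sanity check confirms the GPU and container are what you expect (the exact nvidia-smi output depends on the node you land on):

```shell
# Run these inside the container shell. With --nv, the host NVIDIA
# tools are mapped in, so nvidia-smi should list the allocated T4 GPU.
nvidia-smi

# Confirm you are inside the Ubuntu 22.04 container:
grep PRETTY_NAME /etc/os-release
```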


Jupyter

This has the benefit of allowing VSCode to execute easily on the slurm farm and therefore run Jupyter notebooks in a container on an allocated farm resource with GPU support. This can be better than running on an ifarm node, since large jobs there consume a lot of shared resources, which is frowned upon. There are a few downsides to this method compared to running Jupyter on the ifarm:

  1. The slurm resource parameters are hardcoded, so if you need to change them, you must edit the script (or modify it to accept those parameters as arguments).
  2. Slurm jobs have a time limit and will terminate automatically when it expires, whereas ifarm sessions can basically run forever.
  3. Every time you (re)connect via VSCode this way, it allocates another job, even if a previous one is still running.

To use this with Jupyter in VSCode, add an entry to your .ssh/config on your local computer, making sure RemoteCommand points to the interactive script from above, wherever you placed it on the ifarm. (See here for more details):

# Interactive SLURM job on JLab SciComp farm using epsci-ubuntu-22.04p1 apptainer with GPU
Host epsci-ubuntu-22.04p1~farm-gpu
  HostName ifarm1802.jlab.org
  ProxyJump scilogin.jlab.org
  RemoteCommand /group/epsci/apps/bin/farm-ubuntu-22.04p1-gpu
  RequestTTY yes
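With the config entry in place, you can test the connection from a plain terminal before pointing VSCode's Remote-SSH at it; "epsci-ubuntu-22.04p1~farm-gpu" is the host alias defined above:

```shell
# This should hop through scilogin.jlab.org to ifarm1802, submit the
# slurm job, and drop you into the apptainer shell on the GPU node.
# Exiting the shell releases the allocation.
ssh epsci-ubuntu-22.04p1~farm-gpu
```

Remember downside 3 above: each such connection allocates a new slurm job, so exit shells you are no longer using.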