Create apptainer+GPU interactive farm job (can be used w/ Jupyter)
Revision as of 15:29, 8 December 2023
Creating an interactive job in an `apptainer` container on a SciComp node with a GPU is not hard to do. It does, however, require a long command with a number of options, some of which require specific ordering. Things are easier if this is written into a single script that can then be invoked with a simple, argument-free command.
Interactive shell
The script below runs such a command. Copy it into a file called "farm-ubuntu-22.04p1-gpu" (or whatever you want to call it), set the execution bit (chmod +x farm-ubuntu-22.04p1-gpu), and then just run it with ./farm-ubuntu-22.04p1-gpu.
#!/bin/bash
#
# This will allocate a slurm job on a node with GPUs
# and then drop you into an interactive bash shell on
# that node. When you exit the shell, the job will be
# released and the resources deallocated automatically.
#
# The "--nv" option to apptainer will automatically map
# the host cuda installation into the container so that
# the gpu and nvidia command line tools (e.g. nvidia-smi)
# are available.
#
# To change any of the parameters you'll need to edit
# this script.

exec salloc --nodes=1 \
            --ntasks-per-node=1 \
            --cpus-per-task=4 \
            --mem-per-cpu=2G \
            --time=2:00:00 \
            --partition=gpu \
            --gres=gpu:T4:1 \
    srun --pty \
    apptainer shell \
        -B /u,/group,/w,/work,/run \
        --nv \
        /scigroup/spack/mirror/singularity/images/epsci-ubuntu-22.04p1.img
Jupyter
This has the benefit of allowing VSCode to execute easily on the slurm farm and therefore to run jupyter notebooks in a container on an allocated farm resource with GPU support. This can be better than running on an ifarm node, since large jobs can take up a lot of resources there, which is frowned upon. There are a couple of downsides to this method compared to running jupyter on the ifarm:
- The slurm resource parameters are hardcoded here, so if you need to change them you must edit the script (or modify it to accept them as arguments).
- Slurm jobs have a time limit and will die automatically when it expires, whereas ifarm jobs can basically run forever.
- Every time you (re)connect via VSCode this way, another job is allocated, even if a previous one is still running.
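The first downside can be softened by letting the wrapper read its resource limits from the environment. The sketch below is a hypothetical variant (the `farm_gpu_shell` function name and the `FARM_*`/`DRY_RUN` variables are invented for this example, not part of any JLab tooling); it builds the same salloc command as the script above, with defaults matching the original.

```shell
#!/bin/bash
# Hypothetical parameterized variant of farm-ubuntu-22.04p1-gpu.
# Resource limits come from environment variables with the original
# script's values as defaults, so no editing is needed to change them.

farm_gpu_shell() {
    local cpus="${FARM_CPUS:-4}"        # --cpus-per-task
    local mem="${FARM_MEM:-2G}"         # --mem-per-cpu
    local tlim="${FARM_TIME:-2:00:00}"  # --time
    local gres="${FARM_GRES:-gpu:T4:1}" # --gres

    local args=(--nodes=1 --ntasks-per-node=1
                --cpus-per-task="$cpus" --mem-per-cpu="$mem"
                --time="$tlim" --partition=gpu --gres="$gres")

    # With DRY_RUN set, print the command instead of allocating a job.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "salloc ${args[*]} srun --pty apptainer shell ..."
        return 0
    fi

    exec salloc "${args[@]}" \
        srun --pty \
        apptainer shell \
            -B /u,/group,/w,/work,/run \
            --nv \
            /scigroup/spack/mirror/singularity/images/epsci-ubuntu-22.04p1.img
}

# Example: preview the salloc command with 8 CPUs and 4 GB per CPU.
DRY_RUN=1 FARM_CPUS=8 FARM_MEM=4G farm_gpu_shell
```

Invoked without DRY_RUN it behaves like the original script; the dry-run path only exists so the argument construction can be checked without touching slurm.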
To use this with Jupyter in VSCode, add an entry to your .ssh/config on your local computer, making sure to point to wherever on the ifarm you placed the interactive script from above. (See here for more details):
# Interactive SLURM job on JLab SciComp farm using epsci-ubuntu-22.04p1 apptainer with GPU
Host epsci-ubuntu-22.04p1~farm-gpu
    HostName ifarm1802.jlab.org
    ProxyJump scilogin.jlab.org
    RemoteCommand /group/epsci/apps/bin/farm-ubuntu-22.04p1-gpu
    RequestTTY yes
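With that entry in place, connecting from the local machine should (assuming the alias and script path match your setup) queue a slurm job and drop you into the container shell; it may sit waiting until the job starts:

```
# From your local machine; the alias must match the Host line above.
ssh epsci-ubuntu-22.04p1~farm-gpu
```

VSCode's Remote-SSH extension can then connect to the same host alias, which is what lets Jupyter notebooks run inside the allocated job.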