Use JANA2 + GPU in singularity container

From epsciwiki
Revision as of 19:07, 16 April 2022 by Davidl

Here are some instructions for building a Singularity container that can access NVIDIA GPU hardware.

These instructions for building the image were developed on a JLab CUE desktop that has /apps mounted. In principle, they can also be used on the ifarm machines, but some of the image files are multiple GB in size, which is more likely to cause storage problems there. Some system packages may also need to be installed, so having sudo privileges is useful. Once the image file has been created, it can be copied to other computers and run there without issues.
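
Because the images can be several GB, it may be worth checking free space before pulling one. Below is a minimal sketch, assuming a POSIX shell; `enough_space_gb` is a hypothetical helper name, not part of Singularity. It reads the available space from `df -Pk`. Singularity also keeps a cache of downloaded layers (under your home directory by default, relocatable with the `SINGULARITY_CACHEDIR` environment variable), so the headroom used in the example is an assumption, not a measured requirement.

```shell
# Hypothetical helper: succeed if directory $2 (default: current directory)
# has at least $1 GB free. With "df -Pk", the 4th column of the 2nd line
# is the available space in 1024-byte blocks.
enough_space_gb() {
    need_gb=$1
    dir=${2:-.}
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example: the CUDA devel image is ~3 GB; ask for some (assumed) headroom
# for the cached layers as well as the final .sif file.
# if enough_space_gb 6 .; then echo "OK to pull"; else echo "Not enough space"; fi
```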

First, make sure the squashfs tools are installed, since Singularity needs them to create the image files:

 sudo yum install squashfs-tools
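
To confirm the tools are actually on your PATH before building, something like the sketch below can be used. `missing_tools` is a hypothetical helper, and I am assuming `mksquashfs` and `unsquashfs` (both provided by the squashfs-tools package) are the commands worth checking for.

```shell
# Hypothetical helper: print the name of each argument that is not
# found as a command on PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example:
# missing=$(missing_tools mksquashfs unsquashfs)
# [ -z "$missing" ] || echo "Missing: $missing (try: sudo yum install squashfs-tools)"
```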

Next, set up Singularity from the CUE via the network-mounted /apps directory:

 module use /apps/modulefiles
 module load singularity/3.9.5

Create a Singularity image from the official nvidia/cuda Docker images. Here, I chose 11.4.2 because it is the closest to the CUDA version on gluon200, which is the machine I am targeting for this test. Note that the devel image is fairly large (~3 GB), but it includes the gcc 9.3.0 compiler.

 singularity pull docker://nvidia/cuda:11.4.2-devel-ubuntu20.04

This should leave you with a file named something like cuda_11.4.2-devel-ubuntu20.04.sif.
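
When scripting this step, it can help to know the file name in advance. The sketch below is a hypothetical helper (`sif_name_for`) that reproduces the naming pattern seen here, `<last path component>_<tag>.sif`, with `latest` assumed when no tag is given. It is my approximation of Singularity's default naming, not taken from its source, so passing the name explicitly (`singularity pull` accepts an output file before the URI) is the safer option.

```shell
# Hypothetical helper: predict the .sif file name for a docker:// URI,
# i.e. <last path component>_<tag>.sif ("latest" if no tag is given).
sif_name_for() {
    ref=${1#docker://}          # drop the transport prefix
    name=${ref##*/}             # drop registry/namespace, e.g. cuda:11.4.2-devel-ubuntu20.04
    case $name in
        *:*) tag=${name#*:}; name=${name%%:*} ;;   # split name:tag
        *)   tag=latest ;;
    esac
    echo "${name}_${tag}.sif"
}

# Example: pull with an explicit output name so later steps do not
# depend on the default naming.
# uri=docker://nvidia/cuda:11.4.2-devel-ubuntu20.04
# singularity pull "$(sif_name_for "$uri")" "$uri"
```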

Copy the file to a computer that has GPUs and the NVIDIA (CUDA) drivers installed, and test it like this:

 singularity run -c --nv cuda_11.4.2-devel-ubuntu20.04.sif
 Singularity> nvidia-smi 
 Sat Apr 16 15:09:09 2022       
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Tesla T4            Off  | 00000000:01:00.0 Off |                    0 |
 | N/A   41C    P0    25W /  70W |      0MiB / 15109MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla T4            Off  | 00000000:81:00.0 Off |                    0 |
 | N/A   40C    P0    25W /  70W |      0MiB / 15109MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla T4            Off  | 00000000:A1:00.0 Off |                    0 |
 | N/A   43C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla T4            Off  | 00000000:C1:00.0 Off |                    0 |
 | N/A   42C    P0    25W /  70W |      0MiB / 15109MiB |      5%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
                                                                                
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+
 Singularity>


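Note that the nvidia-smi executable run above comes from the host, even though it runs inside the container: the --nv option makes the host's NVIDIA driver libraries and utilities (including nvidia-smi) available inside the container.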
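
The interactive check can also be scripted, e.g. for a batch job. The sketch below assumes a POSIX shell on the GPU host, and the helper names are hypothetical. It counts the GPUs reported by `nvidia-smi -L` and checks that the image's CUDA version (11.4.2 here) is not newer than the "CUDA Version" shown in the nvidia-smi header, which is the newest CUDA release the host driver supports. The comparison is deliberately conservative: any newer major or minor version is treated as unsupported.

```shell
# Count the GPUs in "nvidia-smi -L" output read from stdin
# (one "GPU <n>: <name> (UUID: ...)" line per device).
count_gpus() {
    grep -c '^GPU [0-9]' || true    # grep -c prints 0 but exits 1 on no match
}

# Print the "CUDA Version" from nvidia-smi header output read from stdin,
# i.e. the newest CUDA version the host driver supports (e.g. 11.4).
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed if CUDA version $1 is not newer than $2, comparing the major and
# then the minor number (missing fields count as 0; patch levels ignored).
cuda_not_newer() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 < y[1] + 0)
        exit !(x[2] + 0 <= y[2] + 0)
    }'
}

# Example on the GPU host (image file assumed to be in the current directory):
# sif=cuda_11.4.2-devel-ubuntu20.04.sif
# echo "GPUs visible: $(singularity exec -c --nv "$sif" nvidia-smi -L | count_gpus)"
# drv=$(singularity exec -c --nv "$sif" nvidia-smi | driver_cuda_version)
# cuda_not_newer 11.4.2 "$drv" && echo "Image CUDA 11.4.2 is supported by driver CUDA $drv"
```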