Login to SciComp GPUs
This page describes how to use one of the ML scicomp machines that have 4 Titan RTX GPU cards installed.
Steps:
- Setting up the software environment is generally easier with conda. First, log into the JLab common environment with the following ssh command:
    ssh login.jlab.org
- Next, log into ifarm with the following ssh command, where X in ifarm190X is either 1 or 2:
    ssh ifarm190X
- Setting up the Python environment
- The software must be set up on a computer other than sciml190X, because the installation needs a level of outside network access that is not available there.
- We recommend using Conda to manage your Python packages and environments.
- The installation is also too large to fit comfortably in your home directory. Conda installs things in ~/.conda, so that path must be a link to a directory on a larger disk.
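- If ~/.conda already exists, delete it (or move it aside if you want to keep its contents), since the next step creates a symbolic link named ~/.conda. If a real directory is left in place, ln -s will create the link inside that directory instead of replacing it.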
- Create a folder in your work directory that can be linked to ~/.conda. For example, I created a folder named condaenv in /work/halld2/home/kishan/. You can do this by running the following commands:
    mkdir /work/<your hall>/home/<your name>/condaenv
    ln -s /work/<your hall>/home/<your name>/condaenv ~/.conda
- You can check that the symbolic link is set up by running the command below. The -a option is needed because .conda is a hidden entry.
    ls -la ~
  One of the entries should read .conda -> /work/<your hall>/home/<your name>/condaenv
- Now run the following commands to load Anaconda3 and create a virtual environment named tf-gpu with tensorflow-gpu, cudatoolkit, keras and numpy installed. The first command starts a bash shell.
    bash
    source /etc/profile.d/modules.sh
    module use /apps/modulefiles
    module load anaconda3/4.5.12
    conda create -n tf-gpu tensorflow-gpu cudatoolkit keras numpy
- Activate the tf-gpu virtual environment:
    conda activate tf-gpu
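The symbolic link steps above can be sketched as a small shell function. This is a minimal sketch, not an official JLab or conda tool: `link_conda_dir` is a hypothetical helper name, and it moves an existing real ~/.conda directory aside (to a `.bak.<pid>` name, also an assumption) instead of deleting it, so nothing is lost if the old contents turn out to be needed.

```shell
#!/usr/bin/env bash
# Sketch: point a link path (normally ~/.conda) at a directory on a larger disk.
# link_conda_dir is a hypothetical helper, not part of conda or any JLab tooling.
link_conda_dir() {
    local target="$1"   # e.g. /work/<your hall>/home/<your name>/condaenv
    local link="$2"     # e.g. "$HOME/.conda"

    # Create the directory on the larger disk if it is not there yet.
    mkdir -p "$target" || return 1

    # A real directory (not a symlink) at the link path would make ln -s
    # create the link inside it, so move it aside first.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        mv "$link" "${link}.bak.$$" || return 1
    fi

    # -f replaces an existing link; -n avoids following an existing
    # symlink-to-directory at the link path.
    ln -sfn "$target" "$link" || return 1

    # Succeed only if the link now points at the target.
    [ "$(readlink "$link")" = "$target" ]
}
```

A typical call would be `link_conda_dir "/work/<your hall>/home/<your name>/condaenv" "$HOME/.conda"`, with the placeholders filled in for your own hall and user name.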
- Reserving the GPUs
- To reserve 2 GPU cards:
    salloc --gres gpu:TitanRTX:2 --partition gpu --nodes 1
    srun --pty bash
  If you wish to reserve n GPU cards, change gpu:TitanRTX:2 in the above command to gpu:TitanRTX:n.
- Now activate your tf-gpu virtual environment by running the commands below:
    source /etc/profile.d/modules.sh
    module use /apps/modulefiles
    module load anaconda3/4.5.12
    conda activate tf-gpu
Note: when you log in with the ssh commands above, you'll be prompted to enter your JLab account password.
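The reservation step can be parameterized on the number of cards, as in the sketch below. Both function names (`gpu_reserve_cmd`, `show_gpus`) are hypothetical helpers introduced here for illustration. The upper limit of 4 is an assumption taken from the "4 Titan RTX GPU cards" description at the top of this page combined with `--nodes 1`. `gpu_reserve_cmd` only prints the salloc command so that it can be inspected before it is run.

```shell
#!/usr/bin/env bash
# Sketch: build the reservation command for n Titan RTX cards on one node.
# gpu_reserve_cmd is a hypothetical helper; it prints the command, it does not run it.
gpu_reserve_cmd() {
    local n="$1"
    # Accept positive integers only (no empty value, letters, or leading zero).
    case "$n" in
        ''|*[!0-9]*|0*)
            echo "usage: gpu_reserve_cmd <number of cards>" >&2
            return 1
            ;;
    esac
    # Assumption: a single machine has 4 cards, per the page introduction.
    if [ "$n" -gt 4 ]; then
        echo "at most 4 cards are available on one machine" >&2
        return 1
    fi
    echo "salloc --gres gpu:TitanRTX:${n} --partition gpu --nodes 1"
}

# Sketch: inside the allocation, list the GPUs visible on the node.
show_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L
    else
        echo "nvidia-smi not found; this is probably not a GPU node" >&2
        return 1
    fi
}
```

For example, `gpu_reserve_cmd 3` prints the salloc line for three cards. After running that line and then `srun --pty bash`, calling `show_gpus` is a quick way to check that you have landed on a GPU node before activating the tf-gpu environment.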