Login to SciComp GPUs
The following describes how to use one of the ML SciComp machines, which have 4 Titan RTX GPU cards installed.
- Setting up the software environment is most easily done using Conda.
- First, log into the JLab common environment (ifarm) via ssh.
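As a sketch, the login sequence looks like the following (the hostname ifarm.jlab.org and the two-hop pattern are assumptions based on common JLab usage; replace <user> with your JLab username):

```bash
ssh <user>@ifarm.jlab.org   # assumed common-environment login host
ssh sciml190X               # then hop to a GPU node; X is 1 or 2
```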
- Setting up the Python environment (needs to be done only once!)
You'll be prompted to enter your JLab account password.
In sciml190X, X can be either 1 or 2.
- The software must be set up on a computer other than sciml190X, since it needs outside network access that is not available there.
- We recommend using Conda to manage your Python packages and environments.
- Also, the installation is large enough that it won't fit easily in your home directory. Conda installs into ~/.conda by default, so that path must be a link to a larger disk.
- If ~/.conda already exists, delete it, since we are going to create a symbolic link named ~/.conda in its place.
- Create a folder in your work directory that can be linked to ~/.conda. For example, I created a folder named condaenv in "/work/halld2/home/kishan/". You can achieve this by running the following commands:
```bash
mkdir /work/<your hall>/home/<your name>/condaenv
ln -s /work/<your hall>/home/<your name>/condaenv ~/.conda
```
For me it is "ln -s /work/halld2/home/kishan/condaenv ~/.conda"
If you list your home directory (e.g. with ls -la ~), you will see one of the entries as .conda -> /work/<your hall>/home/<your name>/condaenv
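The folder-plus-symlink pattern can be rehearsed with throwaway paths before touching your real ~/.conda (the temporary directory and the ~/.conda-demo name below are stand-ins, not the real work-disk paths):

```shell
# Rehearse the folder+symlink pattern with throwaway paths (stand-ins for /work/...).
workdir=$(mktemp -d)                 # stands in for /work/<your hall>/home/<your name>
mkdir "$workdir/condaenv"            # the directory that will actually hold conda's files
rm -f "$HOME/.conda-demo"            # mirrors "delete ~/.conda if it already exists"
ln -s "$workdir/condaenv" "$HOME/.conda-demo"   # the real step links ~/.conda instead
ls -ld "$HOME/.conda-demo"           # shows: .conda-demo -> .../condaenv
```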
```bash
source /etc/profile.d/modules.sh
module use /apps/modulefiles
module load anaconda3/4.5.12
conda create -n tf-gpu tensorflow-gpu cudatoolkit keras numpy
```
conda activate tf-gpu
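A quick sanity check after activation (this assumes conda and the tf-gpu environment created in the steps above, so it only works on a machine where that setup was done):

```bash
conda env list                               # the active environment is marked with *
python -c "import tensorflow, keras, numpy"  # exits silently if the packages installed cleanly
```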
- To reserve 2 GPU cards
```bash
salloc --gres gpu:TitanRTX:2 --partition gpu --nodes 1
srun --pty bash
```
Alternatively, to also request a wall-time limit and memory for the allocation:
```bash
salloc --gres gpu:TitanRTX:2 --partition gpu --nodes 1 --time=12:00:00 --mem=24GB
```
If you wish to reserve n GPU cards, change gpu:TitanRTX:2 in the commands above to gpu:TitanRTX:n.
Once you have a shell on the GPU node, load the modules and activate the environment:
```bash
source /etc/profile.d/modules.sh
module use /apps/modulefiles
module load anaconda3/4.5.12
conda activate tf-gpu
```
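With the environment active on the GPU node, it's worth confirming that the reserved cards are actually visible. A sketch (nvidia-smi ships with the NVIDIA driver; device_lib is a TensorFlow internal that works in both 1.x and 2.x):

```bash
nvidia-smi   # lists the Titan RTX cards granted by salloc
python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU'])"
```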