Jupyter via VSCode remote-ssh with singularity on ifarm



These instructions configure your local VSCode to connect to the ifarm over SSH and run an IPython kernel inside a Singularity container.

Get a smartcard

If you don't already have one, it is well worth stopping by the helpdesk and getting a smartcard USB device. It is not strictly required, but it saves you the hassle of typing in the codes generated by the MFA app on your phone.

Configure SSH on your local computer

Logging into an ifarm computer requires first logging into the scilogin.jlab.org gateway. Configuring your local computer to do a proxy jump will make this a lot easier. Instructions for this can be found in the JLab knowledge base article here:

https://jlab.servicenowservices.com/kb?id=kb_article_view&sysparm_article=KB0014918&sys_kb_id=862c54221bf0d510a552ed3ce54bcb1a&spa=1

Follow those instructions so that "ssh ifarm" works from your local computer.

Next, add the following to your ~/.ssh/config file. With this entry, when you ssh to the special host "epsci-ubuntu-22.04~ifarm", it will not only tunnel you all the way into the ifarm, but will also run the specified singularity container and drop you into it.

 # https://github.com/microsoft/vscode-remote-release/issues/3066#issuecomment-1019500216
 #
 Host epsci-ubuntu-22.04~ifarm
   HostName ifarm.jlab.org
   ProxyJump scilogin.jlab.org
   RemoteCommand singularity shell --bind /w,/work /cvmfs/oasis.opensciencegrid.org/jlab/epsci/singularity/images/epsci-ubuntu-22.04.img
   RequestTTY yes

Test that this works by doing:

  ssh epsci-ubuntu-22.04~ifarm

This should result in a "Singularity>" prompt. You can confirm the container is running the correct OS by looking at either the /etc/os-release file or the Dockerfile in the /container directory.
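As a minimal sketch of the /etc/os-release check, the snippet below prints the distribution name and version. The function name os_summary is purely illustrative, and it assumes the file defines the standard NAME and VERSION_ID fields.

```shell
# Print "<NAME> <VERSION_ID>" from an os-release style file.
# Defaults to /etc/os-release; os_summary is a hypothetical helper name.
os_summary() {
    (
        file="${1:-/etc/os-release}"
        # Source inside a subshell so NAME, VERSION_ID, etc. do not
        # leak into the calling shell.
        . "$file" && printf '%s %s\n' "$NAME" "$VERSION_ID"
    )
}

# At the "Singularity>" prompt:
if [ -r /etc/os-release ]; then
    os_summary
fi
```

Running the same function in a plain "ssh ifarm" session and comparing the two results is a quick way to tell whether you are really inside the container.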

Note that the "--bind /w,/work" option mounts the work disks inside the container so you can use them. Add any other directories you may need (e.g. /group) to the comma-separated list. Your home directory is mounted automatically.
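For illustration, here is a small sketch that prints what the RemoteCommand line would look like with /group added to the bind list. The helper name bind_list is hypothetical; the image path is the one from the config above.

```shell
# Hypothetical helper: join its arguments with commas to form a --bind value.
bind_list() {
    # Setting IFS in a subshell makes "$*" expand with "," as the separator.
    ( IFS=,; printf '%s\n' "$*" )
}

IMAGE=/cvmfs/oasis.opensciencegrid.org/jlab/epsci/singularity/images/epsci-ubuntu-22.04.img

# Print the line to paste into ~/.ssh/config.
echo "RemoteCommand singularity shell --bind $(bind_list /w /work /group) ${IMAGE}"
```

Only the value after --bind changes; the rest of the Host entry stays as it is.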

Using work directories for large software installations

It is easy to exceed the quota on your CUE home directory when VSCode extensions and Python environments install lots of packages. Redirecting these to the work disk can save a lot of headaches. There are a few ways to deal with this, depending on which Python kernel/environment you select.

Using a global environment

This is not necessarily t

It is best to deal with this now by creating symbolic links in your home directory that point to directories on the work disk where these packages can be installed.

There are a few directories to be concerned with. The following commands, executed on the ifarm, will set up the appropriate links. Replace "epsci" with the name of whatever work disk is appropriate. Note that if these directories already exist in your home directory, you will need to rename or remove them first so that the links can be created in their place.

 mkdir -p /work/epsci/${USER}/home_dot_local
 mkdir -p /work/epsci/${USER}/home_dot_cache
 mkdir -p /work/epsci/${USER}/home_dot_vscode-server
 ln -s /work/epsci/${USER}/home_dot_local ~/.local
 ln -s /work/epsci/${USER}/home_dot_cache ~/.cache
 ln -s /work/epsci/${USER}/home_dot_vscode-server ~/.vscode-server
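If some of these directories already exist in your home directory, the plain "ln -s" commands above will not replace them. The sketch below is one possible way to handle that case: it renames an existing directory with a timestamp suffix before creating the link. The function name relocate_dot_dir and the ".orig.<timestamp>" naming are assumptions made for illustration, not part of the original procedure.

```shell
# Hypothetical helper: relocate_dot_dir <home_dir> <work_dir> <name>
# Creates <work_dir>/home_dot_<name> and links <home_dir>/.<name> to it.
relocate_dot_dir() {
    home_dir="$1"
    work_dir="$2"
    name="$3"
    target="${work_dir}/home_dot_${name}"
    link="${home_dir}/.${name}"

    mkdir -p "$target" || return 1

    if [ -L "$link" ]; then
        # Already a symbolic link: leave it alone.
        return 0
    elif [ -e "$link" ]; then
        # A real file or directory exists: rename it rather than delete it.
        mv "$link" "${link}.orig.$(date +%Y%m%d%H%M%S)" || return 1
    fi

    ln -s "$target" "$link"
}

# Example usage on the ifarm (replace "epsci" with your work disk):
# for d in local cache vscode-server; do
#     relocate_dot_dir "$HOME" "/work/epsci/${USER}" "$d"
# done
```

The renamed originals are left in your home directory, so they still count against your quota until you copy anything you need out of them and remove them.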

With these links in place, any VSCode extensions installed on the remote system (i.e. ifarm) will go into the /work/epsci/${USER}/home_dot_vscode-server directory.

Python packages installed using pip while in a Jupyter session (not necessarily when using VSCode) will get installed into the /work/epsci/${USER}/home_dot_local and /work/epsci/${USER}/home_dot_cache directories, via the ~/.local and ~/.cache links.

VSCode lets you specify a Python virtual environment on a workspace-by-workspace basis, so making sure those end up on the work disk takes a couple more steps (see the "Customized Python Virtual Environment" section below).

Configuring VSCode

In VSCode

  1. Open the command palette using Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux) or from the gear menu in the bottom left of the window
  2. Type "settings.json" and then select Preferences: Open User Settings (JSON)
  3. Add the following to your settings (if you have other settings already, you may need to add a comma to the line before this one!):
 "remote.SSH.enableRemoteCommand": true
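If you want to double check from a terminal on your local computer that the setting was saved, a simple text search is enough. This is only a sketch: the function name is hypothetical, and the settings.json location varies by platform (commonly "~/Library/Application Support/Code/User/settings.json" on macOS and "~/.config/Code/User/settings.json" on Linux, but confirm this for your installation). A plain grep cannot tell a commented-out line from an active one.

```shell
# Hypothetical helper: succeeds if the given settings file contains
# "remote.SSH.enableRemoteCommand": true (whitespace around ":" is allowed).
has_remote_command_enabled() {
    grep -Eq '"remote\.SSH\.enableRemoteCommand"[[:space:]]*:[[:space:]]*true' "$1"
}

# Example usage (the path is an assumption; adjust for your platform):
# has_remote_command_enabled "$HOME/.config/Code/User/settings.json" \
#     && echo "RemoteCommand is enabled"
```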

Click on the "Remote Explorer" icon on the left side of the window (a monitor with a circle in the lower right) to open it. You should see the "epsci-ubuntu-22.04~ifarm" entry. Hover over it to see options to either connect the current window (arrow) or open a new window (box). Choose either option to get a connected window. Watch for a small entry box at the top of the window asking for your password, and enter your PIN+OTP to log in (OTP = One Time Password from either your smartcard or phone app).

VSCode will automatically install some extensions on the remote host under ~/.vscode-server.

You can verify that it is working correctly by opening a new terminal in VSCode and seeing that it gives the "Singularity>" prompt.

Customized Python Virtual Environment

VSCode provides kernel options for Jupyter that correspond to different Python environments. You can create new virtual environments through the VSCode interface. When you do this, it creates a directory called .venv in the workspace directory the VSCode window is using. Installing many of the standard Python packages (tensorflow, pytorch, pandas, matplotlib, etc.) will take up several GB of space, so it is better to store the environment once and have all of your workspaces use it.

The easiest way to do this is to

  1. create a workspace with a new Jupyter notebook
  2. create a new python virtual environment (this will create a .venv directory in the workspace directory)
  3. install the python packages via the VSCode Jupyter interface
  4. move the .venv directory to a central location and make a symbolic link pointing to it from the workspace directory

For the third step above, you can run a cell in the Jupyter notebook with these contents:

  %pip install pandas numpy matplotlib tensorflow torch
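The fourth step (moving .venv and linking to it) can be sketched as a small shell function to run in a terminal on the ifarm. The function name share_venv and the central location used in the usage example are assumptions; pick whatever path on your work disk suits you. The first call for a workspace that has a real .venv moves it to the central location, and later calls from other workspaces just create the link.

```shell
# Hypothetical helper: share_venv <workspace_dir> <central_venv_dir>
share_venv() {
    workspace="$1"
    central="$2"
    venv="${workspace}/.venv"

    # If the workspace holds a real .venv directory, move it to the
    # central location (refusing to overwrite anything already there).
    if [ -d "$venv" ] && [ ! -L "$venv" ]; then
        if [ -e "$central" ]; then
            echo "share_venv: $central already exists, not overwriting" >&2
            return 1
        fi
        mkdir -p "$(dirname "$central")" || return 1
        mv "$venv" "$central" || return 1
    fi

    if [ ! -d "$central" ]; then
        echo "share_venv: no environment found at $central" >&2
        return 1
    fi

    # Create the link unless one is already in place.
    [ -L "$venv" ] || ln -s "$central" "$venv"
}

# Example usage (paths are illustrative):
# share_venv "$HOME/my_workspace" "/work/epsci/${USER}/venvs/shared"
# share_venv "$HOME/another_workspace" "/work/epsci/${USER}/venvs/shared"
```

Virtual environments record absolute paths in their scripts, so keep the link in the original workspace in place after the move; that way the recorded paths continue to resolve.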

Open Jupyter notebook and select kernel

You should now be able to navigate the remote system in VSCode to either open an existing notebook or create a new one. Once you do, the "Select Kernel" option will be available in the top right corner of the VSCode window. The first time you do this, there will be an option at the top of the window to "Install suggested extensions Python + Jupyter". Select this to install those extensions on the remote system.

Once the remote extensions are installed, click on "Select Kernel" again and a menu with different options will appear at the top of the window. Select the "Python Environments..." option.