Jupyter via VSCode remote-ssh with singularity on ifarm
Here are instructions for configuring your local VSCode to connect to the ifarm via ssh and run an ipython kernel inside a singularity container.
Get a smartcard
If you don't already have one, it is well worth stopping by the helpdesk and getting a smartcard USB device. It is not strictly required, but can save a lot of hassle typing numbers you get from your MFA app on your phone.
Configure SSH on your local computer
Logging into an ifarm computer requires first logging into the scilogin.jlab.org gateway. Configuring your local computer to do a proxy jump will make this a lot easier. Instructions for this can be found in the JLab knowledge base article here:
Follow those instructions so that "ssh ifarm" works from your local computer.
Next, add the following to your ~/.ssh/config file. With this in place, when you ssh to the special host "epsci-ubuntu-22.04~ifarm" it will not only tunnel you all the way into the ifarm, but will also run the specified singularity container and drop you into it.
# https://github.com/microsoft/vscode-remote-release/issues/3066#issuecomment-1019500216
#
Host epsci-ubuntu-22.04~ifarm
    HostName ifarm.jlab.org
    ProxyJump scilogin.jlab.org
    RemoteCommand singularity shell --bind /u,/group,/w,/work /cvmfs/oasis.opensciencegrid.org/jlab/epsci/singularity/images/epsci-ubuntu-22.04.img
    RequestTTY yes
Test that this works by doing:
ssh epsci-ubuntu-22.04~ifarm
This should give you a "Singularity>" prompt. You can confirm that the container is running the correct OS by looking at either the /etc/os-release file or the Dockerfile in the /container directory.
Note that the "--bind /u,/group,/w,/work" option mounts the group work disks inside the container so you can use them. Add any other directories (e.g. /cvmfs) you may need (Note that your home directory is automatically mounted).
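As a quick way to do the OS check mentioned above, the sketch below prints the OS name recorded in an os-release file. The function name os_pretty_name is made up for this example, and it assumes the file follows the standard os-release format (shell-style KEY="value" lines).

```shell
# Print the PRETTY_NAME field of an os-release style file.
# os_pretty_name is a hypothetical helper name; the path defaults to /etc/os-release.
os_pretty_name() {
    local file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into your session.
    ( . "$file" && printf '%s\n' "$PRETTY_NAME" )
}

# Example, at the "Singularity>" prompt:
#   os_pretty_name
```

For the image used here you would expect the output to mention Ubuntu 22.04.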
Unsupported ssh option "remotecommand"
If you get an error about "remotecommand", it is likely because your OpenSSH version is too old. You need at least OpenSSH 7.6 (RHEL 7.9 comes with OpenSSH 7.4). If you are using a Linux computer and have sudo access, here is a quick solution:
curl -O https://mirror2.sandyriver.net/pub/OpenBSD/OpenSSH/portable/openssh-9.3p1.tar.gz
tar xzf openssh-9.3p1.tar.gz
cd openssh-9.3p1
./configure
make -j8
sudo make install
This will install the new version of openssh into /usr/local. If you are using tcsh you will need to execute "rehash" in any open terminals (new terminals will be OK).
Try running the ssh epsci-ubuntu-22.04~ifarm command again to make sure it works.
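To confirm which OpenSSH version your terminal actually picks up (before or after the rebuild), a check along the following lines may help. This is only a sketch: ssh_version_ok is a hypothetical helper name, and the parsing assumes the usual "OpenSSH_<major>.<minor>" banner that ssh -V prints.

```shell
# Succeeds (exit 0) if an `ssh -V` banner reports OpenSSH 7.6 or newer,
# fails with 1 if it is older, and with 2 if the banner cannot be parsed.
# ssh_version_ok is a hypothetical helper name.
ssh_version_ok() {
    local pattern='s/^OpenSSH_[^0-9]*\([0-9]*\)\.\([0-9]*\).*/'
    local major minor
    major=$(printf '%s\n' "$1" | sed -n "${pattern}\\1/p")
    minor=$(printf '%s\n' "$1" | sed -n "${pattern}\\2/p")
    [ -n "$major" ] && [ -n "$minor" ] || return 2
    [ "$major" -gt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -ge 6 ]; }
}

# Example (ssh -V writes its banner to stderr, hence the 2>&1):
#   ssh_version_ok "$(ssh -V 2>&1)" && echo "RemoteCommand should be supported"
#   command -v ssh    # shows which ssh binary is first on your PATH
```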
Using work directories for large software installations
It is easy to overfill the quota on your CUE home directory when VSCode extensions and python environments install lots of packages. Redirecting these to the work disk can save a lot of headache. There are a few ways to deal with this, depending on which python kernel/environment you select.
Using a global environment
Both VSCode and JLab JupyterHub may install python packages in hidden directories in your home directory. These count against your quota and can quietly fill it without you noticing until it is too late. It is best to deal with this now by creating symbolic links in your home directory that point to directories on the work disk where these packages can be installed.
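To see how much space the hidden directories in your home directory are already using, something like the following can help. It is a sketch built on the standard du and sort commands; hidden_usage is a name made up for this example.

```shell
# List hidden files and directories directly under a directory, smallest first,
# with sizes in kilobytes. hidden_usage is a hypothetical helper name.
hidden_usage() {
    local dir="${1:-$HOME}"
    # The glob .[!.]* matches dot-entries but skips "." and ".."
    du -sk "$dir"/.[!.]* 2>/dev/null | sort -n
}

# Example: show the ten largest hidden entries in your home directory
#   hidden_usage ~ | tail -n 10
```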
There are a couple of directories to be concerned with. Here are some commands to execute on the ifarm that will set up the appropriate links. Replace "epsci" with the name of whatever work disk is appropriate. Note that if these directories already exist in your home directory, the ln commands will not replace them, so rename or remove them first.
mkdir -p /work/epsci/${USER}/home_dot_local
mkdir -p /work/epsci/${USER}/home_dot_cache
mkdir -p /work/epsci/${USER}/home_dot_vscode-server
ln -s /work/epsci/${USER}/home_dot_local ~/.local
ln -s /work/epsci/${USER}/home_dot_cache ~/.cache
ln -s /work/epsci/${USER}/home_dot_vscode-server ~/.vscode-server
The above will install any VScode extensions on the remote system (i.e. ifarm) into the /work/epsci/${USER}/home_dot_vscode-server directory.
Python packages installed using pip while in a JLab Jupyter session (not necessarily when using VSCode) will get installed into the /work/epsci/${USER}/home_dot_local and /work/epsci/${USER}/home_dot_cache directories (via the ~/.local and ~/.cache links).
VSCode lets you specify a python virtual environment on a workspace-by-workspace basis, so making sure those end up on the work disk takes another couple of steps (see the "Customized Python Virtual Environment" section below).
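If some of the hidden directories already exist in your home directory, a small helper can set the existing copy aside instead of deleting it before making the link. The following is a minimal sketch, not a tested site tool: link_to_work and the ".orig" suffix are choices made for this example.

```shell
# link_to_work HOME_PATH WORK_PATH
# Hypothetical helper: make HOME_PATH a symlink to WORK_PATH, keeping any
# existing file or directory as HOME_PATH.orig instead of deleting it.
link_to_work() {
    local home_path="$1" work_path="$2"
    if [ -L "$home_path" ]; then
        echo "skip: $home_path is already a symlink"
        return 0
    fi
    if [ -e "$home_path.orig" ]; then
        echo "error: $home_path.orig already exists" >&2
        return 1
    fi
    mkdir -p "$work_path" || return 1
    if [ -e "$home_path" ]; then
        mv "$home_path" "$home_path.orig" || return 1
    fi
    ln -s "$work_path" "$home_path"
}

# Example (replace "epsci" with your work disk):
#   link_to_work ~/.local         /work/epsci/${USER}/home_dot_local
#   link_to_work ~/.cache         /work/epsci/${USER}/home_dot_cache
#   link_to_work ~/.vscode-server /work/epsci/${USER}/home_dot_vscode-server
```

Once you are satisfied that everything works, the ".orig" copies can be removed to free the quota they still occupy.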
Configuring VScode
In VSCode
- Open the command palette using Cmd+Shift+P (Ctrl+Shift+P on Linux and Windows) or from the gear menu in the bottom left of the window
- Type "settings.json" and then select Preferences: Open User Settings (JSON)
- Add the following to your settings (if you have other settings already, you may need to add a comma to the line before this one!):
"remote.SSH.enableRemoteCommand": true
Click on the "Remote Explorer" extension icon on the left side of the window (monitor with a circle in lower right) to open. You should see the "epsci-ubuntu-22.04~ifarm" item. Hover over it to see options to either connect the current window (arrow) or open a new window (box). Choose either option to get a connected window. Watch for a small entry box at the top of the window that is asking for your password. Enter your PIN+OTP to login. (OTP=One Time Password from either your smartcard or phone app).
It will automatically install some extensions on the remote host under ~/.vscode-server.
You can verify that it is working correctly by opening a new terminal in VSCode and seeing that it gives the "Singularity>" prompt.
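Besides looking at the prompt, you can check the environment from that terminal. The sketch below assumes that Singularity sets the SINGULARITY_CONTAINER variable inside a container (Apptainer uses APPTAINER_CONTAINER); treat that as an assumption about the version installed on the ifarm. The name in_container is made up for this example.

```shell
# Succeeds if the current shell appears to be running inside a container.
# in_container is a hypothetical helper name; it relies on the assumption that
# the container runtime exports SINGULARITY_CONTAINER or APPTAINER_CONTAINER.
in_container() {
    [ -n "${SINGULARITY_CONTAINER:-}${APPTAINER_CONTAINER:-}" ]
}

# Example, in the VSCode terminal:
#   in_container && echo "inside: ${SINGULARITY_CONTAINER:-$APPTAINER_CONTAINER}"
```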
Customized Python Virtual Environment
VSCode will provide kernel options to use with Jupyter which correspond to different python environments. You can create new virtual environments through the VSCode interface. When you do this, it will create a directory called .venv in the workspace directory the VSCode window is using. Installing many of the standard python packages (tensorflow, pytorch, pandas, matplotlib, etc.) will take up several GB of space. It is better to store this once and have all of your workspaces use it.
The easiest way to do this is to
- create a workspace with a new Jupyter notebook
- create a new python virtual environment (this will create a .venv directory in the workspace directory)
- install the python packages via the VSCode Jupyter interface
- move the .venv directory to a central location and make a symbolic link pointing to it from the workspace directory
For the third step above, you can run a cell in the Jupyter notebook with these contents:
%pip install pandas numpy matplotlib tensorflow torch
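The last step in the list above (moving .venv and linking back to it) could be done along these lines. This is a sketch under assumptions: share_venv is a hypothetical helper name and the paths in the example are placeholders. Leaving a symlink at the original location matters because scripts inside a virtual environment typically record the absolute path where it was created.

```shell
# share_venv WORKSPACE_DIR CENTRAL_DIR
# Hypothetical helper: move WORKSPACE_DIR/.venv to CENTRAL_DIR and leave a
# symlink behind so the workspace still finds it.
share_venv() {
    local workspace="$1" central="$2"
    if [ -L "$workspace/.venv" ] || [ ! -d "$workspace/.venv" ]; then
        echo "error: $workspace/.venv is not a real directory" >&2
        return 1
    fi
    if [ -e "$central" ]; then
        echo "error: $central already exists" >&2
        return 1
    fi
    mkdir -p "$(dirname "$central")" || return 1
    mv "$workspace/.venv" "$central" || return 1
    ln -s "$central" "$workspace/.venv"
}

# Example (paths are placeholders):
#   share_venv ~/my_workspace /work/epsci/${USER}/python_venvs/default
# Additional workspaces can then link to the same environment:
#   ln -s /work/epsci/${USER}/python_venvs/default ~/other_workspace/.venv
```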
Open Jupyter notebook and select kernel
You should now be able to navigate the remote system in VSCode to either open an existing notebook or create a new one. Once you do, the "Select Kernel" option will be available in the top right corner of the VSCode window. The first time you do this, it will have an option at the top of the window to "Install suggested extensions Python + Jupyter". Select this to install those extensions on the remote system.
Once the remote extensions are installed, click on "Select Kernel" again and a menu with different options will appear at the top of the window. Select the "Python Environments..." option.
Using a Julia Notebook
Jupyter supports many languages, including Julia. You can configure VSCode to use a Julia interpreter in a Jupyter notebook.
Step 1: Install the Julia extension in the remote VSCode server. Go to a VSCode window that is connected to the "SSH: epsci-ubuntu-22.04~ifarm" host (or whichever remote host you wish to use). Click on the extensions icon on the left of the window and type "Julia" at the top to find the Julia extension and install it. Note that the Jupyter extension should already be installed on the remote system via the instructions in previous sections.
Step 2: Configure VSCode to find the Julia interpreter. Do this by setting the "julia.executablePath" on your remote system:
- Open the command palette (Cmd+Shift+P)
- Type "settings" and select "Preferences: Open Remote Settings (JSON) (SSH: epsci-ubuntu-22.04~ifarm)"
- Add the julia.executablePath setting, set to the full path to the julia executable. Here is an example. Note that this also includes the path to my python venvs used with Jupyter.
{
    "python.venvPath": "/work/epsci/davidl/python_venvs",
    "julia.executablePath": "/group/epsci/apps/Julia/julia-1.9.1/bin/julia"
}
Note that you can install your own Julia binaries and point to them instead.
Step 3: Make a symlink so Julia packages install to a work disk instead of your home directory. This saves filling your home directory quota as described above for python. Below is how I did it. Adjust to the work disk location of your preference.
mkdir /work/epsci/${USER}/home_dot_julia
ln -s /work/epsci/${USER}/home_dot_julia ~/.julia
Step 4: Create a Julia Jupyter notebook. At this point if you select "New File..." from the VSCode window you should see a new option to create a new Julia file. If you don't, then try disconnecting the VSCode remote-ssh session and reconnecting.
If you want to use Jupyter, then don't create a Julia file, but instead select "Jupyter Notebook .ipynb support". This will create a new Jupyter notebook in which you can select to use either the "Julia 1.9.1" kernel or one of your python kernels.
NOTE: It seems the ".ipynb" extension is the only one allowed by the VSCode Jupyter extension, even if the notebook is a Julia notebook!
Below is some example Julia code that you can paste into a cell and execute. The first time you run this, the package installs will take quite a while. (It took more than 8 minutes for me.) At first, it will appear to fail with download errors. It looks like when this happens, it falls back to downloading the source and compiling it, which is probably why it takes so long. (We should ask the CC to whitelist pkg.julialang.org, which will probably give access to precompiled binaries.)
import Pkg; Pkg.add("Plots")
import Pkg; Pkg.add("GR")
using Plots

# plot some data
plot([cumsum(rand(500) .- 0.5), cumsum(rand(500) .- 0.5)])

# save the current figure
savefig("plots.svg")
# .eps, .pdf, & .png are also supported