Difference between revisions of "SPACK Mirror on JLab CUE"
Revision as of 12:25, 10 March 2021
Using the JLab SPACK Repository
Overview
SPACK is a package manager used to maintain multiple versions of software compiled with various compilers for various OSes. The EPSCI group takes the primary responsibility for maintaining the SPACK repository at JLab. SPACK has a rich feature set that allows a lot of flexibility in how one can use it to manage their software. This page describes details of how SPACK is implemented at JLab for the ENP program.
There are three primary use cases for the software built with the SPACK system:
- Users on the JLab CUE want to use the pre-built binary versions on JLab computers
- Users running offsite want to use the pre-built binary versions on their local computers
- Users want to install the pre-built binaries on their local computer so they can run untethered
The first two of these are satisfied by using /cvmfs. The third use case relies on a web-accessible SPACK buildcache and is quite a bit more fickle. Because of this, option 3 is not officially supported.
Quickstart
The recommended way to set up your environment is with one of the following:
    [bash] source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/spack_env/spack_env.sh lmod gcc/9.3.0
    [tcsh] source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/spack_env/spack_env.csh lmod gcc/9.3.0
Note that the above may take a few seconds to complete, but it sets up a user-friendly package naming scheme for "module load". If you want quicker startup and are willing to live with package names that include long hashes, then source the script with no arguments:
    [bash] source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/spack_env/spack_env.sh
    [tcsh] source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/spack_env/spack_env.csh
Other useful commands:
    module avail                # List available packages
    module load packagename     # Load a package (optionally specify version number)
    module unload packagename   # Unload a package that was previously loaded
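For shell startup files, it can help to guard the setup so that logins do not fail on machines where /cvmfs is not mounted. The following is a minimal sketch, not part of the official setup: the function name `jlab_spack_setup` and the variable `JLAB_SPACK_ENV` are hypothetical, and only the script path comes from the instructions above.

```shell
# Hypothetical helper for ~/.bashrc: source the JLab spack environment script
# only if it is readable, and pass any arguments (e.g. "lmod gcc/9.3.0") through.
JLAB_SPACK_ENV=${JLAB_SPACK_ENV:-/cvmfs/oasis.opensciencegrid.org/jlab/epsci/spack_env/spack_env.sh}

jlab_spack_setup() {
    if [ -r "$JLAB_SPACK_ENV" ]; then
        # The sourced script sees the same arguments this function was given.
        . "$JLAB_SPACK_ENV" "$@"
    else
        echo "spack_env.sh not found at $JLAB_SPACK_ENV (is /cvmfs mounted?)" >&2
        return 1
    fi
}
```

Calling `jlab_spack_setup lmod gcc/9.3.0` would then correspond to the recommended setup, and `jlab_spack_setup` with no arguments to the quicker one.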
The following operating systems are supported:
OS | support start date | support end date |
---|---|---|
centos/7.7.1908 | March 31, 2021 | current |
centos/8.3.2011 | March 31, 2021 | current |
ubuntu/21.04 | March 31, 2021 | current |
CVMFS Client Configuration
If you are working on the JLab ifarm computers, then CVMFS is already installed and configured. There is nothing else you need to do. CVMFS may also already be available on many remote HPC sites (e.g. NERSC). Check the site's specific documentation or simply look for the /cvmfs/oasis.opensciencegrid.org directory.
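A quick way to look for the directory from a script is sketched below. The function name `cvmfs_available` is hypothetical. It lists the directory rather than only testing for its existence, because on hosts that mount CVMFS through autofs the repository is typically mounted on first access.

```shell
# Hypothetical helper: succeed if the CVMFS repository directory can be listed.
# The directory defaults to the oasis repository used on this page.
cvmfs_available() {
    ls "${1:-/cvmfs/oasis.opensciencegrid.org}" >/dev/null 2>&1
}
```

For example, `cvmfs_available && echo "CVMFS is ready"` prints the message only when the repository is reachable.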
To mount the public, read-only CVMFS volume that contains the pre-built binaries, see the instructions in one of the following sections for your specific platform.
The most up-to-date instructions on installing and configuring the CVMFS client software can be found on the CVMFS website.
Linux
Here are instructions for installing on a CentOS or RedHat system (personal laptop or desktop)
1. Install the pointer to the CVMFS repo and then install cvmfs itself. After it is installed, generate a default config file.
    sudo yum install https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
    sudo yum install -y cvmfs
    cvmfs_config setup
2. Create a config file /etc/cvmfs/default.local with the following content (you need to do this with sudo):
    CVMFS_REPOSITORIES=oasis.opensciencegrid.org
    CVMFS_HTTP_PROXY=DIRECT
    CVMFS_CLIENT_PROFILE=single
3. Restart the autofs service
systemctl restart autofs
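Step 2 above can also be scripted. The sketch below writes the three settings to a file of your choosing. The function name `write_cvmfs_config` is hypothetical, and the target path is a parameter so the file can be prepared as a normal user and copied into place with sudo afterwards.

```shell
# Hypothetical helper: write the client settings from step 2 to the file
# given as the first argument.
write_cvmfs_config() {
    cat > "$1" <<'EOF'
CVMFS_REPOSITORIES=oasis.opensciencegrid.org
CVMFS_HTTP_PROXY=DIRECT
CVMFS_CLIENT_PROFILE=single
EOF
}
```

One possible use is `write_cvmfs_config /tmp/default.local` followed by `sudo cp /tmp/default.local /etc/cvmfs/default.local`. After restarting autofs, `cvmfs_config probe` can be used to check that the configured repository mounts.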
Mac OS X
To use CVMFS on Mac OS X, you need to install the macFUSE package and then the cvmfs package. You should then reboot so everything will load properly. The step-by-step instructions follow.
1. Download and install the macFUSE package
2. Download and install the cvmfs package with the following (Note that downloading the cvmfs package via curl apparently avoids some signature security issue on Mac OS X that you would get if downloaded via web-browser. Don't ask me how.)
    curl -o ~/Downloads/cvmfs-2.7.5.pkg https://ecsft.cern.ch/dist/cvmfs/cvmfs-2.7.5/cvmfs-2.7.5.pkg
    open ~/Downloads/cvmfs-2.7.5.pkg
3. Create a config file /etc/cvmfs/default.local with the following content (you need to do this with sudo):
    CVMFS_REPOSITORIES=oasis.opensciencegrid.org
    CVMFS_HTTP_PROXY=DIRECT
4. Restart the computer
5. Create the mount point and mount oasis with:
    sudo mkdir -p /cvmfs/oasis.opensciencegrid.org
    sudo mount -t cvmfs oasis.opensciencegrid.org /cvmfs/oasis.opensciencegrid.org
If it all works you should see something like this:
    >sudo mount -t cvmfs oasis.opensciencegrid.org /cvmfs/oasis.opensciencegrid.org
    CernVM-FS: running with credentials 10000:10000
    CernVM-FS: loading Fuse module... done
    CernVM-FS: mounted cvmfs on /Users/Shared/cvmfs/oasis.opensciencegrid.org
Docker
There are actually two options for using CVMFS inside a Docker container:
- Install CVMFS on the host and simply bind the /cvmfs directory to the same directory inside the container
- Run the CVMFS software inside the container and mount it there.
Option 1 is preferred since any caching of the files is done by the host and so does not disappear when the container goes away. It can also be used with any image and does not require another image to be created with the CVMFS software installed. To implement option 1, first mount CVMFS on the host using the above instructions for your host platform. Then, when you start the container, give the docker command an argument of -v /cvmfs:/cvmfs.
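As a sketch of option 1, the wrapper below starts a container with the host's /cvmfs bound at the same path. The function name `run_with_cvmfs` and the `DOCKER` override are hypothetical, and `centos:7` in the usage note is just an arbitrary example image. Depending on how the host mounts /cvmfs (for example through autofs), the bind option may need adjusting, so check the CVMFS documentation if the directory appears empty inside the container.

```shell
# Hypothetical helper: run IMAGE (first argument) with the host's /cvmfs bound
# to /cvmfs inside the container. Remaining arguments are the command to run.
# Set DOCKER=echo to print the docker arguments instead of running them.
run_with_cvmfs() {
    image="$1"
    shift
    "${DOCKER:-docker}" run --rm -it -v /cvmfs:/cvmfs "$image" "$@"
}
```

For example, `run_with_cvmfs centos:7 ls /cvmfs/oasis.opensciencegrid.org/jlab/epsci` should list the EPSCI directory from inside the container if the host mount is working.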
Option 2 can be convenient if you have trouble getting CVMFS working on the host. There are actually two methods here. One is to use the pre-made Docker container as described in the CVMFS documentation. You may create an image based on this or even use it as-is to supply /cvmfs to the host and then use option 1 above.
The second method is to create a new image from scratch containing the necessary software. This method has worked in the past, though there may be easier ways of doing it today. Here are the instructions though in case all of the other above methods fail.
Unfortunately, there are a couple of steps that cannot be done when the image is created and must be implemented when the container is created. A working example with some comments can be found here:
https://github.com/faustus123/hdcontainers/tree/master/Docker_cvmfs
Running untethered (no CVMFS)
Running untethered means installing the packages on your local computer so you can still run the software even with no internet connection. It is stated up front that this is unlikely to work for numerous reasons, but for those who like punishing themselves, here is some info that may help get you going. It goes without saying that none of this is recommended.
The main issue with installing locally is that many packages build their installation paths into their installed scripts and binaries. While spack does have mechanisms to try and fix this, they can fail if the directory path is either too long or too short. Your best chances of success will come if you create a local directory path that matches exactly what it would be if /cvmfs were mounted. Here are some example instructions:
    mkdir -p /cvmfs/oasis.opensciencegrid.org/jlab/epsci/centos/
    git clone --depth 1 https://github.com/spack/spack.git /cvmfs/oasis.opensciencegrid.org/jlab/epsci/centos/7.7.1908
    source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/centos/7.7.1908/share/spack/setup-env.sh   # or setup-env.csh
    spack mirror add jlab-public https://spack.jlab.org/mirror
    spack install -f -o -u clhep   # This should install "CLHEP" locally using the pre-built binaries
Administration of the SPACK Repository
The following sections describe various aspects of creating and managing the JLab SPACK repository. A number of choices were made in how this was set up, and they are documented here since they may not all be obvious from simply looking at the directory structures and configuration files.
Organizational Overview
The organization of the spack binaries is as follows:
- Packages are built using singularity containers
  - Containers bind the /scigroup/cvmfs subdirectory to be at /cvmfs/oasis.opensciencegrid.org/jlab inside the container
  - This allows absolute paths that start with /cvmfs to be writable in the build/install process
  - The /scigroup/cvmfs/epsci directory is exported to CVMFS so it can be mounted read-only from anywhere
  - The export is done every 4 hours via cronjob. Thus, newly built packages will not be immediately accessible.
    - Wes Moore set this up and can increase frequency if needed.
- A separate spack repository is maintained for every platform (e.g. centos/7.7.1908 is separate from centos/8.3.2011)
  - This was a choice made on our end to segregate the binaries and make it easier to add and drop support for platforms in the future.
- Users will access the software via the /cvmfs directory.
  - The SciComp computers (e.g. ifarm1901) all mount /cvmfs
  - Users can also install the CVMFS client on their personal laptop or desktop to access the software.
- The packages are exported to a build cache accessible from https://spack.jlab.org/mirror
  - We do this only because it is simple and doesn't cost us anything significant. We discourage its use and may remove it in the future.
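The path layout described above can be summarized with two small helpers. This is only an illustrative sketch: the function and variable names are hypothetical, while the prefixes and the per-platform directory scheme are taken from the bind mount and setup-env.sh paths used on this page.

```shell
# Prefix seen by users (and inside build containers) and the corresponding
# directory on the JLab file system that is bound to it.
CVMFS_PREFIX=/cvmfs/oasis.opensciencegrid.org/jlab
BUILD_PREFIX=/scigroup/cvmfs

# Hypothetical helper: print the spack setup script for a platform.
# usage: spack_setup_script <os> <version>
spack_setup_script() {
    echo "$CVMFS_PREFIX/epsci/$1/$2/share/spack/setup-env.sh"
}

# Hypothetical helper: translate a /cvmfs path into the /scigroup path that
# backs it. Paths outside the prefix are printed unchanged.
cvmfs_to_build_path() {
    case "$1" in
        "$CVMFS_PREFIX"/*)
            rest=${1#"$CVMFS_PREFIX"/}
            echo "$BUILD_PREFIX/$rest"
            ;;
        *)
            echo "$1"
            ;;
    esac
}
```

For instance, `cvmfs_to_build_path /cvmfs/oasis.opensciencegrid.org/jlab/epsci/ubuntu/21.04` prints the directory under /scigroup/cvmfs where that platform's spack instance actually lives.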
Creating a new Singularity Image
For the purposes of this system, the Singularity images used for building packages are derived from Docker images. This ensures that either Docker or Singularity can be used to build packages with spack. Thus, if someone needs to build another package, they can choose the container system most convenient for them. Docker images are posted on Docker Hub where Singularity can easily pull them. (Docker images cannot be easily created from Singularity images.)
The Dockerfiles used to create the Docker images are kept in the GitHub repository "epsci-containers". They are also copied into the image itself, so one can always access the Dockerfile used to create an image via /container/Dockerfile.*. The Docker images are created with a few system software packages installed: mainly a C++ compiler, version control tools (e.g. git and svn), python, and a couple of other tools needed for building packages. Example Dockerfiles can be found in the base directory of that repository.
To create a singularity image, one first needs to create a Docker image. Thus, one needs access to a computer with Docker installed. This generally needs to be a personal desktop or laptop since Docker requires root access and is therefore not available on public machines like ifarm. (Incidentally, singularity also requires root privileges in order to build an image from a recipe, but not if just pulling from an existing Docker image.) Here is an example of the steps you might go through if creating an image for another version of ubuntu. This assumes you are starting on a computer with Docker installed and running.
- git clone https://github.com/JeffersonLab/epsci-containers
- cd epsci-containers/base
- cp Dockerfile.ubuntu.21.04 Dockerfile.ubuntu.18.04
- edit Dockerfile.ubuntu.18.04 to replace the version numbers with the new ones. They appear in a lot of places so better to do global replace
- docker build -t epsci-ubuntu:18.04 -t jeffersonlab/epsci-ubuntu:18.04 -f Dockerfile.ubuntu.18.04 .
- docker push jeffersonlab/epsci-ubuntu:18.04
- ssh ifarm
- module use /apps/modulefiles
- module load singularity
- cd /scigroup/spack/mirror/singularity/images
- singularity build epsci-ubuntu-18.04.img docker://jeffersonlab/epsci-ubuntu:18.04
- git clone https://github.com/spack/spack.git /scigroup/cvmfs/epsci/ubuntu/18.04
The last step above will clone a new spack instance that corresponds to the new image.
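The steps above follow a naming pattern that depends only on the distribution and its version. The sketch below prints the names for a given platform so they can be checked before running the commands. The function name `image_names` is hypothetical, and the pattern is inferred from the ubuntu example above and the image file names used elsewhere on this page.

```shell
# Hypothetical helper: print the names used in the steps above.
# usage: image_names <distro> <version>
image_names() {
    distro="$1"
    version="$2"
    echo "dockerfile=Dockerfile.$distro.$version"
    echo "docker_tag=jeffersonlab/epsci-$distro:$version"
    echo "singularity_image=epsci-$distro-$version.img"
    echo "spack_root=/scigroup/cvmfs/epsci/$distro/$version"
}
```

Running `image_names ubuntu 18.04` should reproduce the file, tag, and directory names that appear in the step list.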
Building a spack package with a Singularity (or Docker) container
The preferred method of building new packages is to use one of the ifarm computers with a singularity container from the /scigroup/spack/mirror/singularity/images directory. Any packages built should also be exported to the build cache so they are accessible for offsite installations. Below is an example recipe that builds zlib for the ubuntu 21.04 platform using the native gcc10.2.1 compiler:
- ssh ifarm1901
- module use /apps/modulefiles
- module load singularity
- singularity shell -B /scigroup/cvmfs:/cvmfs/oasis.opensciencegrid.org/jlab -B /scigroup:/scigroup /scigroup/spack/mirror/singularity/images/epsci-ubuntu-21.04.img
- source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/ubuntu/21.04/share/spack/setup-env.sh
- spack compiler find
- spack install zlib%gcc@10.2.1 target=x86_64
- cd /scigroup/spack/mirror
- spack buildcache create -r -a -u -d . zlib%gcc@10.2.1
- spack buildcache update-index -k -d /scigroup/spack/mirror
Be careful that the singularity image you use matches the spack root directory (i.e. where you source the setup-env.sh script).
You also want to specify the x86_64 target so generic binaries are built that do not contain optimizations for specific processors.
Finally, don't forget to run the last two commands above to add the package to the build cache and to update the index.
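Since a mismatch between the singularity image and the spack root is easy to miss, a small check like the following may help. It is only a sketch: the function name `image_matches_root` is hypothetical, and it assumes image files are named epsci-<os>-<version>.img and spack roots end in <os>/<version>, as in the examples on this page.

```shell
# Hypothetical helper: succeed if the image file name and the spack root
# directory refer to the same platform.
# usage: image_matches_root <image file> <spack root>
image_matches_root() {
    image_file=$(basename "$1" .img)   # e.g. epsci-ubuntu-21.04
    platform=${image_file#epsci-}      # e.g. ubuntu-21.04
    os=${platform%%-*}                 # e.g. ubuntu
    version=${platform#*-}             # e.g. 21.04
    case "$2" in
        */"$os"/"$version" | */"$os"/"$version"/) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the container, something like `image_matches_root "$SINGULARITY_CONTAINER" "$SPACK_ROOT" || echo "image and spack root differ"` could be used, assuming your Singularity version sets SINGULARITY_CONTAINER to the image path.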
Setting up a new platform
There are some pitfalls that are easy to fall into when trying to set up a new platform, particularly if you want to build using a non-default compiler. Here are some steps that can be useful to get one set up from scratch. These assume a singularity image has been built and a spack clone for the specific OS already exists. These are for setting up a repository for centos 7.7.1908 that uses binaries built with the gcc 9.3.0 compiler.
    # First, create a singularity container and set up the spack environment
    ssh ifarm1901
    newgrp spack   # this puts spack at the front of your group list so new files/directories belong to it
    module use /apps/modulefiles
    module load singularity
    singularity shell -B /scigroup/cvmfs:/cvmfs/oasis.opensciencegrid.org/jlab -B /scigroup:/scigroup /scigroup/spack/mirror/singularity/images/epsci-centos-7.7.1908.img
    source /cvmfs/oasis.opensciencegrid.org/jlab/epsci/centos/7.7.1908/share/spack/setup-env.sh

    # At this point you want to disable the mirror and any other compilers
    # to ensure packages are all built for *this* spack environment.
    spack mirror rm jlab-public
    spack compilers   # use spack compiler remove xxx for anything that is not the default system compiler

    # Build the gcc9.3.0 compiler using the default system compiler.
    # Load it and add it to the list of spack compilers
    spack install gcc@9.3.0%gcc@4.8.5 target=x86_64
    spack load gcc@9.3.0
    spack compiler find

    # Use the gcc9.3.0 compiler to build clhep and ROOT.
    # Note that in this case, root requires sqlite and the default
    # version (3.34.0) failed to fetch the source. Thus, I had to
    # build the previous version and specify that ROOT use it.
    spack install clhep%gcc@9.3.0 target=x86_64
    spack load clhep
    spack install sqlite@3.33.0%gcc@9.3.0 target=x86_64   # default version 3.34.0 failed to fetch
    spack install root %gcc@9.3.0 target=x86_64 ^sqlite@3.33.0 target=x86_64

    # Export the packages to the build cache and update the index.
    # "-d ." writes to the current directory, so change to the mirror first.
    cd /scigroup/spack/mirror
    spack buildcache create -r -a -u -d . gcc@9.3.0 %gcc@4.8.5 arch=x86_64
    spack buildcache create -r -a -u -d . clhep %gcc@9.3.0 arch=x86_64
    spack buildcache create -r -a -u -d . root %gcc@9.3.0 arch=x86_64 ^sqlite@3.33.0 arch=x86_64
    spack buildcache update-index -k -d /scigroup/spack/mirror
We would also like to access packages maintained in the eic-spack repo. To do this, the repo must be checked out and added to the repo list.
    git clone https://github.com/eic/eic-spack.git ${SPACK_ROOT}/var/spack/repos/eic-spack
    spack repo add $SPACK_ROOT/var/spack/repos/eic-spack
Please see the next section on setting up the module system.
Setting up the module system (LMOD)
We would like most users to be able to interact with the spack packages using the standard "module load" command. Spack has nice support for this, though there are options for how it is set up and we'd like to be consistent across supported platforms.
First off, we use the LMOD system as it supports hierarchical module files. This allows us to configure the system so that when a specific compiler is loaded, only packages corresponding to that compiler are listed. This should make it easier on the user to navigate and to avoid loading incompatible packages. We also configure it to present packages using the {package}/{version} naming scheme. This is what is used by /apps on the CUE which will make the spack packages integrate more seamlessly with those.
Here are instructions for setting this up.
Modules configuration file
The ${SPACK_ROOT}/etc/spack/modules.yaml configuration file must be created and have the following content added. This is mostly based on an example given in the spack documentation under "Hierarchical Module Files". Descriptions of the settings are given below.
    modules:
      enable::
        - lmod
      lmod:
        core_compilers:
          - 'gcc@4.8.5'
        hierarchy:
          - mpi
        hash_length: 0
        whitelist:
          - gcc
        blacklist:
          - '%gcc@4.8.5'
          - 'arch=linux-centos7-zen2'
        all:
          filter:
            environment_blacklist:
              - "C_INCLUDE_PATH"
              - "CPLUS_INCLUDE_PATH"
              - "LIBRARY_PATH"
          environment:
            set:
              '{name}_ROOT': '{prefix}'
        projections:
          all: '{name}/{version}'
          ^lapack: '{name}/{version}-{^lapack.name}'
- The core_compilers section should list the system compiler and is the default.
- hash_length: 0 removes the spack hash from package names
- whitelist ensures all gcc compilers are available. (Once one of those is loaded, other packages will appear.)
- blacklist excludes packages built with the default system compiler (n.b. whitelist overrides this so other compilers will still be listed)
- The arch= blacklist line excludes packages built specifically for the zen2 microarchitecture instead of generic x86_64. Those packages were actually built by mistake and may be removed altogether. This is a nice way, though, of hiding them from view.
- The environment_blacklist filter was just copied from the spack example in the documentation. We may want to remove it. I did not recall build systems using those variables so just left it in.
- The environment: set: section adds an environment variable for every package that is {package}_ROOT so the root directory of the package can be easily obtained, even if the package itself does not define such a variable.
- The projections section defines the module naming scheme. The line for lapack was left in from the spack tutorial example.
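After modules.yaml is created or changed, the module files generally have to be regenerated before the new naming scheme shows up in "module avail". This page does not spell out that step, so the sketch below is an assumption about the workflow. It uses the standard `spack module lmod refresh` command, and the function name `refresh_lmod_modules` and the `SPACK` override are hypothetical.

```shell
# Hypothetical helper: regenerate all lmod module files for the current spack
# instance, deleting the old module tree first and answering yes to prompts.
# Set SPACK=echo to print the spack arguments instead of running them.
refresh_lmod_modules() {
    "${SPACK:-spack}" module lmod refresh --delete-tree -y
}
```

This would be run inside the container for the platform in question, after sourcing the matching setup-env.sh script.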
Misc. Notes
Here are some miscellaneous notes on issues with getting some packages to build
package | solution | notes |
---|---|---|
ncurses | spack install ncurses+symlinks target=x86_64 | The ncurses package can fail to build due to permission denied errors related to /etc/ld.so.cache~. Internet wisdom says to build it with the symlinks option turned on. (See also notes on building using Mac OS X + Docker below) |
automake | build into spack directory with very short absolute path | This error happens at the very end of the build when it tries to run the help2man utility on the automake and aclocal scripts. The failure is because the scripts contain a shebang at the top with a path length longer than 128 characters. Spack actually has a fix for this that it will automatically apply after install. However, this help2man tool is run by the automake build system before that is run. To make the build succeed, use a spack root directory that has a very short path (e.g. by binding the host working directory to something like "/A" in the singularity container). Then, make sure to create the buildcache using the "-r" option so that it is relocatable. The buildcache can then be installed in any spack root directory, regardless of path length. |
Mac OS X + Docker
The default disk format for Mac OS X is not case-sensitive. It preserves the case of file and directory names, which gives the illusion that it is case-sensitive. This works fine except when you have two files in the same directory whose names differ only by case. This becomes an issue if you are building spack packages for Linux using Docker and are doing so in a directory from the local disk (bound in the Docker container). I saw this with the ncurses package failing with errors related to E/E_TERM and A/APPLE_TERM (I may not be remembering the exact file names correctly).
One work-around is to create a disk image using Disk Utility and choose the format to be "Mac OS Extended (Case-sensitive, Journaled)". Mount the disk image and bind that to the docker container. This will give you a case sensitive persistent disk (i.e. survives after the container exits).
If you do not care about persistence, then just build in a directory in the Docker container's temporary file system. You can always save to a buildcache from there and copy just the buildcache file out of the container.
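To find out whether a given build directory is on a case-sensitive file system before starting a long build, a probe like the following can be used. This is a minimal sketch and the function name `is_case_sensitive` is hypothetical. It creates a temporary lower-case file and checks whether the upper-case spelling of the same name also resolves.

```shell
# Hypothetical helper: return 0 if the directory (default: current directory)
# is on a case-sensitive file system, 1 if it is not, and 2 if the probe file
# could not be created.
is_case_sensitive() {
    dir="${1:-.}"
    probe="$dir/.case_probe_$$"
    touch "$probe" || return 2
    if [ -e "$dir/.CASE_PROBE_$$" ]; then
        result=1   # the upper-case name resolves to the same file
    else
        result=0
    fi
    rm -f "$probe"
    return "$result"
}
```

For example, `is_case_sensitive /path/to/build/dir || echo "use a case-sensitive disk image instead"` warns before the build runs into name clashes.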