Offline Analysis
Setting up the offline analysis
Production Replay Setup
Full replays are performed on the batch farm using the nps-ana account. The replays fall into three categories: SCALER, PRODUCTION, and PRODUCTION ALL.
- The Scaler replays only have the TSH tree available.
- The Production replays have the main processed branches with reconstructed quantities.
- An exhaustive list of Production branches is given on the adjoining wiki page [Production Branches and their meaning].
- NOTE
- For an updated list of branches, look at the nps_replay git repository.
- The Production ALL replays include all possible NPS and HMS branches. These files are 1-to-1 with the EVIO segments.
- These files are rather large (tens of GB), so they are slow to process in CERN ROOT scripts without using RDataFrames.
- NOTE
- The primary reason these Production ALL segments are being replayed is to provide access to the full waveform data. The full waveform should only need to be replayed once in its raw form for all subsequent passes.
Check proper permissions
Ensure you are a member of the NPS group and that you have access to the /cache/ and /mss/ locations. You should also be one of the Slurm users. [1] If you do not have these set up, contact the Hall C compute coordinator.
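A quick way to check these prerequisites from a shell is sketched below. The helper names are hypothetical, and the Unix group name `nps` in the usage comment is an assumption; confirm the actual group name with the compute coordinator.

```shell
#!/usr/bin/env bash
# Sketch: check group membership and directory access.
# Assumption: in_group and can_access are illustrative helpers, and the
# Unix group name used in the example below may differ at JLab.

# Return 0 if the current user belongs to the named Unix group.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# Return 0 if the path is a directory that can be listed and entered.
can_access() {
    [ -d "$1" ] && [ -r "$1" ] && [ -x "$1" ]
}

# Example usage (commented out so sourcing this file has no side effects):
# in_group nps || echo "not in group nps"
# for d in /cache /mss; do can_access "$d" || echo "no access to $d"; done
```

If either check fails, that is the point at which to contact the compute coordinator rather than continuing with the setup.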
Set up the appropriate directories
Get access and set up the following locations:
- /group/nps/$USER/
- /volatile/nps/$USER/ROOTfiles
- /volatile/nps/$USER/REPORT_OUTPUT
- NOTE: There are multiple subdirectories you need to add. Make sure you have the relevant subdirectories for the SCRIPTS you will run offline. For SCRIPTS/NPS/replay_production_coin_NPS_HMS.C that is: [2]
- /work/hallc/nps/$USER
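Once access has been granted, the group and volatile directories named above can be created in one step. This is a minimal sketch: `setup_replay_dirs` and its arguments are hypothetical, and the script-specific subdirectories are deliberately left out because they depend on which replay script you run.

```shell
#!/usr/bin/env bash
# Sketch: create the per-user replay directories named above.
# Assumption: setup_replay_dirs and its arguments are illustrative helpers.
# Script-specific subdirectories are not created here; add them as needed.

setup_replay_dirs() {
    local group_root="$1"      # e.g. /group/nps
    local volatile_root="$2"   # e.g. /volatile/nps
    local user="${3:-$USER}"   # defaults to the current user

    # mkdir -p is safe to re-run: existing directories are left untouched.
    mkdir -p "$group_root/$user" \
             "$volatile_root/$user/ROOTfiles" \
             "$volatile_root/$user/REPORT_OUTPUT"
}

# Example usage:
# setup_replay_dirs /group/nps /volatile/nps
```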
Setting SSH for GitHub at JLab
- Generate ssh key if you do not have one.
- ssh-keygen (when prompted for the file in which to save the key and for a passphrase, just hit return)
- Put ssh public key on GitHub
- Open "Settings" from the pull down menu in the top right.
- Go to "SSH and GPG keys"
- Click the "New SSH key" button. In a terminal, type "more ~/.ssh/id_rsa.pub", then copy the output and paste it into GitHub.
- Note: The ssh key may not work right away. If this happens, log off from the iFarm machine and wait a few minutes for the key to start working. You may be unable to log in to the iFarm for a few minutes; this is OK. Once you can log in to the iFarm again, the ssh key should be working and you can start cloning remote repositories.
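The key steps above can be wrapped so that an existing key is never overwritten. This is a sketch only: `ensure_ssh_key` is a hypothetical helper, and it assumes the default RSA key path that matches the id_rsa.pub file mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: generate a key only when none exists, then print the public key.
# Assumption: ensure_ssh_key is an illustrative helper; the default path
# matches the ~/.ssh/id_rsa.pub file mentioned above.

ensure_ssh_key() {
    local key="${1:-$HOME/.ssh/id_rsa}"
    if [ ! -f "$key" ]; then
        mkdir -p -m 700 "$(dirname "$key")"
        # -N "" sets an empty passphrase, matching "just hit return".
        ssh-keygen -q -t rsa -N "" -f "$key"
    fi
    # Print the public half, which is what gets pasted into GitHub.
    cat "$key.pub"
}

# Example usage:
# ensure_ssh_key            # paste the printed key into GitHub
# ssh -T git@github.com     # GitHub replies with a greeting if the key works
```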
Using the Jefferson Lab Farm
Running Production Replays on the Farm
You must run this on an ifarm computer.
- Navigate to your nps_replay group location.
- NOTE: If you do not have a local copy of nps_replay, go to the NPS_Software page and follow the setup instructions.
- Make sure the 'standard.kinematics' files are updated from CDAQ:
- Zheng has been updating the runlist and standard.kinematics files. These will be pushed on a daily basis to the nps_replay git repo.
- Bring your fork up to date with the JLab repo using git fetch and git rebase main. Make sure any offline work you do is on a separate branch and is committed frequently.
- Without an up-to-date standard.kinematics parameter file you will likely get the error 'no gpbeam in database!'.
- For first time use, set up the /cache, ROOTfiles, and REPORT_OUTPUT directories. /cache is the same symlink as on CDAQ. REPORT_OUTPUT is on your /volatile disk. Start with ROOTfiles also on volatile. Make sure you add all the subdirectories!!!
- TAR the nps_replay directory to be used on the farm:
- The nps_replay directory needs to be copied to the disk of a farm node via hcswif. Use the following command from the directory containing nps_replay:
cd nps_replay/ && tar -czf ../nps_replay.tar.gz . && cd -
- Run ls to confirm that the nps_replay.tar.gz file is in your current group directory. hcswif currently assumes the tar file is located in your group directory:
/group/nps/$USER/
- You have now created the nps_replay tar file with all the relevant replay parameters. This will be copied to the farm node along with the raw EVIO file.
- Specify which runs to replay.
- Determine the last replay on /mss with
ls /mss/hallc/c-nps/analysis/online/replays/
- Determine the last raw EVIO file available on /mss
ls /mss/hallc/c-nps/raw/
- Write down the run range of interest
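The tar step in the list above can also be wrapped in a small helper that checks the archive after writing it. This is a minimal sketch under stated assumptions: `make_replay_tar` is a hypothetical helper and is not part of hcswif.

```shell
#!/usr/bin/env bash
# Sketch: build the replay tarball and confirm it is a readable archive.
# Assumption: make_replay_tar is an illustrative helper, not part of hcswif.

make_replay_tar() {
    local src_dir="$1"    # the nps_replay checkout
    local out_tar="$2"    # e.g. /group/nps/$USER/nps_replay.tar.gz

    # -C enters src_dir before archiving ".", so the archive holds relative
    # paths ("./...") just like the "cd nps_replay/ && tar ..." command above.
    tar -czf "$out_tar" -C "$src_dir" . || return 1

    # Listing the archive fails if it is truncated or corrupt.
    tar -tzf "$out_tar" > /dev/null
}

# Example usage:
# make_replay_tar /group/nps/$USER/nps_replay /group/nps/$USER/nps_replay.tar.gz
```

Keep the output path outside the nps_replay directory, as in the original command, so the archive does not try to include itself.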
Now that we know which runs to replay and our nps_replay directory is set up, we need to create the job. We use the hcswif.py script, which creates a JSON file that tells swif2 what jobs to run on the farm and how many resources to allocate to each one. Our version of hcswif has been updated to dynamically specify the EVIO file size (the disk space needed on the farm computer). Each segment of a run is replayed and written to a separate file. All segments require the 0th segment, so ensure that all the available first segments are on /cache.
jcache get /mss/hallc/c-nps/raw/nps_coin_1*.dat.0
To specify the runs to replay, the segment, and the file size, use the following one-liner:
for file in /mss/hallc/c-nps/raw/nps_coin_1*.dat.*; do var=`echo $file | grep -o -E "[0-9]+"`; var2=`grep -oP "size=\K.*" $file`;echo $var $var2; done > runlist.dat
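For readability, the one-liner above can be unrolled into a function that writes one "run segment size" line per file. This is a sketch, not the canonical tool: `build_runlist` is a hypothetical helper, it assumes file names of the form nps_coin_<run>.dat.<segment>, and it assumes each /mss stub file contains a "size=<bytes>" line. It swaps the one-liner's `grep -oP` for `sed` so that PCRE support is not required, and it keeps only the first size match per file.

```shell
#!/usr/bin/env bash
# Sketch: multi-line equivalent of the run-list one-liner above.
# Assumptions: build_runlist is an illustrative helper; file names follow
# nps_coin_<run>.dat.<segment>, and each /mss stub holds a "size=<bytes>" line.

build_runlist() {
    local file base run seg size
    for file in "$@"; do
        base="${file##*/}"        # strip directories: nps_coin_<run>.dat.<segment>
        seg="${base##*.}"         # text after the last dot: <segment>
        run="${base%%.dat.*}"     # drop ".dat.<segment>": nps_coin_<run>
        run="${run##*_}"          # drop the prefix: <run>
        # sed stands in for "grep -oP" here; only the first match is kept.
        size="$(sed -n 's/.*size=//p' "$file" | head -n 1)"
        echo "$run $seg $size"
    done
}

# Example usage:
# build_runlist /mss/hallc/c-nps/raw/nps_coin_1*.dat.* > runlist.dat
```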
- Create a JSON file using hcswif
- Navigate to the common hcswif directory:
/u/group/nps/$USER/
- First-time users should clone the nps_hcswif repository.
- Run
hcswif.py --help
to see a list of parameters to pass. Also, check out the README.md.
- Example:
./hcswif.py --mode REPLAY --spectrometer NPS_COIN --run file run-lists/nps_rl1.dat --name NPS_COIN_9_20_23 --events -1 --account hallc
- This will produce a JSON output file with the name NPS_COIN_9_20_23.json (matching --name) under the jsons directory in hcswif. It is built from the run-lists/nps_rl1.dat file, which specifies a run, segment, and file size on each line. Since the data is segmented, a reasonable walltime (set by --time) is 72000 seconds.
- Make sure you have appropriate /farm_out/ directories:
- Create these directories under /farm_out/$USER/
nps_replay_stderr
nps_replay_stdout
- From the same directory, submit the farm job:
- We can tell the farm what to run by importing the JSON file with the following swif2 command (use the name of the JSON file that hcswif.py produced):
swif2 import -file jsons/NPS_COIN.json
- This will create the workflow. We then run the job with the following command, using the same workflow name:
swif2 run NPS_COIN
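The last two steps in the list above (the /farm_out/ directories and the swif2 submission) can be combined into two small helpers. This is a sketch under stated assumptions: both function names are hypothetical, the workflow name is assumed to match the --name given to hcswif.py, and the JSON file is assumed to sit at jsons/<name>.json relative to the current directory.

```shell
#!/usr/bin/env bash
# Sketch: create the farm output directories, then import and run a workflow.
# Assumptions: both helpers are illustrative; the workflow name must match the
# --name given to hcswif.py, and the JSON is expected at jsons/<name>.json.

make_farm_out_dirs() {
    local farm_out_root="$1"   # e.g. /farm_out/$USER
    mkdir -p "$farm_out_root/nps_replay_stderr" \
             "$farm_out_root/nps_replay_stdout"
}

submit_workflow() {
    local name="$1"
    local json="jsons/$name.json"
    # Stop early if hcswif.py has not produced the JSON file yet.
    if [ ! -f "$json" ]; then
        echo "missing $json; run hcswif.py first" >&2
        return 1
    fi
    # Same two swif2 commands as above; run only if the import succeeds.
    swif2 import -file "$json" && swif2 run "$name"
}

# Example usage (from the hcswif directory):
# make_farm_out_dirs /farm_out/$USER
# submit_workflow NPS_COIN
```

Checking for the JSON file first avoids submitting with a workflow name that does not match any file in the jsons directory.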
Now we will wait until the job finishes or fails! If the job fails, ask for help.
Documentation
This wiki is meant to be a collection of resource links and practice material. The documents here are not necessarily the most up to date, but they serve as a starting point for new users to get familiar with the JLab HPC environment and get some hands-on practice. Here is a list of useful information: