Getting Started - Farm Jobs

Revision as of 13:41, 22 September 2021 by Ccotton (talk | contribs) (→‎Using hcswif)

Using the Jefferson Lab Farm

Documentation

This wiki collects resource links and practice exercises. The documents linked here are not necessarily the most up to date, but they serve as a starting point for new users to get familiar with the JLab HPC environment and get some hands-on practice. Here is a list of useful information:

Farm Usage
Brad's famous JLab Compute Resources "How-to"
Farm Users Guide
Analyzer Information
Ole's 2019 Hall A/C Analyzer Software Overview
2018 Joint A/C Analysis Workshop
hcana docs

Overview

All current tasks in the XEM2 group require submitting many single-core jobs to the farm using either SWIF or Auger. hcswif submits replay jobs to the farm nodes run by run, so that they execute in parallel under SWIF (outlined in the Farm Users Guide). Auger is used to submit multiple single-core jobs that do not need access to the tape library, such as running multiple mc-single-arm instances or running rc-externals on multiple cores. The following examples support the XEM2 use case.

AUGER

  • Practice submitting stuff...

SWIF

  • Practice submitting stuff...

Using hcswif

hcswif is used to submit many analysis jobs to the farm, organized by run number.

  1. Use cache.sh in /u/group/c-xem2/software/hcswif/run-lists/tools to pull replays from tape to cache (using jcache)
    1. Sample usage: cache.sh
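
To illustrate the idea behind the caching step, here is a minimal Python sketch of what a run-list-driven cache request might look like. It is not the contents of cache.sh. The run-list layout (one run number per line, with optional # comments), the tape path template, and the function names read_run_list and jcache_commands are all assumptions made for illustration. The sketch only prints jcache get commands and does not run them.

```python
# Hypothetical tape-path template: the real directory and file naming
# used by XEM2 are NOT taken from this wiki and must be substituted.
MSS_TEMPLATE = "/mss/hallc/EXPERIMENT/raw/run_{run:05d}.dat"


def read_run_list(text):
    """Return run numbers from run-list text.

    Assumed format: one run number per line; blank lines and anything
    after a '#' are ignored.
    """
    runs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            runs.append(int(line))
    return runs


def jcache_commands(runs, template=MSS_TEMPLATE):
    """Build one 'jcache get' command string per run (nothing is executed)."""
    return [f"jcache get {template.format(run=run)}" for run in runs]


if __name__ == "__main__":
    # Made-up run numbers, for demonstration only.
    sample = "3001\n3002  # comment\n\n3010\n"
    for cmd in jcache_commands(read_run_list(sample)):
        print(cmd)
```

Printing the commands first makes it easy to check the file paths by eye before requesting anything from tape.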

Troubleshooting

Common commands and difficulties with jobs

Common Failure Modes