Data Analysis - Starting Out

Starting Out

The purpose of this section is to give new members of the XEM group hands-on experience with hcana and the replay. When adding a new calibration or otherwise contributing to the hallc_replay_XEM git repository, do the work on a new_feature_branch and merge it into your origin new_PASS_branch. This is followed by a push to the mrcmor100 PASS_branch. Refer to the git help and software overview sections to familiarize yourself with this workflow.

Objectives

  1. Learn to set up your hallc_replay_XEM directory
    • Run the setup script to make /volatile/ locations
  2. Know what standard.kinematics and standard.database files do
  3. Learn to include parameters in hcana
    • Naming conventions of parameters in hcana
  4. Naming conventions of PARAM files in hallc_replay_XEM
  5. Run your first replay and explore the output
  6. Print the value of a parameter from hcana to the console

Setting up the Replay

First you must have forked the hallc_replay_XEM repository from mrcmor100. This is outlined in the getting started - software section.

  • In /group/c-xem2/<CUE_username>/PRACTICE/ disk location:

git clone git@github.com:<username>/hallc_replay_XEM.git

  • <username> is your GitHub username; <CUE_username> is your JLab username.
  • Follow along with the git section under getting started to set up git with ssh on JLab computers.
  • cd into the hallc_replay_XEM directory
  • git submodule init
  • git submodule update
  • git fetch origin passN
  • git checkout passN

Now you should have all the submodules used in the hallc_replay_XEM directory and you should be on the passN branch. passN stands for the current pass. Check this by typing the following in the hallc_replay_XEM directory:

git branch

Setup Directories and links in the Replay

The output of hcana goes to two locations: ROOTfiles and REPORT_OUTPUT directories. These are saved on the /volatile/ disk to prevent using the /group/ location inappropriately. These directories are symbolically linked to the /volatile/ directories. The linked /volatile/ directory names can have any name you like, but the directory names in hallc_replay_XEM must be ROOTfiles and REPORT_OUTPUT. To create these directories, run the xem_setup.sh script in the top directory of your hallc_replay_XEM. Use the help feature if you are unsure of the usage.

  • ./xem_setup.sh -f volatile

The output is further divided into subdirectories by spectrometer and by replay type (calibration, production, timing, and scalers). These are as follows:

  1. ROOTfiles/SHMS/
    • CALIBRATION
    • PRODUCTION
    • TIMING
    • SCALERS
  2. ROOTfiles/HMS/
    • CALIBRATION
    • PRODUCTION
    • TIMING
    • SCALERS
  3. REPORT_OUTPUT/SHMS/
    • CALIBRATION
    • PRODUCTION
    • TIMING
    • SCALERS
  4. REPORT_OUTPUT/HMS/
    • CALIBRATION
    • PRODUCTION
    • TIMING
    • SCALERS

The CALIBRATION directories store the output of the calibration SCRIPTS, the TIMING directories hold the output of the timing scripts in SCRIPTS/TIMING, and so on. The calibration, timing, production, and scaler scripts are explained in more detail on the other data analysis pages.
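As a quick sanity check after running the setup script, the layout listed above can be enumerated and compared against what is actually on disk. The following is a minimal sketch, not part of hallc_replay_XEM; the function names are hypothetical, and it assumes it is run from the top of your hallc_replay_XEM directory.

```python
from pathlib import Path

# Names taken from the directory listing on this page.
OUTPUT_ROOTS = ("ROOTfiles", "REPORT_OUTPUT")
SPECTROMETERS = ("SHMS", "HMS")
REPLAY_TYPES = ("CALIBRATION", "PRODUCTION", "TIMING", "SCALERS")


def expected_output_dirs():
    """Return every expected output subdirectory as a relative path string."""
    return [
        f"{root}/{spec}/{kind}"
        for root in OUTPUT_ROOTS
        for spec in SPECTROMETERS
        for kind in REPLAY_TYPES
    ]


def missing_output_dirs(replay_top="."):
    """List the expected subdirectories that do not exist under replay_top."""
    top = Path(replay_top)
    # is_dir() follows symbolic links, so linked /volatile/ locations count.
    return [d for d in expected_output_dirs() if not (top / d).is_dir()]


if __name__ == "__main__":
    for d in missing_output_dirs():
        print("missing:", d)
```

If the script prints nothing, all sixteen subdirectories are reachable through the links.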

Notice that hallc_replay_XEM was also pointed to our raw data directories under CACHE_LINKS. Check with ls -l CACHE_LINKS; you will see that several symbolic links to raw data locations were created.

  • /cache/mss/hallc/spring17/raw raw-sp18
  • /cache/mss/hallc/jpsi-007/raw raw-sp19
  • /cache/mss/hallc/c-xem2/raw raw

These locations are the front face of the tape library. Files may not always be available here, and must be retrieved by the jcache utility. For more information refer to the scicomp documentation and the file structure and farm sections of getting started to use the jcache utility.
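The ls -l check on CACHE_LINKS can also be done programmatically. This is a rough sketch with a hypothetical helper name. It only reports whether each link's target directory exists; it says nothing about whether a particular raw file has been staged from tape.

```python
import os


def describe_links(link_dir):
    """Map each symbolic link in link_dir to (target, target_exists)."""
    result = {}
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            target = os.readlink(path)
            # os.path.exists follows the link, so it is False for a dangling link.
            result[name] = (target, os.path.exists(path))
    return result


if __name__ == "__main__":
    if os.path.isdir("CACHE_LINKS"):
        for name, (target, ok) in describe_links("CACHE_LINKS").items():
            print(f"{name} -> {target}  [{'ok' if ok else 'target missing'}]")
```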

Standard dot what?

Figure 1: Example of runs in standard.kinematics for the SHMS in singles mode.

The standard.kinematics file gives the analyzer the beam energy, particle, spectrometer angle, target mass, and central momentum. The prefixes p and g denote SHMS (p) and global (g) variables. In the HMS standard.kinematics file, the prefix p is replaced by h (HMS). Two main kinematics files are used for the XEM experiment, one for each spectrometer.

  • DBASE/HMS/standard.kinematics
  • DBASE/SHMS/standard.kinematics

The notation for run number range is common throughout all PARAM files for hcana. Some common notation is below:

  • A single run is typed before the select parameters for that run.
  • A dash - between two runs is the run range over which the parameters are applied.
  • The file is loaded from top to bottom, so parameters at the beginning are overwritten by later parameters.
    • This also applies to loading a file with parameters of the same name. PARAM files loaded later will overwrite previous parameters. Refer to the viewing hcana parameter section for an example.
  • The # sign indicates a comment that is not read by hcana.
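To make the run-range and overwrite rules concrete, here is a simplified model written in Python. It is not hcana's parser: the function name is hypothetical, the parameter names and values in EXAMPLE are placeholders chosen for illustration, "#include" lines are simply ignored, and the sketch assumes that lines appearing before the first run header apply to every run.

```python
import re

# A header is a single run ("2486") or a range ("2484-2488", spaces allowed).
RANGE_RE = re.compile(r"^(\d+)\s*(?:-\s*(\d+))?$")


def params_for_run(text, run):
    """Return {name: value_string} that applies to `run`."""
    params = {}
    active = True  # assumption: lines before the first header apply to all runs
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (and #include lines)
        if not line:
            continue
        header = RANGE_RE.match(line)
        if header:
            first = int(header.group(1))
            last = int(header.group(2) or first)
            active = first <= run <= last
        elif active and "=" in line:
            name, value = line.split("=", 1)
            params[name.strip()] = value.strip()  # later lines overwrite earlier
    return params


EXAMPLE = """\
# made-up values for illustration only
gpbeam = 10.0
2484 - 2488
ptheta_lab = 8.0
ppcentral = 9.8
2486
ppcentral = 9.9
"""

if __name__ == "__main__":
    print(params_for_run(EXAMPLE, 2484))
    print(params_for_run(EXAMPLE, 2486))
```

In this toy input, run 2486 falls inside the 2484 - 2488 range and also has its own block further down, so the later ppcentral value replaces the earlier one for that run only.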

Note: "#include" is a special set of characters that hcana recognizes.

Aside from target and beam-specific parameters, hcana also needs to take in information about the detector geometry, data cuts, and detector calibrations. These parameters are loaded in from the standard.database file.

  • DBASE/HMS/standard.database
  • DBASE/SHMS/standard.database

Figure 2: standard.database file, which contains standard.kinematics and other detector-related files by name. Parameters are loaded based on their filename.

Several strings are defined at the head of standard.database. The strings given in quotes are the names of files to load separately into hcana. Note that spaces have no meaning in hcana parameter input, so the run ranges above could have been written as '2484 - 2488'.

Notice that this file has three separate run ranges. Since the X>1 data have been taken in two separate time periods (with a third to be held in 2022), there are different PARAM files to load for the different configurations. Generally, in the X>1 and EMC analysis, files with parameters relevant for spring 2018 have the sp18 suffix, and files associated with spring 2019 have the sp19 suffix. Some parameters did not change between 2018 and 2019 and are not expected to change before the 2022 running. Those files have no suffix.

Refer to the general parameters and spectrometer general parameters for information on the parameters contained within these files. A brief description of the contents of each file named in standard.database follows.

  • g_ctp_parm_filename - General parameters, typically detector geometry and nominal acceptance cuts.
  • g_ctp_det_calib_filename - Default calibrations for each detector.
  • g_ctp_bcm_calib_filename - Default scaler calibrations for specific run periods. Most of these are overwritten in specific run ranges based on beam current calibrations. Refer to the BCM calibration section.
  • g_ctp_optics_filename - A set of matrix elements for reconstruction of events at the target. These are general parameters, but more specific ones are typically loaded for the spring 2018 runs due to the optics issues encountered. Refer to the optics section of data analysis.
  • g_ctp_map_filename - An important map of all signals in the readout controllers (ROCs). This can change from period to period, but typically doesn't. It does not change within a run period.
  • g_ctp_trig_config_filename - This file typically doesn't change from period to period, but ours did with the trigger changes between 2018 and 2019, which removed some trigger legs and added some helicity information.
  • g_ctp_template_filename - Loads the template files used to form the REPORT_OUTPUT/*.report files. This should remain the same within a run period so that report files do not change over the course of the run. More relevant branches have been added over time, making these report files more useful.

HCANA Naming Conventions

hallc_replay_XEM is a framework of calibration files, replay scripts, and database files that point the Hall C analyzer (hcana) to the appropriate files. Some of the naming that hcana recognizes was outlined in the previous section. It is recapped here with more detail. Parameter names must match those in the hcana source code. You can check this by grepping the hcana source for the variable name (without the p, g, or h prefix).

  • Parameters typed after a run number and before another run number are applied to that specific run number.
  • A dash - between two runs is the run range over which the following parameters are applied.
  • The file is loaded from top to bottom, so parameters at the beginning are overwritten by later parameters.
  • Generally, parameters begin with a string and are separated by an equals sign.
    • Spaces do not matter here.
    • Variables are loaded using the DBRequest functionality in the hcana code, which specifies each variable's type. hcana will typically tell you if it is trying to load a decimal number into an integer.
    • Arrays of variables can span multiple lines and are separated by commas. hcana typically knows how large of an array to expect, so it may give you an error or warning while running if the array is not the correct size.
  • The top-to-bottom rule also applies across files: when two PARAM files define a parameter of the same name, the file loaded later overwrites the earlier value. Refer to the viewing hcana parameter section for an example.
  • The # sign indicates a comment that is not read by hcana and can be used on the same line as a parameter specification.
  • The ; also indicates a comment and can be used on the same line as a parameter specification.
  • The "#include" is a special set of characters that hcana recognizes. hcana will automatically load the contents of that PARAM file if found.
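The line-level conventions above (trailing comments, arrays spanning several lines, and "#include") can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not hcana's implementation: it treats both # and ; as comment markers, assumes an array continues onto the next line only when the current line ends in a comma, and assumes the included file name is given in double quotes. The names demo_scalar and demo_array are placeholders.

```python
def parse_param_lines(text):
    """Parse 'name = v1, v2, ...' entries; returns (params, includes)."""
    params, includes = {}, []
    pending = None  # name of an array still being continued on the next line
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("#include"):
            # Record the quoted file name instead of treating the line as a comment.
            includes.append(stripped[len("#include"):].strip().strip('"'))
            continue
        # Remove trailing comments introduced by '#' or ';'.
        for marker in ("#", ";"):
            stripped = stripped.split(marker, 1)[0]
        stripped = stripped.strip()
        if not stripped:
            continue
        if "=" in stripped:
            name, value = (part.strip() for part in stripped.split("=", 1))
            params[name] = []
        elif pending is not None:
            name, value = pending, stripped
        else:
            continue  # not a parameter line (for example a run-range header)
        params[name] += [v.strip() for v in value.split(",") if v.strip()]
        # Assumption: a trailing comma means the array continues on the next line.
        pending = name if value.endswith(",") else None
    return params, includes


EXAMPLE = '''\
#include "PARAM/SHMS/CAL/pcal_calib_8deg_m9p8_sp18.param"
demo_scalar = 1.5   # trailing comment
demo_array = 1.0, 2.0,   ; first half
             3.0, 4.0
'''

if __name__ == "__main__":
    print(parse_param_lines(EXAMPLE))
```

Values are kept as strings here. In hcana the type comes from the DBRequest entry in the source code, which this sketch does not model.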

Naming Conventions for PARAM files

The desired kinematics have already been specified for the EMC and X>1 experiments. Some minor changes may be applied, but for the most part they are not changing. Different angle settings and momentum settings are used to target different physics. Detector calibrations, such as the calorimeter and DC calibrations, change with these kinematic changes. For this reason, we name our calibration PARAM files based on the kinematic setting they apply to.
For example, all the SRC runs in 2018 were taken at 8 degrees and 9.8 GeV central momentum. We name our calibration files as we normally would, but insert 8deg_m9p8 before the sp18 suffix. For the SHMS calorimeter calibration this gives: PARAM/SHMS/CAL/pcal_calib_8deg_m9p8_sp18.param

  • The file name starts with p (for SHMS) and the calorimeter calibration name, followed by the angle and momentum setting and the run period suffix. The decimal point is replaced with a p, and the m indicates minus momentum. This is to distinguish between positron CSB settings and regular data taking.
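The naming rule can be written down as a small helper, which is handy when scripting over many settings. This is a sketch with hypothetical function names. The only case confirmed on this page is 8 degrees, minus 9.8 GeV, sp18; how non-integer angles and positive momenta are written is an assumption here (decimal point replaced by p, no prefix for positive momentum).

```python
def setting_tag(angle_deg, momentum_gev):
    """Build the kinematic tag, e.g. (8, -9.8) gives '8deg_m9p8'."""
    def number(x):
        # 'g' drops trailing zeros (8.0 becomes '8'); the decimal point becomes 'p'.
        return format(abs(x), "g").replace(".", "p")

    sign = "m" if momentum_gev < 0 else ""  # assumption: no prefix when positive
    return f"{number(angle_deg)}deg_{sign}{number(momentum_gev)}"


def shms_cal_param_path(angle_deg, momentum_gev, period):
    """Path of the SHMS calorimeter calibration file for one setting."""
    tag = setting_tag(angle_deg, momentum_gev)
    return f"PARAM/SHMS/CAL/pcal_calib_{tag}_{period}.param"


if __name__ == "__main__":
    print(shms_cal_param_path(8, -9.8, "sp18"))
```

For the 2018 SRC setting this reproduces PARAM/SHMS/CAL/pcal_calib_8deg_m9p8_sp18.param.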

There were no conflicts with this system for the 2018 and 2019 running. Even if there had been a problem with this nomenclature, standard.database is read per run number, so as long as the files had different names it could sort it out. There is nothing special about this naming scheme other than the ease of finding calibration files for specific settings. This is especially useful when debugging an issue with a specific group of runs.

DBASE vs. PARAM locations:

  • There exists a DET directory for detector calibration pointing files in the DBASE location:

DBASE/SHMS/DET/*.param

This area acts as a kinematic-specific database for the associated PARAM files. For instance, a run group 2484-2488 will share the same calibrations for the drift chambers and calorimeter, but use a previous hodoscope calibration. Looking at the corresponding file shows which calibrations those runs "#include" for that run group.

Running a SCRIPT

Load the XEM software package by sourcing the correct setup file for your shell:

/group/c-xem2/software/setup.csh
/group/c-xem2/software/setup.sh

This script loads the python, root, and hcana versions we are all currently using. Do not go rogue and deviate from this! If you need a new python package or a different version of hcana for comparison purposes, ask Casey to add the proper /software/ directory and setup script for you.

  • This is super important for us to continue to produce consistent results.
  • Most people will find it helpful to source this setup script from the ~/.cshrc or ~/.bashrc file in their home directory.


Now we can run hcana from the top of our hallc_replay_XEM directory. Note: I added hcana to the $PATH variable, so you should be able to start it from anywhere, but you can only run scripts from the top of hallc_replay_XEM, because all the directories are relative to this location.

hcana
hcana [0] .L SCRIPTS/SHMS/PRODUCTION/replay_production_shms.C
hcana [1] replay_production_shms(2484,-1)

hcana will now analyze all the events in run 2484. While it is running, look over the messages it wrote to the screen before it started counting analyzed events (you can scroll up).

  • You will find that a new file has been created in ROOTfiles/SHMS/PRODUCTION and REPORT_OUTPUT/SHMS/PRODUCTION.

Don't open these files until the analyzer has finished!

While the script is running, you can also familiarize yourself with the parts of the script:

  • Open the file in your favorite text editor.
  • The script takes in the run number and the number of events to analyze. Optionally, you can leave the parameters blank and enter them using cin.
  • pathList is a vector of locations that hcana searches for the specific file. If none of those places has the file, hcana will error. The file is typically not there if it is not pinned. Notice the raw-sp18 and raw-sp19 directories.
  • ROOTfileNamePattern is, unsurprisingly, relative to the directory from which hcana is being run. People commonly try to run this from somewhere inside the SCRIPTS directory, which won't work.
  • Lines 30 to 51 LOAD the parameters. Notice how standard.database is added by string, and then the strings it defines are subsequently LOADED. This is how hcana knows which parameters to use.
  • Lines 51 to 105 create the detectors based on classes in hcana. These are added to the THcHallCSpectrometer object.
  • The scaler events are created next.
  • Take note of the DEF-files here. DEF-files should be discussed here, but I haven't included that yet...
  • Finally, the analyzer analyzes the run and writes the report_output file and rootfile to their respective locations.


Common issues:

  • The file shms_all_02484.dat is not found: Did you create a raw-sp18 directory?
  • The file shms_all_02484.dat is not found: The file is not loaded from tape. Use the jcache utility to get the file.
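When the raw file is reported as not found, it helps to mimic the search yourself to see which case applies. The following is a minimal sketch of a first-match search over a list of directories. The helper name and the search list are hypothetical, and the five-digit zero padding of the run number is inferred from the file name shms_all_02484.dat. It does not reproduce hcana's own lookup code.

```python
import os


def find_raw_file(run_number, path_list, pattern="shms_all_%05d.dat"):
    """Return the first existing path for the run's raw file, else None."""
    filename = pattern % run_number
    for directory in path_list:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical search order built from the link names mentioned on this page.
    search = ["raw", "raw-sp18", "raw-sp19"]
    found = find_raw_file(2484, search)
    print(found or "shms_all_02484.dat not found: check the links, then use jcache")
```

If the directories exist but the file does not, the file most likely still needs to be retrieved from tape.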

Making plots from the Replay

  • Calorimeter energy deposition
  • 2D plot of focal plane quantities
  • Some interesting kinematic variables like W or X.
  • Explain plotting from the command line and cuts.

Viewing an hcana parameter

Open hcana and define the run number:

int runNumber = 2484;

Then copy lines 30 to 51 of SCRIPTS/SHMS/PRODUCTION/replay_production_shms.C and paste them into hcana (or just run a short replay and don't .q afterward). Now print the parameter:

gHcParms->PrintFull("pcal_arr_gain_cor")

If you check the PARAM/SHMS/CAL/pcal_calib_sp18.param, you will find the same variable name with all the same values. This can be used for any hcana parameter, and should be used to debug changes if needed.

In addition, if you want to look through all hcana parameters, you can simply type: gHcParms->PrintFull("")