General Particle Tracer

In this context, GPT always refers to General Particle Tracer, never to the language processing AI.

GPT is a 3D particle pusher with (optional) space charge as well as spin tracking. It is easy to parametrize and supports all types of field maps we need for accelerator studies as long as our focus is not on things like wake fields, beam/matter interactions, radiation, etc.

GPT on Windows

We have a pre-packaged GPT version for Windows that includes the Visual Studio dependency for custom elems/progs as well as Alicia's collection of elems and progs. This package, along with the setup files for manual installation if you prefer, can be found at:

\\jlabwin\SW_Dist\UserApps\GPT

The license key, should you need it, can be found in the license agreement in the Documentation folder.

The most recent GPT version we have pre-packaged on Windows is 3.52. An update to 3.55 is planned.

GPT on Linux

For a long time, we were using the somewhat dated GPT 3.38 release. Because the transition of farm, CUE, and ACE systems to RHEL 9 / AlmaLinux 9 led to incompatibilities with GPT 3.38 (mostly compiler/ABI issues when changing the custom elements / gdfa progs configuration) and the /apps folder has been retired, running GPT 3.38 is no longer straightforward. To reduce version chaos, we decided to upgrade to GPT 3.55 and drop support for older GPT versions as well as legacy platforms (RHEL < 9 and CPUs without AVX2, which includes some old CUE machines).

We are currently testing the following binary distributions (all for x86_64):

  • gpt355-RHEL9-gcc-11.4-4-sep-2024-avx2 --- GPT 3.55, RHEL 9 build w/ GCC 11.4, requires AVX2-capable CPU
  • gpt355-RHEL9-gcc-11.4-4-sep-2024-avx2-MPI4 --- same, but with OpenMPI dependency to run parallel jobs on the cluster

The MPI version will run on a single node without a performance penalty and with no difference in usage. However, because of the library dependencies, it will not run on most machines outside of the farm.

All distributions are deployed with the following additions to the base version:

  • custom elements:
    • IONATOR 08/22/2022: ionator
    • Alicia's elements v3.52_2023_03_17: addxyrot ccsmove indexed_sectormagnet map1D_Brectmag map1D_B_scope map1D_TM_scope map25D_TM_scope map2D_B_scope map3D_B_scope map3D_Ecomplex_scope map3D_Hcomplex_scope map3D_TM_scope map3D_TMparmela map3D_TMparmela_scope map3D_wien mmap3D_B mmap3D_B_scope quad_fringe radialslitmask removedirection ringslit sectormagnet_fix setgdfvars setplateaut setsigma setsupergt TE011gauss_scope TE110gauss_scope TM010gauss_scope TM110gauss_scope TMrectcavity_scope quadrupole_scope wien wienfindB wienfindE writecGBx writeGB writeGBall writeGBallscreen writeGBcyl writeGBcylscreen writeGBscreen writeucGBx writep writepscreen writep_eVperc writep_eVpercscreen
  • gdfa progs:
    • Alicia's progs v3.52_2023_03_17: Q1 Q2 tutacc avgEk avgEk_eV avgEtotal avgEtotal_eV avgGBphi avgGBphi_on_rxy avgGBrxy avgGBx avgGBx_lhcs avgGBy avgGBz avgL avgx_lhcs avgxprime avgxprime_lhcs avgyprime avgptotal avgp_eVperc avgpx_eVperc avgpx_eVperc_lhcs avgpy_eVperc avgpz_eVperc avg_y_on_x cannemixrms cnemixrms dEk_100_eV EnergyDiff maxEk maxEk_eV maxEtotal maxEtotal_eV maxr maxt maxx maxx_lhcs maxxprime maxxprime_lhcs maxy maxyprime minEk minEk_eV minEtotal minEtotal_eV minG mint minx minx_lhcs minxprime minxprime_lhcs miny minyprime nemiprrms nemipxrms stddGonG stdEk stdEk_eV stdEtotal stdEtotal_eV stdGBphi_on_rxy stdGBx stdGBy stdGBz stduncorEk stduniform stdzs transmission ucnemixrms xyaspectratio z_x_slope z_y_slope

These additions are not audited, so their use can result in unexpected behavior.
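For reference, the added gdfa progs are used like the built-in ones. A minimal sketch (file names are hypothetical, and the call assumes the usual gdfa convention of output file, input file, grouping variable, then progs):

# result.gdf is a hypothetical gpt output file; stats.gdf receives the computed quantities
gdfa -o stats.gdf result.gdf time avgEk_eV stdEk_eV transmission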

If any further elements or gdfa progs need to be included, reach out to Max Bruker.

Running on CUE machines

Not actively supported.

Running on farm nodes

The preferred way to run GPT is on the farm (https://scicomp.jlab.org/scicomp/home), which, unlike the common CUE systems, is intended for these sorts of jobs. Running code on the farm does not imply parallel computing, although that is easy to do when needed. The interactive farm nodes can be used for testing and for short-duration jobs that only need a couple of threads; their OS environment is the same as on the compute nodes.

You need an account: Farm and ifarm Access/Accounts (https://jlab.servicenowservices.com/kb?id=kb_article_view&sys_kb_id=df1ff2701bbd4510a888ea4ce54bcb7e)

SSH to the interactive farm nodes is only allowed directly from the MFA gateways; see the instructions for how to simplify the process: Connecting to Farm and QCD Interactive Nodes (https://jlab.servicenowservices.com/kb?id=kb_article_view&sys_kb_id=54db27d81b6da550a888ea4ce54bcb76)
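One common way to simplify this is a ProxyJump entry in ~/.ssh/config; the sketch below uses placeholder hostnames and username, so check the linked article for the lab's recommended setup:

# sketch of ~/.ssh/config; <user> is a placeholder, and the gateway hostname should be taken from the linked instructions
Host ifarm
    HostName ifarm.jlab.org
    ProxyJump <user>@login.jlab.org
    User <user>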

After you SSH into an interactive farm node, the GPT environment variables (PATH, GPTLICENSE) are configured by an Environment Module (https://jlab.servicenowservices.com/kb?id=kb_article_view&sys_kb_id=d34524581bbdc110a888ea4ce54bcbf7), which is loaded as follows:

module use /scigroup/inj_group/sw/el9/modulefiles
module load gpt/3.55-mpi4
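To confirm that the environment is set up, you can check what the module put on the path (a quick sketch; exact paths and versions will differ):

module list          # should now include gpt/3.55-mpi4
which gpt mr gdfa    # should resolve to the module's install location
echo $GPTLICENSE     # the license key gpt picks up at runtime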

Interactive job

On an interactive node (or an allocated interactive session), you can run small GPT jobs like you would on any machine. The interactive nodes have a lot (128?) of CPU cores, so consider limiting the number of concurrent threads to avoid slowing down the machine for other users:

gpt -j 16 ...

or:

mr -j 16 ... gpt -j 1 ...
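Put together, a small interactive test could look like this (a sketch with hypothetical file and folder names; -o naming the output GDF file is assumed to match your GPT version's usage):

cd /group/inj_group/<you>/gpt_test     # hypothetical working directory
gpt -v -j 16 -o test.gdf test.in       # run the input file with at most 16 threads
gdfa -o stats.gdf test.gdf time stdx stdy nemixrms   # example post-processing of the result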

Single-node scheduled job

A job of the size you would consider running interactively (single task, limited number of threads) can also be allocated resources on a non-interactive farm node. Jobs are scheduled with slurm (https://scicomp.jlab.org/docs/farm_slurm_batch), like so:

sbatch test.sbatch

In this case, the content of test.sbatch can be something like:

#!/bin/bash

#SBATCH --partition=production
#SBATCH --job-name=gpt_singlenode_test
#SBATCH -t 0:30:0
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=1000M

FOLDER=/group/inj_group/Max/TestFolder
mr -v -o $FOLDER/test.gdf $FOLDER/test.mr gpt -j 16 $FOLDER/test.in

In this example, the mr file only contains static parameters, so only one gpt instance will run; that single run may still take a long time, e.g. when tracking a million particles.

The output file will show up in the specified folder, and the stdout and stderr of the tasks in slurm-xxx.out (xxx being the job id assigned by sbatch). Progress can be monitored with squeue -j xxx or, more conveniently, squeue -u bruker -i 10 (lists all jobs I own every 10 seconds).
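A few related slurm commands that come in handy (the job id is hypothetical):

squeue -u $USER                                          # list your own jobs
scancel 12345678                                         # cancel a job by id
sacct -j 12345678 --format=JobID,State,Elapsed,MaxRSS    # accounting info once the job has finished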

MPI job

To leverage the parallel computing capability of the cluster, one can run GPT in an MPI job. In principle, this does not require much modification to the batch script, other than preceding the final command with mpirun:

#!/bin/bash

#SBATCH --partition=production
#SBATCH --job-name=gpt_mpi_test
#SBATCH -t 0:30:0
#SBATCH -N 64
#SBATCH --mem-per-cpu=1000M

FOLDER=/group/inj_group/Max/TestFolder
mpirun -v mr -v -o $FOLDER/test.gdf $FOLDER/test.mr gpt -j 1 $FOLDER/test.in

This allocates resources for a certain number (N) of processes, which may or may not be distributed across the cluster, and runs a GPT instance in each. The processes compute independently but are centrally managed. mr takes care of assigning jobs to processes and merging the output. The same thing works for gdfmgo (according to the manual; I have not tested it myself). For jobs that naturally lend themselves to being parallelized (e.g., mr with a number of independent runs that is large compared to the number of parallel processes), the wall-clock time scales very well, unless merging large output files creates a bottleneck. Note that one process is dedicated to merging/managing only, so the number of worker processes is one fewer than the number allocated; keep this in mind if you want the size of the parameter sweep to divide evenly among the workers.
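As a sketch of that bookkeeping, assuming mpirun launches one process per allocated slurm task: for a 64-point parameter sweep one could request 65 tasks, leaving 64 workers after the managing process is subtracted, e.g. by replacing the -N line with:

#SBATCH --ntasks=65        # 64 worker processes + 1 manager (sketch only)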

Things to note about the farm

  • The environment in which you submit the batch script is carried over to the job, i.e., you only have to load the environment module once per SSH session and the nodes will know where to find GPT.
  • GPT jobs do not tend to be I/O heavy, so we can use /group directly, which makes it convenient to get to your files via sftp etc. But for jobs that require storage with a lot of space or bandwidth, the dedicated node-local and cluster-wide storage systems would be better.

JupyterHub

For those who like Jupyter notebooks, the lab has a JupyterHub server that will spawn a containerized environment for you with a Jupyter instance running inside it. The containers are hosted by the farm but internally run Ubuntu 24.04, not AlmaLinux 9. To be able to invoke GPT from inside a Jupyter notebook, there are two main options:

  • Run it "locally" in the same instance; for this, we need an Ubuntu version, which is being prepared. TO DO
  • Submit it as another farm job (many more threads + RAM available); this is slightly less convenient because pipes are not available for data exchange and one has to deal with slurm for managing the extra job. But it should work. If there's time, I'll look into how this could be added transparently to the GPT/Python interface.

GPT/Python interface

  • The GPT environment module adds /scigroup/inj_group/sw/gpt_python to PYTHONPATH.
  • Dependencies need to be installed on a per-user basis. Note that pip maintains separate folders for each Python version. If packages are needed in Jupyter, you can invoke a terminal in JupyterLab and run pip there.
    • polars (https://pola.rs/):

python3 -m pip install polars
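A quick way to check that the install is visible to the Python you will be using (a sketch; version numbers will differ):

python3 -m pip show polars                                 # confirm the package and where it was installed
python3 -c "import polars as pl; print(pl.__version__)"    # verify it imports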

TO DO: more description