Online Recon - Expert

== Servers ==
 
Currently everything in this section is running on <tt>clonfarm11</tt>, but that may change.

==== Software Installations ====

The core HPS java software builds for online, multi-threaded reconstruction live in <tt>~hpsrun/online_recon</tt> and were installed using the [https://confluence.slac.stanford.edu/display/hpsg/Online+Reconstruction+Tools instructions on confluence]. 
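
For orientation only (the confluence page above is the authoritative recipe, and the extra installation step it describes is omitted here), the build is roughly a standard Maven build of a non-master branch; the repository URL is assumed to be the usual <tt>JeffersonLab/hps-java</tt> location and the branch name is a placeholder:

  cd ~hpsrun/online_recon
  git clone https://github.com/JeffersonLab/hps-java   # assumed repository location
  cd hps-java
  git checkout SOME_ONLINE_RECON_BRANCH                 # placeholder for the required non-master branch
  mvn clean install -DskipTests                         # assumed standard Maven build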

We may decide the hps-java installation there should be a symlink to a standard one elsewhere, but note it requires an extra installation step and (currently) a non-master git branch.

If the tomcat servlet needs updating, it will need to be rebuilt and redeployed.

==== Operations ====

Everything operates as <tt>user=hpsrun</tt> and startup originates in <tt>$EPICS/apps/iocBoot/procServ.conf</tt>.  There are 4 components, started sequentially, named:
  
 
# <tt>dqm_et</tt>
#* just an ET ring, offline only
# <tt>dqm_evio2et</tt>
#* pipes an EVIO file to the ET ring, offline only
# <tt>dqm_server</tt>
#* receives instructions from the client below
# <tt>dqm_client</tt>
#* instructs the server on parameters, e.g. #threads, steering file
  
Each component is spawned automatically in procServ and is accessible via a telnet session by running
 
* <tt>softioc_console NAME</tt>
 
Once connected, individual components can be interactively killed, paused, or restarted using the <tt>ctrl-X/T</tt> sequences (which are printed when you connect).  The <tt>dqm_client</tt> process can also be used interactively within telnet.  To disconnect from any telnet session, press <tt>ctrl-]</tt> and then type <tt>quit</tt>.
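
For example, to attach to one of the components (the choice of <tt>dqm_client</tt> here is just illustrative) and then detach again:

  softioc_console dqm_client    # opens the procServ console for that component over telnet
  # the available ctrl-X/T control sequences are printed when the session connects
  # to detach without disturbing the process, press ctrl-] and then, at the telnet prompt:
  telnet> quit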
  
Configuration files for all components are currently in their startup directory <tt>$EPICS/apps/iocBoot/dqm</tt>.
  
A full teardown and restart of all components, including the tomcat server, is in <tt>user=hpsrun</tt>'s <tt>$PATH</tt>:
  
 
* <tt>hps-dqm-restart.sh</tt>
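
The script itself is not reproduced here; as a rough sketch only, a teardown/restart of this kind stops the four components, bounces tomcat (using the password-free <tt>systemctl</tt> permissions set up under Setup below), and then starts the components back up in order with delays in between.  The <tt>stop_component</tt>/<tt>start_component</tt> names are hypothetical placeholders, not real commands:

  #!/bin/bash
  # Hypothetical sketch, not the actual hps-dqm-restart.sh.
  components="dqm_et dqm_evio2et dqm_server dqm_client"
  for c in $components; do
      stop_component "$c"                    # placeholder: however the real script tears down a procServ instance
  done
  sudo /usr/bin/systemctl restart tomcat     # allowed without a password by the sudoers entries under Setup
  for c in $components; do
      start_component "$c"                   # placeholder: however the real script starts a procServ instance
      sleep 5                                # components are started sequentially, with intermediate delays
  done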
 
== Clients ==
 
Both should work from any clon machine:
 
* web browser @ <tt>http://clonfarm2.jlab.org:8080/HPSRecon/</tt>
** this does not update automatically (currently) but requires a manual refresh
** but clicking on a detector displays many plots simultaneously
* jas3-->Tools-->Connect [[Image:jas3-onlinerecon.png|200px]]
** this has the advantage that it refreshes automatically
 
== Miscellaneous ==

==== To Do ====

* The <tt>dqm_server</tt> configuration will need updating for the online ET ring.
* The <tt>dqm_client</tt> configuration will need updating for different steering files and run numbers.
* The stop/start functionality in the server may be unreliable, and the tomcat servlet also appears to need a restart whenever the recon server is restarted.  Both will need to be done in order to zero the histograms, which will happen every run and every time beam is away for a significant amount of time.
** Meanwhile, a script to do a full teardown and restart is available and may well be sufficient.
 
==== Setup ====

To set up on a new clon machine, this may be necessary:
* <tt>yum install tomcat tomcat-admin-webapps ksh telnet</tt>
* edit <tt>/etc/tomcat/tomcat-users.xml</tt> and enable the "manager-gui" role and "admin" user
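
For reference, the relevant fragment of <tt>/etc/tomcat/tomcat-users.xml</tt> ends up looking something like the following (the password is a placeholder, and the surrounding file will contain other commented-out entries):

  <tomcat-users>
    <role rolename="manager-gui"/>
    <user username="admin" password="CHANGEME" roles="manager-gui"/>
  </tomcat-users>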
 
To enable <tt>user=hpsrun</tt> to restart the tomcat server from a batch script (which happens in the aforementioned restart script):
* <tt>visudo</tt> on the machine running the tomcat server and add these lines:
  %onliners ALL=NOPASSWD:/usr/bin/systemctl stop tomcat
  %onliners ALL=NOPASSWD:/usr/bin/systemctl start tomcat
  %onliners ALL=NOPASSWD:/usr/bin/systemctl restart tomcat
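
Assuming <tt>hpsrun</tt> is a member of the <tt>onliners</tt> group, it should then be able to run, for example,

  sudo /usr/bin/systemctl restart tomcat

without a password prompt, which is what the restart script relies on.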

<strike>Tomcat is somehow unhappy on <tt>clonfarm3</tt>, appears to be a permissions issue.  Tried various things but haven't tracked it down.</strike>

On some machines, tomcat installations suffer from a runtime directory permissions issue.  It appears possibly related to JLab's LDAP changes in the previous year, where groups with the same name as a user got <tt>-grp</tt> appended.  The current workaround is to modify the tomcat service in <tt>/usr/lib/systemd/system/</tt> to run as root.
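
For example (the exact unit file layout depends on the tomcat package version), the workaround amounts to editing the <tt>[Service]</tt> section of the tomcat unit under <tt>/usr/lib/systemd/system/</tt> so that it runs as root:

  [Service]
  User=root
  Group=root

and then running <tt>systemctl daemon-reload</tt> followed by a tomcat restart.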
