Difference between revisions of "EPSCI Group Meeting Aug. 17, 2020"

From epsciwiki
 
(3 intermediate revisions by the same user not shown)
Line 39: Line 39:
 
#* [[Fortnight Papers]] -> Moved to alternating Mondays (opposite SRO)
 
#* [[Fortnight Papers]] -> Moved to alternating Mondays (opposite SRO)
 
#** Delayed yet another 2 weeks due to NUG meeting
 
#** Delayed yet another 2 weeks due to NUG meeting
#* [https://www.nersc.gov/users/NUG/annual-meetings/nug-2020/ NERSC User's Group Meeting 8/17/2020 @ 11am-6pm]
+
#* [https://www.nersc.gov/users/NUG/annual-meetings/nug-2020/ NERSC User's Group Meeting 8/17/2020 @ 12am-7pm]
 
# Graham's Project
 
# Graham's Project
 
# Ongoing Activities
 
# Ongoing Activities
Line 54: Line 54:
 
#* EVIO-6
 
#* EVIO-6
 
#* Offsite Computing
 
#* Offsite Computing
#** [https://halldweb.jlab.org/data_monitoring/recon/summary_swif2_output_recon_2019-11_ver01_batch01a/index.html NERSC], PSC
+
#** NERSC ([https://halldweb.jlab.org/data_monitoring/recon/summary_swif2_output_recon_2019-11_ver01_batch01/index.html flex],[https://halldweb.jlab.org/data_monitoring/recon/summary_swif2_output_recon_2019-11_ver01_batch01b/index.html regular]), PSC
 
#** OSG
 
#** OSG
 
# GUI for Calorimeter calibration scripts (Hall-D Request)
 
# GUI for Calorimeter calibration scripts (Hall-D Request)
Line 64: Line 64:
 
<hr>
 
<hr>
 
=== Minutes: ===
 
=== Minutes: ===
<!-- Attendees: David L.(chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H. -->
+
Attendees: David L. (chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.
 +
 
 +
'''Announcements'''
 +
* Fortnight papers delayed 2 weeks due to NUG meeting today
 +
* Nathan suggested adding internal code reviews to fortnight meetings
 +
** Someone would provide a piece of code and others would look it over and comment
 +
** Not intended to cause re-writes of code, but to share knowledge/experience/techniques
 +
* Vardan suggested we consider adopting coding standards within group
 +
** General consensus this would be a good idea, but should not be too restrictive
 +
** Vardan will look into Java standards
 +
** Nathan will look into C++ standards
 +
** Kishan will look into Python standards
 +
 
 +
'''CODA'''
 +
* Hall-A DAQ system experienced issues with crashes last week that looked related to AFECS (Vardan's code)
 +
* Turned out to be due to incompatibility with certain minor revisions of JDK8 and some library dependencies
 +
* Hall-A had recently updated their JDK which is why the problem suddenly emerged.
 +
* Vardan identified the problem and advised them to change to JDK8 outside of the problematic minor revision range. They did so and are now running again.
 +
 
 +
'''CLARA'''
 +
* Vardan received word of some (rare) crashes of Hall-B data production jobs
 +
* Problem related to uncaught exception in one of the engines used in reconstruction
 +
* Should be fixed in engine itself, but unclear why catch-all mechanism in orchestrator was not working. Vardan will look into it.
 +
* Nathan Baltzell provided some information that may help pinpoint the exact problem, but this is happening rarely enough that it is not a show-stopper.
 +
 
 +
'''SRO'''
 +
* Vardan and Chris L. have come up with some specs for the software stream source Chris is working on.
 +
* Vardan looking at 2 other Data Lake packages (besides redis)
 +
** Has one more configuration option for redis that he will try this week.
 +
 
 +
'''JANA2'''
 +
* Nathan is looking into some bug reports David made
 +
* Some discussion on profilers for C++
 +
** perf is useful on Linux, but it is not clear how to get the most user-friendly results
 +
** CLion has a built-in tool that can produce graphs (uses perf underneath)
 +
** VTune license available in PCSCI group, but not easily accessible.
 +
*** NERSC also has a VTune license which could be used in an interactive session. David will look into it.
 +
 
 +
'''A.I.'''
 +
* Kishan was able to run the model on GPUs on the sciml190X computers (despite David's out-of-date documentation)
 +
* Investigating using an embedded Python interpreter in JANA
 +
** GIL (=Global Interpreter Lock) prevents efficient multi-threading this way
 +
** A package, pypy, looks like it may provide better multi-threaded support. Kishan will look into it this week.
 +
 
 +
'''EVIO-6'''
 +
* Final code cleanup underway.
 +
* Carl will send a detailed e-mail to Nathan this week so he can have a look and comment.
 +
 
 +
'''Offsite Production'''
 +
* NERSC production is still quite slow. Test using "regular" queue did not show dramatic increase in throughput.
 +
* Successful PSC test run over the week. Will use to process as much as 25% of GlueX Spring 2020 data set.
 +
* Some discussion on making JLab job submission system seamlessly run jobs onsite/offsite.
 +
** Will require changing how allocation requests are done.
 +
** Project is 1 year out.

Latest revision as of 15:18, 17 August 2020

The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans Video conferencing (ID: 253 300 597):

Meeting URL
 https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting
  2. Announcements
  3. Graham's Project
  4. Ongoing Activities
  5. GUI for Calorimeter calibration scripts (Hall-D Request)
  6. Publications
  7. AOT



Minutes:

Attendees: David L. (chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.

Announcements

  • Fortnight papers delayed 2 weeks due to NUG meeting today
  • Nathan suggested adding internal code reviews to fortnight meetings
    • Someone would provide a piece of code and others would look it over and comment
    • Not intended to cause re-writes of code, but to share knowledge/experience/techniques
  • Vardan suggested we consider adopting coding standards within group
    • General consensus this would be a good idea, but should not be too restrictive
    • Vardan will look into Java standards
    • Nathan will look into C++ standards
    • Kishan will look into Python standards

CODA

  • Hall-A DAQ system experienced issues with crashes last week that looked related to AFECS (Vardan's code)
  • Turned out to be due to incompatibility with certain minor revisions of JDK8 and some library dependencies
  • Hall-A had recently updated their JDK which is why the problem suddenly emerged.
  • Vardan identified the problem and advised them to change to JDK8 outside of the problematic minor revision range. They did so and are now running again.
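
As a rough illustration of the kind of check that could flag an affected JDK before a run starts, here is a minimal Python sketch. The minutes do not state which JDK 8 minor revisions were problematic, so the bounds below are placeholders only, and the function names are hypothetical (they are not part of AFECS or CODA).

```python
import re

# Placeholder bounds: the minutes do not say which JDK 8 updates were affected.
# Replace these with the real range before using this for anything.
BAD_UPDATE_MIN = 100
BAD_UPDATE_MAX = 199


def jdk8_update(version: str) -> int:
    """Extract the update number from a JDK 8 version string like '1.8.0_152'."""
    match = re.fullmatch(r"1\.8\.0_(\d+)(?:-.*)?", version.strip())
    if match is None:
        raise ValueError(f"not a JDK 8 version string: {version!r}")
    return int(match.group(1))


def is_problematic(version: str) -> bool:
    """Return True if the JDK 8 update number lies in the placeholder bad range."""
    return BAD_UPDATE_MIN <= jdk8_update(version) <= BAD_UPDATE_MAX
```

With the placeholder range, `is_problematic("1.8.0_152")` is true and `is_problematic("1.8.0_252")` is false; a non-JDK-8 string raises `ValueError` rather than being silently accepted.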

CLARA

  • Vardan received word of some (rare) crashes of Hall-B data production jobs
  • Problem related to uncaught exception in one of the engines used in reconstruction
  • Should be fixed in engine itself, but unclear why catch-all mechanism in orchestrator was not working. Vardan will look into it.
  • Nathan Baltzell provided some information that may help pinpoint the exact problem, but this is happening rarely enough that it is not a show-stopper.
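
The catch-all idea discussed above can be sketched as follows. This is a Python illustration only: CLARA itself is not shown here, and `EngineResult`, `run_engine_safely`, and `faulty_engine` are hypothetical names invented for the example. The point is that the caller wraps each engine call so that an exception on one event is recorded instead of ending the whole job.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EngineResult:
    """Outcome of one engine call: either an output or an error message."""
    output: Optional[Any] = None
    error: Optional[str] = None


def run_engine_safely(engine: Callable[[Any], Any], event: Any) -> EngineResult:
    """Call a (hypothetical) engine and turn any exception into a result."""
    try:
        return EngineResult(output=engine(event))
    except Exception as exc:  # catch-all: one bad event should not kill the job
        return EngineResult(error=f"{type(exc).__name__}: {exc}")


def faulty_engine(event: dict) -> float:
    """Toy engine that fails on events with zero hits."""
    return event["energy"] / event["n_hits"]


events = [{"energy": 10.0, "n_hits": 2}, {"energy": 5.0, "n_hits": 0}]
results = [run_engine_safely(faulty_engine, ev) for ev in events]

for ev, res in zip(events, results):
    print(ev, "->", res)
```

The first event yields an output and the second yields an error entry, so the loop over events runs to completion. The underlying bug still belongs in the engine; the wrapper only keeps a rare failure from stopping production.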

SRO

  • Vardan and Chris L. have come up with some specs for the software stream source Chris is working on.
  • Vardan looking at 2 other Data Lake packages (besides redis)
    • Has one more configuration option for redis that he will try this week.

JANA2

  • Nathan is looking into some bug reports David made
  • Some discussion on profilers for C++
    • perf is useful on Linux, but it is not clear how to get the most user-friendly results
    • CLion has a built-in tool that can produce graphs (uses perf underneath)
    • VTune license available in PCSCI group, but not easily accessible.
      • NERSC also has a VTune license which could be used in an interactive session. David will look into it.

A.I.

  • Kishan was able to run the model on GPUs on the sciml190X computers (despite David's out-of-date documentation)
  • Investigating using an embedded Python interpreter in JANA
    • GIL (=Global Interpreter Lock) prevents efficient multi-threading this way
    • A package, pypy, looks like it may provide better multi-threaded support. Kishan will look into it this week.
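
A minimal sketch of the GIL effect mentioned above, assuming a standard CPython build. It runs the same CPU-bound toy workload (`count_primes`, a made-up function for this example) first serially and then on four threads. With the GIL in place the two timings are typically similar, because only one thread executes Python bytecode at a time; other interpreters or builds may behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


limits = [20_000] * 4

# Same work, one task after another on the main thread.
serial, t_serial = timed(lambda: [count_primes(n) for n in limits])

# Same work, spread over four threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded, t_threaded = timed(lambda: list(pool.map(count_primes, limits)))

print(f"serial:   {t_serial:.2f} s")
print(f"threaded: {t_threaded:.2f} s")
```

Both runs produce identical results; only the wall-clock time is of interest. The same pattern would apply to threads calling into a single embedded interpreter from C++, which is the case under investigation for JANA.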

EVIO-6

  • Final code cleanup underway.
  • Carl will send a detailed e-mail to Nathan this week so he can have a look and comment.

Offsite Production

  • NERSC production is still quite slow. Test using "regular" queue did not show dramatic increase in throughput.
  • Successful PSC test run over the week. Will use to process as much as 25% of GlueX Spring 2020 data set.
  • Some discussion on making JLab job submission system seamlessly run jobs onsite/offsite.
    • Will require changing how allocation requests are done.
    • Project is 1 year out.