EPSCI Group Meeting Aug. 17, 2020

From epsciwiki
Latest revision as of 15:18, 17 August 2020

The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans Video conferencing (ID: 253 300 597):

Meeting URL
https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting (EPSCI Group Meeting Aug. 10, 2020)
  2. Announcements
    • Beam delivery for physics ongoing (scheduled to end Sep. 21)
      • David on shift 8/18, 8/19, 8/28, 8/29, 9/11, 9/12
      • Thomas on shift 8/20, 8/21, 8/30, 8/31, 9/9, 9/10
    • Streaming Data Scientist job posted! (https://careers.peopleclick.com/careerscp/client_jeffersonlab/external/jobDetails.do?functionName=getJobDetail&jobPostId=1665&localeCode=en-us)
    • Back to school etc. memo (https://www.jlab.org/memo/information-help-navigate-back-school-planning)
    • Fortnight Papers -> Moved to alternating Mondays (opposite SRO)
      • Delayed yet another 2 weeks due to NUG meeting
    • NERSC User's Group Meeting 8/17/2020 @ 12am-7pm (https://www.nersc.gov/users/NUG/annual-meetings/nug-2020/)
  3. Graham's Project
  4. Ongoing Activities
    • JLab Common Environment (CE) + SPACK
    • SRO
      • ERSAP
      • Hall-B/D TriDAS
    • JANA2
      • GlueX port
      • A.I. support
    • A.I.
      • ENP + CST Town Hall Meeting Friday 8/28 (projects list: https://docs.google.com/spreadsheets/d/1bAsHq4Zp4pTUMqwUn3600jT3fUrFGIhN5EdCJqn2fr8/edit?usp=sharing)
      • GlueX-EIC-PANDA ML workshop Sep. 21-25 (https://indico.gsi.de/event/10576/)
    • EVIO-6
    • Offsite Computing
      • NERSC (flex: https://halldweb.jlab.org/data_monitoring/recon/summary_swif2_output_recon_2019-11_ver01_batch01/index.html, regular: https://halldweb.jlab.org/data_monitoring/recon/summary_swif2_output_recon_2019-11_ver01_batch01b/index.html), PSC
      • OSG
  5. GUI for Calorimeter calibration scripts (Hall-D Request)
  6. Publications
  7. AOT



Minutes:

Attendees: David L. (chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.

Announcements

  • Fortnight papers delayed 2 weeks due to NUG meeting today
  • Nathan suggested adding internal code reviews to fortnight meetings
    • Someone would provide a piece of code, and the others would look it over and comment
    • Not intended to trigger rewrites of the code, but to share knowledge, experience, and techniques
  • Vardan suggested we consider adopting coding standards within the group
    • General consensus that this would be a good idea, but the standards should not be too restrictive
    • Vardan will look into Java standards
    • Nathan will look into C++ standards
    • Kishan will look into Python standards

CODA

  • The Hall-A DAQ system experienced crashes last week that appeared to be related to AFECS (Vardan's code)
  • The cause turned out to be an incompatibility between certain minor revisions of JDK8 and some library dependencies
  • Hall-A had recently updated their JDK, which is why the problem suddenly emerged.
  • Vardan identified the problem and advised them to switch to a JDK8 revision outside the problematic minor-revision range. They did so and are now running again.
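Since the fix came down to avoiding a range of JDK8 minor revisions, a startup check could flag such a JDK before it causes crashes. The following is a minimal sketch under stated assumptions: the class name JdkUpdateGuard and the bounds BAD_UPDATE_MIN/BAD_UPDATE_MAX are hypothetical placeholders (the minutes do not say which revisions were affected), and it relies only on the standard java.version system property, which JDK8 reports in the form 1.8.0_NNN.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical startup guard: warns if the running JDK8 update number falls
 * inside a known-bad range. The bounds are PLACEHOLDERS, not the range that
 * affected Hall-A.
 */
public class JdkUpdateGuard {
    // Placeholder bounds (inclusive); replace with the actual problematic range.
    static final int BAD_UPDATE_MIN = 100;
    static final int BAD_UPDATE_MAX = 199;

    // JDK8 reports versions like "1.8.0_252"; capture the update number.
    private static final Pattern JDK8 = Pattern.compile("^1\\.8\\.0_(\\d+)");

    /** Returns the JDK8 update number, or -1 if the string is not a JDK8 update version. */
    static int jdk8Update(String version) {
        Matcher m = JDK8.matcher(version);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True only if the version is JDK8 and its update lies in the placeholder range. */
    static boolean isInBadRange(String version) {
        int update = jdk8Update(version);
        return update >= BAD_UPDATE_MIN && update <= BAD_UPDATE_MAX;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        if (isInBadRange(version)) {
            System.err.println("WARNING: JDK " + version + " is in a known-problematic range");
        } else {
            System.out.println("JDK " + version + " OK");
        }
    }
}
```

Non-JDK8 version strings (e.g. "11.0.8") and the JDK8 GA string "1.8.0" yield -1 and are never flagged; a real guard would need the actual affected range.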

CLARA

  • Vardan received word of some (rare) crashes of Hall-B data production jobs
  • The problem is related to an uncaught exception in one of the engines used in reconstruction
  • It should be fixed in the engine itself, but it is unclear why the catch-all mechanism in the orchestrator did not handle it. Vardan will look into it.
  • Nathan Baltzell provided some information that may help pinpoint the exact problem, but it happens rarely enough that it is not a show-stopper.
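To illustrate what an orchestrator-level catch-all is meant to do, here is a minimal Java sketch under stated assumptions: CatchAllOrchestrator, Engine, Step, and EventResult are hypothetical types, not CLARA's actual API, and the sketch makes no claim about why CLARA's mechanism failed. Each engine call is wrapped so that an exception marks only the current event as failed instead of taking down the whole job.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of an orchestrator-level catch-all around a chain of engines.
 * All types here are hypothetical and are NOT the CLARA API.
 */
public class CatchAllOrchestrator {

    /** Hypothetical unit of work inside an engine; may throw anything. */
    interface Step {
        String apply(String event) throws Exception;
    }

    /** Hypothetical engine: a name plus the work it performs. */
    static final class Engine {
        final String name;
        final Step step;

        Engine(String name, Step step) {
            this.name = name;
            this.step = step;
        }
    }

    /** Outcome for one event: either an output or the engine that failed. */
    static final class EventResult {
        final String output;       // null if some engine threw
        final String failedEngine; // null on success
        final Throwable error;     // null on success

        EventResult(String output, String failedEngine, Throwable error) {
            this.output = output;
            this.failedEngine = failedEngine;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    private final List<Engine> chain = new ArrayList<>();

    void add(Engine engine) {
        chain.add(engine);
    }

    /** Runs the engines in order; a throw fails this event only, not the job. */
    EventResult run(String event) {
        String current = event;
        for (Engine engine : chain) {
            try {
                current = engine.step.apply(current);
            } catch (Throwable t) { // Throwable (not Exception) also traps Errors
                return new EventResult(null, engine.name, t);
            }
        }
        return new EventResult(current, null, null);
    }

    public static void main(String[] args) {
        CatchAllOrchestrator orchestrator = new CatchAllOrchestrator();
        orchestrator.add(new Engine("upper", s -> s.toUpperCase()));
        orchestrator.add(new Engine("validate", s -> {
            if (s.contains("BAD")) {
                throw new IllegalStateException("bad event");
            }
            return s;
        }));

        for (String event : new String[] {"good", "bad"}) {
            EventResult r = orchestrator.run(event);
            System.out.println(event + " -> "
                    + (r.ok() ? r.output : "failed in " + r.failedEngine + ": " + r.error.getMessage()));
        }
    }
}
```

Running main prints `good -> GOOD` and `bad -> failed in validate: bad event`. Catching Throwable rather than Exception also traps Errors such as StackOverflowError; whether that is desirable in production is a design choice.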

SRO

  • Vardan and Chris L. have drafted specs for the software stream source Chris is working on.
  • Vardan is looking at 2 other Data Lake packages (besides Redis)
    • He has one more Redis configuration option to try this week.

JANA2

  • Nathan is looking into some bug reports David submitted
  • Some discussion on profilers for C++
    • perf is useful on Linux, but it is not clear how to get the most user-friendly results from it
    • CLion has a built-in tool that can produce graphs (it uses perf underneath)
    • A VTune license is available in the PCSCI group, but it is not easily accessible.
      • NERSC also has a VTune license that could be used in an interactive session. David will look into it.

A.I.

  • Kishan was able to run a model on the GPUs of the sciml190X computers (despite David's out-of-date documentation)
  • Investigating using an embedded Python interpreter in JANA
    • The GIL (Global Interpreter Lock) prevents efficient multi-threading this way
    • PyPy (an alternative Python implementation) looks like it may provide better multi-threaded support. Kishan will look into it this week.

EVIO-6

  • Final code cleanup is underway.
  • Carl will send a detailed e-mail to Nathan this week so he can take a look and comment

Offsite Production

  • NERSC production is still quite slow. A test using the "regular" queue did not show a dramatic increase in throughput.
  • Successful PSC test run over the week. PSC will be used to process as much as 25% of the GlueX Spring 2020 data set.
  • Some discussion on making the JLab job submission system run jobs seamlessly onsite or offsite.
    • This will require changing how allocation requests are done.
    • The project is 1 year out.