EPSCI Group Meeting Aug. 17, 2020

The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans video conferencing (ID: 253 300 597):

Meeting URL
 https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting
  2. Announcements
  3. Graham's Project
  4. Ongoing Activities
  5. GUI for Calorimeter calibration scripts (Hall-D Request)
  6. Publications
  7. AOT



Minutes:

Attendees: David L.(chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.

Announcements

  • Fortnight papers delayed 2 weeks due to NUG meeting today
  • Nathan suggested adding internal code reviews to fortnight meetings
    • Someone would provide a piece of code and others would look it over and comment
    • Not intended to cause re-writes of code, but to share knowledge/experience/techniques
  • Vardan suggested we consider adopting coding standards within group
    • General consensus this would be a good idea, but should not be too restrictive
    • Vardan will look into Java standards
    • Nathan will look into C++ standards
    • Kishan will look into Python standards

CODA

  • Hall-A DAQ system experienced issues with crashes last week that looked related to AFECS (Vardan's code)
  • Turned out to be due to an incompatibility between certain minor revisions of JDK8 and some library dependencies
  • Hall-A had recently updated their JDK which is why the problem suddenly emerged.
  • Vardan identified the problem and advised them to change to JDK8 outside of the problematic minor revision range. They did so and are now running again.
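
A guard against this kind of regression could be a small check of the JDK 8 update number before starting the DAQ software. The sketch below is illustrative only: the function names are hypothetical, and the bounds of the problematic range are NOT recorded in these minutes, so they are left as parameters the caller must supply.

```python
import re


def parse_jdk8_update(version):
    """Return the update number of a JDK 8 version string.

    Example inputs: "1.8.0_252", "1.8.0_252-b09", "1.8.0".
    Returns None if the string is not a JDK 8 version.
    """
    match = re.fullmatch(r"1\.8\.0(?:_(\d+))?(?:-.*)?", version.strip())
    if match is None:
        return None
    # A bare "1.8.0" has no update suffix; treat it as update 0.
    return int(match.group(1) or 0)


def in_problematic_range(version, first_bad, last_bad):
    """True if `version` is JDK 8 with an update in [first_bad, last_bad].

    first_bad and last_bad are placeholders: the real range is whatever
    was identified for the affected library dependencies.
    """
    update = parse_jdk8_update(version)
    return update is not None and first_bad <= update <= last_bad
```

The version string itself can be read from the output of `java -version`; the check is kept separate from that step so it can be tested without a JDK installed.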

CLARA

  • Vardan received word of some (rare) crashes of Hall-B data production jobs
  • Problem related to uncaught exception in one of the engines used in reconstruction
  • Should be fixed in engine itself, but unclear why catch-all mechanism in orchestrator was not working. Vardan will look into it.
  • Nathan Baltzell provided some information that may help pinpoint the exact problem, but this is happening rarely enough that it is not a show-stopper.
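
For reference, the general idea of a catch-all around engine calls can be sketched as follows. This is a generic Python illustration, not CLARA's actual (Java) orchestrator code; `run_engine_safely`, `process_events`, and the engine callable are all hypothetical names. It shows the intended behavior only: a failure in one event is logged and reported, and processing continues with the next event.

```python
import logging

log = logging.getLogger("orchestrator_sketch")


def run_engine_safely(engine, event):
    """Call engine(event); convert any exception into an error record."""
    try:
        return ("ok", engine(event))
    except Exception as exc:  # deliberate catch-all at the orchestrator level
        log.error("engine failed on event %r: %r", event, exc)
        return ("error", repr(exc))


def process_events(engine, events):
    """Run the engine over all events without letting one failure stop the job."""
    return [run_engine_safely(engine, event) for event in events]
```

Note that a wrapper like this only catches exceptions raised in the calling thread; whether that is relevant to the crashes reported here is not known.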

SRO

  • Vardan and Chris L. have come up with some specs for the software stream source Chris is working on.
  • Vardan looking at 2 other Data Lake packages (besides redis)
    • Has one more configuration option for redis that he will try this week.

JANA2

  • Nathan is looking into some bug reports filed by David
  • Some discussion on profilers for C++
    • perf is useful on Linux, but not clear how to get the most user-friendly results
    • CLion has a built-in tool that can produce graphs (uses perf underneath)
    • VTune license available in PCSCI group, but not easily accessible.
      • NERSC also has VTune license which could be used in interactive session. David will look into it

A.I.

  • Kishan was able to run the model on GPUs on the sciml190X computers (despite David's out-of-date documentation)
  • Investigating using embedded python interpreter in JANA
    • The GIL (Global Interpreter Lock) prevents efficient multi-threading this way
    • PyPy, an alternative Python implementation, looks like it may provide better multi-threaded support. Kishan will look into it this week.
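
The GIL limitation can be seen with a small stand-alone test, independent of JANA. The sketch below (all function names are illustrative) runs the same CPU-bound loop sequentially and in threads. On a standard GIL-enabled CPython build, the threaded version is generally not expected to be faster, since only one thread executes Python bytecode at a time; exact timings depend on the machine and interpreter.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; holds the GIL while it runs."""
    while n > 0:
        n -= 1


def run_sequential(n, workers):
    """Run count_down `workers` times in one thread; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run count_down in `workers` threads at once; return elapsed seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 2_000_000, 2
    print(f"sequential: {run_sequential(n, workers):.3f} s")
    print(f"threaded:   {run_threaded(n, workers):.3f} s")
```

Running the same script under a different interpreter gives a direct comparison of its multi-threaded behavior for this kind of workload.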

EVIO-6

  • Final code cleanup underway.
  • Carl will send detailed e-mail to Nathan this week so he can have a look and comment

Offsite Production

  • NERSC production is still quite slow. Test using "regular" queue did not show dramatic increase in throughput.
  • Successful PSC test run over the week. Will use to process as much as 25% of GlueX Spring 2020 data set.
  • Some discussion on making JLab job submission system seamlessly run jobs onsite/offsite.
    • Will require changing how allocation requests are done.
    • Project is 1 year out.