EPSCI Group Meeting Aug. 17, 2020
The meeting time is 10:00am.
Connection Info:
You can connect using BlueJeans video conferencing (ID: 253 300 597).
- Meeting URL: https://bluejeans.com/253300597?src=join_info
- Meeting ID: 253 300 597
- To dial in from a phone: +1.888.240.2560 (US Toll Free; see all numbers at https://www.bluejeans.com/premium-numbers), then enter the meeting ID and passcode followed by #
- To connect from a room system: dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode
Agenda:
- Previous meeting
- Announcements
- Beam delivery for physics ongoing (scheduled to end Sep. 21st)
- David on shift 8/18, 8/19, 8/28, 8/29, 9/11, 9/12
- Thomas on shift 8/20, 8/21, 8/30, 8/31, 9/9, 9/10
- Streaming Data Scientist job posted!
- Back to school etc. memo
- Fortnight Papers -> Moved to alternating Mondays (opposite SRO)
- Delayed yet another 2 weeks due to NUG meeting
- NERSC User's Group Meeting 8/17/2020 @ 12am-7pm
- Graham's Project
- Ongoing Activities
- JLab Common Environment (CE) + SPACK
- SRO
- ERSAP
- Hall-B/D TriDAS
- JANA2
- GlueX port
- A.I. support
- A.I.
- ENP + CST Town Hall Meeting Friday 8/28 projects list
- GlueX-EIC-PANDA ML workshop Sep. 21-25
- EVIO-6
- Offsite Computing
- GUI for Calorimeter calibration scripts (Hall-D Request)
- Publications
- AOT
Minutes:
Attendees: David L.(chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.
Announcements
- Fortnight papers delayed 2 weeks due to NUG meeting today
- Nathan suggested adding internal code reviews to fortnight meetings
- Someone would provide a piece of code and others would look it over and comment
- Not intended to cause re-writes of code, but to share knowledge/experience/techniques
- Vardan suggested we consider adopting coding standards within group
- General consensus this would be a good idea, but should not be too restrictive
- Vardan will look into Java standards
- Nathan will look into C++ standards
- Kishan will look into Python standards
CODA
- Hall-A DAQ system experienced issues with crashes last week that looked related to AFECS (Vardan's code)
- Turned out to be due to an incompatibility between certain minor revisions of JDK8 and some library dependencies
- Hall-A had recently updated their JDK which is why the problem suddenly emerged.
- Vardan identified the problem and advised them to change to JDK8 outside of the problematic minor revision range. They did so and are now running again.
CLARA
- Vardan received word of some (rare) crashes of Hall-B data production jobs
- Problem related to uncaught exception in one of the engines used in reconstruction
- Should be fixed in engine itself, but unclear why catch-all mechanism in orchestrator was not working. Vardan will look into it.
- Nathan Baltzell provided some information that may help pinpoint the exact problem, but this is happening rarely enough that it is not a show-stopper.
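As a generic illustration of the catch-all idea discussed above, here is a minimal Python sketch. It is not CLARA code (CLARA's orchestrator is a separate code base), and the names `run_engine_safely` and `process_events` are hypothetical. It only shows the pattern of wrapping each engine call so that a failure on one event becomes an error record instead of ending the job.

```python
import logging

logger = logging.getLogger("orchestrator_sketch")  # hypothetical logger name


def run_engine_safely(engine, event):
    """Call one engine on one event; convert any exception into an error record."""
    try:
        return {"status": "ok", "data": engine(event)}
    except Exception as exc:  # catch-all: an engine bug should not kill the job
        logger.exception("engine failed on event %r", event)
        return {"status": "error", "error": repr(exc)}


def process_events(engine, events):
    """Run the engine over all events, keeping going after individual failures."""
    return [run_engine_safely(engine, ev) for ev in events]
```

Note that `except Exception` deliberately does not intercept `BaseException` subclasses such as `SystemExit` or `KeyboardInterrupt`. Whether anything analogous plays a role in the CLARA crashes is not known from these notes.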
SRO
- Vardan and Chris L. have come up with some specs for the software stream source Chris is working on.
- Vardan looking at 2 other Data Lake packages (besides redis)
- Has one more configuration option for redis that he will try this week.
JANA2
- Nathan is looking into some bug reports David filed
- Some discussion on profilers for C++
- perf is useful on Linux, but not clear how to get the most user-friendly results
- CLion has a built-in tool that can produce graphs (uses perf underneath)
- VTune license available in PCSCI group, but not easily accessible.
- NERSC also has VTune license which could be used in interactive session. David will look into it
A.I.
- Kishan was able to run a model on GPUs on the sciml190X computers (despite David's out-of-date documentation)
- Investigating using embedded python interpreter in JANA
- GIL (=Global Interpreter Lock) prevents efficient multi-threading this way
- PyPy (an alternative Python implementation) looks like it may provide better multi-threaded support. Kishan will look into it this week.
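The GIL limitation noted above can be seen with a small stand-alone Python sketch. It has nothing to do with JANA itself, and `busy_sum` and `timed_run` are hypothetical names. On a standard CPython build, running a pure-Python CPU-bound function in several threads is expected to take roughly as long as running it serially, because only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(n_tasks, n, threaded):
    """Run busy_sum n_tasks times, serially or in a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    if threaded:
        with ThreadPoolExecutor(max_workers=n_tasks) as pool:
            results = list(pool.map(busy_sum, [n] * n_tasks))
    else:
        results = [busy_sum(n) for _ in range(n_tasks)]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t_serial = timed_run(4, 1_000_000, threaded=False)
    _, t_threads = timed_run(4, 1_000_000, threaded=True)
    print(f"serial: {t_serial:.2f} s, 4 threads: {t_threads:.2f} s")
```

The actual timings depend on the machine and interpreter, so none are quoted here. Code that releases the GIL (for example, inside compiled extensions) is not subject to the same limit.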
EVIO-6
- Final code cleanup underway.
- Carl will send detailed e-mail to Nathan this week so he can have a look and comment
Offsite Production
- NERSC production is still quite slow. Test using "regular" queue did not show dramatic increase in throughput.
- Successful PSC test run over the week. Will use to process as much as 25% of GlueX Spring 2020 data set.
- Some discussion on making JLab job submission system seamlessly run jobs onsite/offsite.
- Will require changing how allocation requests are done.
- Project is 1 year out.