EPSCI Group Meeting Aug. 17, 2020
Latest revision as of 15:18, 17 August 2020
The meeting time is 10:00am.
Connection Info:
You can connect using BlueJeans Video conferencing (ID: 253 300 597):
- Meeting URL: https://bluejeans.com/253300597?src=join_info
- Meeting ID: 253 300 597
- Dial in from a phone: +1.888.240.2560 (US Toll Free) (see all numbers: https://www.bluejeans.com/premium-numbers), then enter the meeting ID and passcode followed by #
- Connect from a room system: dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode
Agenda:
- Previous meeting
- Announcements
- Beam delivery for physics ongoing (scheduled to end Sep. 21st)
- David on shift 8/18, 8/19, 8/28, 8/29, 9/11, 9/12
- Thomas on shift 8/20, 8/21, 8/30, 8/31, 9/9, 9/10
- Streaming Data Scientist job posted!
- Back to school etc. memo
- Fortnight Papers -> Moved to alternating Mondays (opposite SRO)
- Delayed yet another 2 weeks due to NUG meeting
- NERSC User's Group Meeting 8/17/2020 @ 12am-7pm
- Graham's Project
- Ongoing Activities
- JLab Common Environment (CE) + SPACK
- SRO
- ERSAP
- Hall-B/D TriDAS
- JANA2
- GlueX port
- A.I. support
- A.I.
- ENP + CST Town Hall Meeting Friday 8/28 projects list
- GlueX-EIC-PANDA ML workshop Sep. 21-25
- EVIO-6
- Offsite Computing
- NERSC (flex, regular), PSC
- OSG
- GUI for Calorimeter calibration scripts (Hall-D Request)
- Publications
- AOT
Minutes:
Attendees: David L. (chair), Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.
Announcements
- Fortnight papers delayed 2 weeks due to NUG meeting today
- Nathan suggested adding internal code reviews to fortnight meetings
- Someone would provide a piece of code and others would look it over and comment
- Not intended to cause re-writes of code, but to share knowledge/experience/techniques
- Vardan suggested we consider adopting coding standards within group
- General consensus this would be a good idea, but should not be too restrictive
- Vardan will look into Java standards
- Nathan will look into C++ standards
- Kishan will look into Python standards
CODA
- Hall-A DAQ system experienced issues with crashes last week that looked related to AFECS (Vardan's code)
- Turned out to be due to an incompatibility between certain minor revisions of JDK8 and some library dependencies
- Hall-A had recently updated their JDK, which is why the problem suddenly emerged.
- Vardan identified the problem and advised them to switch to a JDK8 release outside of the problematic minor revision range. They did so and are now running again.
CLARA
- Vardan received word of some (rare) crashes of Hall-B data production jobs
- Problem related to uncaught exception in one of the engines used in reconstruction
- Should be fixed in engine itself, but unclear why catch-all mechanism in orchestrator was not working. Vardan will look into it.
- Nathan Baltzell provided some information that may help pinpoint the exact problem, but this is happening rarely enough that it is not a show-stopper.
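The catch-all idea discussed above can be sketched as follows. This is only an illustration in Python, not CLARA's actual (Java) orchestrator code; `run_chain`, the engine callables, and the logger name are all hypothetical. Each event is run through the chain of engines inside a broad exception handler, so a failure in one engine costs that one event and not the whole job.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("orchestrator_sketch")


def run_chain(engines, events):
    """Run every event through all engines, isolating failures per event.

    `engines` is a list of callables (stand-ins for reconstruction engines).
    Returns (results, failures), where failures holds (event index, error text).
    """
    results = []
    failures = []
    for index, event in enumerate(events):
        try:
            data = event
            for engine in engines:
                data = engine(data)
            results.append(data)
        except Exception as exc:  # catch-all: anything an engine raises
            logger.warning("event %d failed: %r", index, exc)
            failures.append((index, repr(exc)))
    return results, failures


def doubler(value):
    """Toy engine: doubles its input."""
    return value * 2


def strict_engine(value):
    """Toy engine: rejects negative input by raising."""
    if value < 0:
        raise ValueError("negative input")
    return value + 1


if __name__ == "__main__":
    ok, bad = run_chain([doubler, strict_engine], [1, -3, 5])
    print(ok)   # results for events that ran cleanly
    print(bad)  # (index, error) for events that raised
```

A handler like this only intercepts what it names: in Python, `except Exception` does not catch `BaseException` subclasses such as `SystemExit` or `KeyboardInterrupt`, so the choice of the caught type is part of the design.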
SRO
- Vardan and Chris L. have come up with some specs for the software stream source Chris is working on.
- Vardan is looking at 2 other Data Lake packages (besides Redis)
- Has one more configuration option for Redis that he will try this week.
JANA2
- Nathan is looking into some bug reports David made
- Some discussion on profilers for C++
- perf is useful on Linux, but not clear how to get the most user-friendly results
- CLion has a built-in tool that can produce graphs (uses perf underneath)
- VTune license available in PCSCI group, but not easily accessible.
- NERSC also has a VTune license which could be used in an interactive session. David will look into it.
A.I.
- Kishan was able to run a model on GPUs on the sciml190X computers (despite David's out-of-date documentation)
- Investigating using an embedded Python interpreter in JANA
- GIL (=Global Interpreter Lock) prevents efficient multi-threading this way
- A package, PyPy, looks like it may provide better multi-threaded support. Kishan will look into it this week.
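The GIL point above can be checked with a small timing experiment. This is a minimal sketch in plain Python, independent of JANA; `count_primes`, `compare`, and the workload sizes are made up for illustration. It runs the same CPU-bound, pure-Python task serially and in a thread pool. On a standard CPython build with the GIL, the threaded time is typically not much better than the serial time, because only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound, pure-Python work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(fn):
    """Call fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(limit=20000, n_tasks=4):
    """Run n_tasks identical jobs serially, then in n_tasks threads."""
    serial, t_serial = timed(
        lambda: [count_primes(limit) for _ in range(n_tasks)]
    )
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        threaded, t_threaded = timed(
            lambda: list(pool.map(count_primes, [limit] * n_tasks))
        )
    return serial, threaded, t_serial, t_threaded


if __name__ == "__main__":
    serial, threaded, t_serial, t_threaded = compare()
    print(f"serial:   {t_serial:.3f} s")
    print(f"threaded: {t_threaded:.3f} s")
```

The timings depend on the machine and interpreter, so none are quoted here; the same script can be run under another interpreter to compare behavior.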
EVIO-6
- Final code cleanup underway.
- Carl will send detailed e-mail to Nathan this week so he can have a look and comment
Offsite Production
- NERSC production is still quite slow. Test using "regular" queue did not show dramatic increase in throughput.
- Successful PSC test run over the past week. PSC will be used to process as much as 25% of the GlueX Spring 2020 data set.
- Some discussion on making JLab job submission system seamlessly run jobs onsite/offsite.
- Will require changing how allocation requests are done.
- Project is 1 year out.