EPSCI Group Meeting Mar. 15, 2021
Latest revision as of 16:01, 15 March 2021
The meeting time is 10:00am.
Connection Info:
You can connect using BlueJeans video conferencing (ID: 253 300 597):
- Meeting URL: https://bluejeans.com/253300597?src=join_info
- Meeting ID: 253 300 597
- Dial in from a phone: +1.888.240.2560 (US Toll Free); see all numbers at https://www.bluejeans.com/premium-numbers. Enter the meeting ID and passcode followed by #.
- Connecting from a room system: dial bjn.vc or 199.48.152.152 and enter your meeting ID and passcode.
Agenda:
- Previous meeting
- Announcements
- Conferences and Workshops
  - Workshop: CFNS-ANL Joint Workshop on Instrumenting the 2nd IR at the EIC (Mar. 21-24)
  - SEA'S IMPROVING SCIENTIFIC SOFTWARE CONFERENCE AND TUTORIALS 2021 (Mar. 22-26)
  - Vardan: Streaming data processing from multiple satellite data sets under the NASA/GEWEX SRB project. (Mar. 26 @ 1pm)
  - Autonomous Discovery in Science and Engineering workshop (April 20-22)
  - vCHEP2021 (May 17-21)
  - Thomas, Kishan: Hydra
  - Vardan, Nathan, David (+Hall-B, Fast Electronics, and TriDAS groups): TriDAS + JANA2 SRO
  - David: HOSS!
  - ACAT2021 (Nov. 29 - Dec. 3)
- Ongoing Activities
  - Scientific Software support
    - JLab Common Environment (CE) + SPACK
      - ServiceNow (mapmanager, fputil, fpack, bos, bankdef)
      - HOMEWORK ASSIGNMENT: Everyone please read the Quickstart Instructions and test the system. Report any issues.
    - EIC
      - ACTS
      - Recent developments
    - Offline frameworks (CLARA, JANA2)
  - Data Transport
  - DAQ systems
    - SRO
      - SAMPA + ERSAP + JANA2 + INDRA-ASTRA = April 1st
    - CODA (CODA3 support, EVIO-6)
  - A.I.
    - GPU purchase for ENP
    - Jupyterhub
    - A.I.I. : Feb. 3, 2021 A.I.I. Planning
    - Experimental Controls
  - Experiment Support
    - EIC (EIC Software Expression of Interest meeting 1/27)
    - CPP
    - SOLID
  - Offsite Computing
    - NERSC, PSC, IU
    - OSG
- AOT
Minutes:
Attendees: David L., Carl T., Nathan B., Kishan R., Vardan G., Thomas B., Mike G.
- Announcements
  - Safety meeting last week
    - Criteria for reopening depend on the COVID infection rate in the primary municipalities where JLab employees reside
    - Unclear what the telework policy will be after reopening
- CST Division meeting today at 3pm
- SPACK
  - Plan to roll out April 1st
  - Homework assigned to all EPSCI members: test the quickstart instructions and report any issues
- ACTS
  - Big push last week to get tracking working. Some issues remain before everything works completely.
  - Some agreement on merging efforts with ANL was reached over the weekend. This includes JANA2 as part of the software stack.
- JANA2
  - Pull request merged that improves support for the Python API
- CLARA
  - Anticipating the release of JDK 16, which should have several significant improvements
- Data Transport
  - Met with the Fast Electronics group last week to discuss the project
  - Meeting scheduled for Wed. this week with both the JLab FEE group and ESnet people
  - Working to reproduce some tests William Gu did with FPGA cards in the INDRA lab
- SRO
  - New VTP firmware released on Friday to address some dropped-frame issues
  - Vardan ran a test overnight Saturday and sent the results to Dave A.
    - Things are looking better. No more 10% spikes in the dropped-frame rate
      - The remaining (smaller) spikes are still mysterious since they are in sync across the two nominally asynchronous streams.
  - ERSAP VTP C++ code was compiled by Carl on indras1 and is now available for testing
- CODA
  - Carl has been working with Dave A. to get it to compile with Java JDK 15
  - Working to "spackify" CODA
  - Some libraries used by CODA components (most notably jcedit) have been dropped from the newest Java JDK
    - These are now being treated as 3rd party libraries
    - Some failures getting everything to compile on RHEL6, which is still used on some test machines in the DAQ lab
      - Vardan will spend a little more time on it, but if things don't come together in a reasonable amount of time he will suggest making CODA 3.11 "RHEL7 and above"
- AI
  - GPU nodes
    - Plan modified because the vendor option for GPU nodes is not cost-effective for single-GPU nodes
    - New plan is to buy a couple of GPU-loaded nodes and use SLURM to allocate jobs a single GPU each
    - David expressed some concern over getting the right CPU/GPU balance to match the jobs
    - Bryan Hess scheduled a meeting for tomorrow to discuss
  - Jupyterhub
    - Looks like the documentation on custom kernels may need to be updated
    - Some issues remain with getting TensorFlow to recognize the GPU in epsci-notebook
    - ServiceNow request is in and Wes is working on it. David will reach out to check status.
  - AI Experimental Controls
    - Torri Jeske has accepted the Postdoc position and will start April 1st
    - Thomas has continued looking into the project in order to form a roadmap
- OSG
  - GlueX has been using the scosg16 node for some time
  - Issue last week with the node failing due to overload from too many jobs after a large submit from CLAS
  - A cap was put in place by CLAS a while back in response to a similar issue, but apparently it was not active for "test" jobs.
  - More administrative controls have been put in place, and more engineered controls at the scheduler level are being explored to prevent DoS-type meltdowns in such circumstances.
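
For the Jupyterhub item above (TensorFlow not recognizing the GPU in epsci-notebook), a quick check run inside the notebook kernel might look like the sketch below. It assumes a TensorFlow 2.x installation; the helper names visible_gpus and describe are illustrative only and are not part of any JLab tooling.

```python
import importlib.util


def visible_gpus():
    """Return the GPU device names TensorFlow reports, or None if TensorFlow is absent."""
    # Avoid an ImportError when the kernel has no TensorFlow at all.
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # assumed to be TensorFlow 2.x

    # list_physical_devices returns PhysicalDevice objects with a .name attribute.
    return [dev.name for dev in tf.config.list_physical_devices("GPU")]


def describe(gpus):
    """Turn the result of visible_gpus() into a one-line status message."""
    if gpus is None:
        return "TensorFlow is not installed in this kernel"
    if not gpus:
        return "TensorFlow is installed but sees no GPU"
    return "TensorFlow sees: " + ", ".join(gpus)


if __name__ == "__main__":
    print(describe(visible_gpus()))
```

An empty list here would point at the kernel environment (driver or CUDA library visibility) rather than at the user's code, which may help when following up on the ServiceNow request.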