EPSCI Group Meeting Apr. 6, 2021
Latest revision as of 16:50, 6 April 2021
The meeting time is 10:00am.
Connection Info:
You can connect using BlueJeans Video conferencing (ID: 253 300 597):
Meeting URL: https://bluejeans.com/253300597?src=join_info
Meeting ID: 253 300 597
Want to dial in from a phone? Dial one of the following numbers: +1.888.240.2560 (US Toll Free) (see all numbers - https://www.bluejeans.com/premium-numbers). Enter the meeting ID and passcode followed by #.
Connecting from a room system? Dial bjn.vc or 199.48.152.152 and enter your meeting ID & passcode.
Agenda:
- Previous meeting
- Announcements
  - Welcome to Torri Jeske
  - David gone Thursday 4/8 through Friday 4/16 (back on Monday 4/19)
  - No EPSCI Group meeting Monday April 12th
  - Fortnight paper for May 1st: HEP-Frame: Improving the efficiency of pipelined data transformation & filtering for scientific analyses (delayed?)
- Conferences and Workshops
  - Autonomous Discovery in Science and Engineering workshop (April 20-22)
  - vCHEP2021 (May 17-21)
    - Thomas, Kishan: Hydra
    - Vardan, Nathan, David (+Hall-B, Fast Electronics, and TriDAS groups): TriDAS + JANA2 SRO
    - David: HOSS!
  - ACAT2021 (Nov. 29 - Dec. 3)
- Ongoing Activities
  - Scientific Software support
    - JLab Common Environment (CE) + SPACK
      - EPSCI are now responsible for ROOT builds on CUE
      - CentOS8 support
      - ServiceNow (mapmanager, fputil, fpack, bos, bankdef)
    - EIC
      - Collaboration with ANL
        - Gaudi -> JANA2
      - ACTS
    - Offline frameworks (CLARA, JANA2)
  - Data Transport
    - Meeting with ESnet this afternoon
    - Status of proposal
  - DAQ systems
    - SRO
      - SAMPA + ERSAP + JANA2 + INDRA-ASTRA = April 1st + 2 weeks
    - CODA (CODA3 support, EVIO-6)
  - A.I.
    - Multiple FOAs + JLab LDRD
      - Collaboration with Theory on MCGen project DE-FOA-0002493
      - Collaboration with BNL on AI scheduling DE-FOA-0002482
      - Collaboration with INDRA-ASTRA
      - Collaboration with Sergey F. on AI + FPGA
      - Surrogate Models proposal (NP, ASCR, LDRD?)
      - Amplitude Analysis Inverse Problem (LDRD)
    - Jupyterhub + GPU
    - Experimental Controls
  - Offsite Computing
    - NERSC, PSC, IU
      - XSEDE application for PSC bridges-2 being updated for resubmission (due April 15th)
    - OSG
- AOT
Message from Bob Michaels officially handing over ROOT responsibilities to EPSCI:
BTW, I'm officially passing this job (building and maintaining ROOT) to you, now, David. If you need some help, let me know. Of course, I can answer questions and help resolve problems with the old builds.
yours
Bob
Dr. Robert Michaels
Staff Scientist, Jefferson Lab
Minutes:
Attendees: David L., Carl T., Nathan B., Kishan R., Vardan G., Thomas B., Mike G., Torri J.
- SPACK
  - Still some work needed for fully functional deployment
  - EPSCI has now taken over responsibility for building ROOT on CUE from Bob Michaels
    - Need some testing procedure to verify builds, since ROOT is more important than most software packages
  - CentOS8 has a very limited support window
    - We should drop SPACK support for CentOS8 and replace it with another OS based on what SciComp Ops is thinking
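One possible shape for the ROOT build verification mentioned above is a small smoke test, sketched here in Python. It assumes only that the build's `root-config` and `root` executables are on the PATH. The function name `smoke_test_root`, its expected-version argument, and the injectable `run` parameter are hypothetical and not part of any existing EPSCI procedure.

```python
import subprocess


def _run(cmd):
    """Default runner: execute cmd and return (exit_code, stdout)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # Executable is not on the PATH; report it like a failed command.
        return 127, ""
    return proc.returncode, proc.stdout.strip()


def smoke_test_root(expected_version, run=_run):
    """Return a list of failure messages; an empty list means all checks passed.

    Hypothetical checks: the installed version matches the expected one,
    and ROOT can start in batch mode and exit cleanly.
    """
    failures = []

    # `root-config --version` prints the version of the ROOT installation.
    code, version = run(["root-config", "--version"])
    if code != 0:
        failures.append("root-config failed to run")
    elif version != expected_version:
        failures.append(f"expected version {expected_version}, found {version}")

    # `root -b -q` starts ROOT without graphics and quits immediately.
    code, _ = run(["root", "-b", "-q"])
    if code != 0:
        failures.append("root -b -q exited with a non-zero status")

    return failures
```

Passing the runner as a parameter keeps the checks testable without a ROOT installation. A fuller procedure would likely also compile and run a short macro against the new build.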
- EIC
  - Met w/ Dmitry last week to discuss merging of efforts with ANL
    - Nathan looking at clarifying scope of project to convert ANL code from GAUDI to JANA2
    - Discussed need for additional personpower for supporting this effort. Request sent to upper management
  - ACTS
    - Nathan working on implementing ACTS examples with JANA2 to learn more about the system.
- JANA2
  - Nathan working on integrating with CLARA as a microservice
    - Some differences with basic data/execution flow between JANA and CLARA that need to be worked out
- CLARA
  - Issue with a small fraction (<1%) of files occasionally being truncated when processing multiple files
  - With Raphaella's help, ran ~100 farm jobs and was able to decipher cause from log files.
  - The issue stemmed from one thread losing synchronization. A subsequent thread, launched to process the next file in the list, killed the thread where the original issue developed, masking it.
- Data Transport -> EJFAT
  - EJFAT = ESnet/JLab + FPGA + Accelerated Transport (pronounced "Edge Fat" = fat data pipe from the edge)
  - Meeting today to discuss data format
- SRO
  - Monday meeting had only a few participants and technical issues prevented much discussion
  - Some discussion of EVIO format of transient data (Dave A., Carl T., Vardan G.)
  - Another test run by Vardan using software source:
    - 12GB RAM, 15 cores, 2.2GB/s
  - Some work with object pools
  - David challenged Carl to learn how to reproduce Vardan's performance tests independently
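For a sense of scale, the figures from Vardan's test run can be reduced to a per-core rate. This sketch assumes, as a simplification not stated in the minutes, that the 2.2GB/s is an aggregate rate shared evenly across the 15 cores. The function name `per_core_throughput` is hypothetical.

```python
def per_core_throughput(total_gb_per_s, cores):
    """Average throughput per core, assuming an even split across cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gb_per_s / cores


rate = per_core_throughput(2.2, 15)
# Convert to MB/s using decimal units (1 GB = 1000 MB).
print(f"{rate * 1000:.0f} MB/s per core")  # prints "147 MB/s per core"
```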
- CODA
  - Carl continues work on EVIO-6 event viewer GUI
  - Two minor user requests:
    - More verbose info from user scripts run during transitions
    - Support for setting more environment variables in COOL
- AI
  - FOAs + LDRD
    - Many discussions last week. We have potential involvement in several. Primary authorship on one.
    - Thomas working on LDRD proposal to support work related to Early Career Award
  - Jupyterhub
    - Kishan tested running training on GPU via Jupyterhub and the epsci-notebook. Some library errors.
      - Communication with Wes led to adding the secret libs directory (.singularity.d/libs) to LD_LIBRARY_PATH, which, coupled with the CUDA installation in /apps, produced a working system.
  - Experimental Controls
    - Torri met with Naomi yesterday, who pointed her to some software and gave a tour of the DB.
    - Able to run plugins over raw data and generate ROOT files. Next step is to examine contents.
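The LD_LIBRARY_PATH fix described above amounts to putting a directory at the front of a colon-separated search path. A minimal Python sketch follows. The helper name `prepend_library_dir` is hypothetical, and the directory is written exactly as it appears in the minutes rather than as a verified absolute path.

```python
import os


def prepend_library_dir(new_dir, current=None):
    """Return an LD_LIBRARY_PATH value with new_dir first and no duplicates."""
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any existing copy of new_dir.
    parts = [p for p in current.split(":") if p and p != new_dir]
    return ":".join([new_dir] + parts)


# The directory is written here as it appears in the minutes.
os.environ["LD_LIBRARY_PATH"] = prepend_library_dir(".singularity.d/libs")
```

Setting the variable this way affects only processes launched afterwards, because the dynamic loader reads LD_LIBRARY_PATH when a process starts. In a notebook setup the value would therefore normally be set in the container or kernel environment before Python itself starts.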
- Offsite Computing
  - GlueX is working on revised XSEDE proposal (due April 15th)
  - OSG
    - Job queue has been steadily catching up since removing lustre mounts from scosg16
    - Some issues have arisen in the last day that appear to have caused a slowdown. They are being investigated.
    - Changes were made to monitoring that make it appear to update faster