EPSCI Group Meeting Apr. 6, 2021

From epsciwiki
Latest revision as of 16:50, 6 April 2021

The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans Video conferencing (ID: 253 300 597):

Meeting URL
 https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting
  2. Announcements
    • Welcome to Torri Jeske
    • David away Thursday 4/8 through Friday 4/16 (back on Monday 4/19)
      • No EPSCI Group meeting Monday, April 12th
    • Fortnight paper for May 1st: HEP-Frame: Improving the efficiency of pipelined data transformation & filtering for scientific analyses (delayed?)
  3. Conferences and Workshops
    • Autonomous Discovery in Science and Engineering workshop (April 20-22)
    • vCHEP2021 (May 17-21)
  4. Ongoing Activities
    • Scientific Software support
      • JLab Common Environment (CE) + SPACK
        • EPSCI is now responsible for ROOT builds on CUE
        • CentOS8 support
        • ServiceNow (mapmanager, fputil, fpack, bos, bankdef)
      • EIC
        • Collaboration with ANL
          • Gaudi -> JANA2
        • ACTS
      • Offline frameworks (CLARA, JANA2)
    • Data Transport
      • Meeting with ESnet this afternoon
      • Status of proposal
    • DAQ systems
      • SRO
        • SAMPA + ERSAP + JANA2 + INDRA-ASTRA = April 1st + 2 weeks
      • CODA (CODA3 support, EVIO-6)
    • A.I.
      • Multiple FOAs + JLab LDRD
        • Collaboration with Theory on MCGen project DE-FOA-0002493
        • Collaboration with BNL on AI scheduling DE-FOA-0002482
        • Collaboration with INDRA-ASTRA
        • Collaboration with Sergey F. on AI + FPGA
        • Surrogate Models proposal (NP, ASCR, LDRD?)
        • Amplitude Analysis Inverse Problem (LDRD)
      • Jupyterhub + GPU
      • Experimental Controls
    • Offsite Computing
      • NERSC, PSC, IU
        • XSEDE application for PSC Bridges-2 being updated for resubmission (due April 15th)
      • OSG
  5. AOT

Message from Bob Michaels officially handing over ROOT responsibilities to EPSCI:

   BTW, I'm officially passing this job (building and maintaining ROOT) to you, now, David.
   If you need some help, let me know.  Of course, I can answer questions and help resolve
   problems with the old builds.

   yours
   Bob

   Dr. Robert Michaels
   Staff Scientist, Jefferson Lab

Minutes:

Attendees: David L., Carl T., Nathan B., Kishan R., Vardan G., Thomas B., Mike G., Torri J.

  • SPACK
    • Still some work needed for fully functional deployment
    • EPSCI has now taken over responsibility for building ROOT on CUE from Bob Michaels
      • Need a testing procedure to verify ROOT builds, since ROOT is more critical than most software packages
    • CentOS 8 has a very limited remaining support window
      • We should drop Spack support for CentOS 8 and replace it with another OS, based on SciComp Ops' plans
  • EIC
    • Met with Dmitry last week to discuss merging efforts with ANL
      • Nathan is clarifying the scope of the project to convert ANL code from GAUDI to JANA2
      • Discussed the need for additional personpower to support this effort; request sent to upper management
    • ACTS
      • Nathan is implementing ACTS examples with JANA2 to learn more about the system
  • JANA2
    • Nathan is working on integrating JANA2 with CLARA as a microservice
      • Some differences in the basic data/execution flow between JANA and CLARA still need to be worked out
  • CLARA
    • Issue with an occasional (<1%) file being truncated when processing multiple files
    • With Raphaella's help, ran ~100 farm jobs and was able to determine the cause from the log files
    • Cause: one thread lost synchronization, and a subsequent thread launched to process the next file in the list killed the thread where the original issue developed, masking it
  • Data Transport -> EJFAT
    • EJFAT = ESnet/JLab + FPGA + Accelerated Transport (pronounced "Edge Fat" = fat data pipe from the edge)
    • Meeting today to discuss data format
  • SRO
    • Monday's meeting had only a few participants, and technical issues limited discussion
    • Some discussion of the EVIO format for transient data (Dave A., Carl T., Vardan G.)
    • Vardan ran another test using a software source:
      • 12 GB RAM, 15 cores, 2.2 GB/s
    • Some work with object pools
    • David challenged Carl to learn how to reproduce Vardan's performance tests independently
  • CODA
    • Carl continues work on EVIO-6 event viewer GUI
    • Two minor user requests:
      • More verbose output from user scripts run during transitions
      • Support for setting more environment variables in COOL
  • AI
    • FOAs + LDRD
      • Many discussions last week. We have potential involvement in several, with primary authorship on one.
      • Thomas working on LDRD proposal to support work related to Early Career Award
    • Jupyterhub
      • Kishan tested running training on a GPU via JupyterHub and the epsci-notebook; encountered some library errors.
        • After communication with Wes, adding the hidden libs directory (.singularity.d/libs) to LD_LIBRARY_PATH, together with the CUDA installation in /apps, produced a working system.
    • Experimental Controls
      • Torri met with Naomi yesterday, who pointed her to some software and gave her a tour of the DB.
      • Able to run plugins over raw data and generate ROOT files. Next step is to examine the contents.
  • Offsite Computing
    • GlueX is working on a revised XSEDE proposal (due April 15th)
    • OSG
      • Job queue has been steadily catching up since the Lustre mounts were removed from scosg16
      • Some issues that arose in the last day appear to have caused a slowdown; they are being investigated.
      • Changes were made to the monitoring that make it appear to update faster
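The JupyterHub GPU fix noted in the minutes comes down to an environment change. The sketch below is a minimal illustration, not the configuration actually deployed: it assumes the hidden library directory sits at the container root as `/.singularity.d/libs` (where Singularity's `--nv` option binds host NVIDIA libraries), and the CUDA path under /apps used in the example is hypothetical. It builds a copy of an environment with both directories prepended to LD_LIBRARY_PATH.

```python
from typing import Dict, Mapping

# Assumed location of the container's hidden library directory
# (Singularity binds host NVIDIA driver libraries here when run with --nv).
CONTAINER_LIB_DIR = "/.singularity.d/libs"


def with_gpu_lib_paths(base_env: Mapping[str, str], cuda_lib_dir: str,
                       container_lib_dir: str = CONTAINER_LIB_DIR) -> Dict[str, str]:
    """Return a copy of base_env with the container and CUDA library
    directories prepended to LD_LIBRARY_PATH (colon-separated on Linux)."""
    env = dict(base_env)  # never mutate the caller's mapping
    parts = [container_lib_dir, cuda_lib_dir]
    # Keep any existing entries after the new ones, dropping empties and duplicates.
    for entry in env.get("LD_LIBRARY_PATH", "").split(":"):
        if entry and entry not in parts:
            parts.append(entry)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


if __name__ == "__main__":
    import os
    # "/apps/cuda/lib64" is a hypothetical path; substitute the actual CUDA install under /apps.
    env = with_gpu_lib_paths(os.environ, "/apps/cuda/lib64")
    print(env["LD_LIBRARY_PATH"])
```

Because the dynamic loader generally reads LD_LIBRARY_PATH only when a process starts, the returned mapping is meant for a new process (for example `subprocess.run(..., env=env)` or the `env` field of a Jupyter kernel spec) rather than for modifying `os.environ` inside an already-running kernel.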