EPSCI Group Meeting Mar. 15, 2021

From epsciwiki

Latest revision as of 16:02, 15 March 2021

The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans Video conferencing (ID: 253 300 597):

Meeting URL
 https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting
  2. Announcements
  3. Conferences and Workshops
  4. Ongoing Activities
    • Data Transport
    • DAQ systems
      • SRO
        • SAMPA + ERSAP + JANA2 + INDRA-ASTRA = April 1st
      • CODA (CODA3 support, EVIO-6)
    • Experiment Support
      • EIC (EIC Software Expression of Interest meeting 1/27)
      • CPP
      • SOLID
    • Offsite Computing
      • NERSC, PSC, IU
      • OSG
  5. AOT

Minutes:

Attendees: David L., Carl T., Nathan B., Kishan R., Vardan G., Thomas B., Mike G.

  • Announcements
    • Safety meeting last week
      • Criteria for reopening depend on the COVID infection rate in the primary municipalities where JLab employees reside
      • It is unclear what the telework policy will be after reopening.
  • CST Division meeting today at 3pm
  • SPACK
    • Plan to roll out April 1st
    • Homework assigned to all EPSCI members to test quickstart instructions and report any issues
  • ACTS
    • Big push last week to get tracking working; some issues remain before everything works completely.
    • Some agreement on merging efforts with ANL was reached over the weekend. This includes JANA2 as part of the software stack.
  • JANA2
    • Merged a pull request that improves support for the Python API
  • CLARA
    • Anticipating the release of JDK 16, which should bring several significant improvements
  • Data Transport
    • Met with the Fast Electronics Group last week to discuss the project
    • Meeting scheduled for Wed. this week with both the JLab FEE group and ESnet representatives
    • Working to reproduce some tests William Gu did with FPGA cards in INDRA lab
  • SRO
    • New VTP firmware released on Friday to address some dropped-frame issues
    • Vardan ran a test overnight on Saturday and sent the results to Dave A.
      • Things are looking better: no more 10% spikes in the dropped-frame rate
        • The remaining (smaller) spikes are still mysterious, since they occur in sync across the two nominally asynchronous streams.
    • ERSAP VTP C++ code compiled by Carl on indras1 and is now available for testing
  • CODA
    • Carl has been working with Dave A. to get it to compile with Java JDK 15
    • Working to "spackify" CODA
    • Some libraries used by CODA components (most notably jcedit) have been dropped from the newest Java JDK
      • These are now being treated as third-party libraries
      • There are some failures compiling everything on RHEL6, which is still used on some test machines in the DAQ lab
        • Vardan will spend a little more time on it, but if things don't come together in a reasonable amount of time he will suggest making CODA 3.11 "RHEL7 and above"
  • AI
    • GPU nodes
      • Plan modified because the vendor's option for single-GPU nodes was not cost-effective
      • New plan is to buy a couple of GPU-loaded nodes and use SLURM to allocate single-GPU jobs
      • David expressed some concern about getting the right CPU/GPU balance to match the jobs
      • Bryan Hess scheduled a meeting for tomorrow to discuss.
    • JupyterHub
      • The documentation on custom kernels looks like it may need to be updated
      • Some issues remain with getting TensorFlow to recognize the GPU in epsci-notebook
      • ServiceNow request submitted and Wes is working on it. David will reach out to check status.
    • AI Experimental Controls
      • Torri Jeske has accepted the postdoc position and will start April 1st
      • Thomas has continued looking into the project in order to form a roadmap
  • OSG
    • GlueX has been using scosg16 node for some time
    • Issue last week with the node failing due to overload from too many jobs after a large submission from CLAS
    • A cap was put in place by CLAS a while back in response to a similar issue, but apparently it was not active for "test" jobs.
    • More administrative controls have been put in place, and more engineered controls at the scheduler level are being explored to prevent DoS-type meltdowns in such circumstances.