EPSCI Group Meeting Sep. 28, 2020


The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans Video conferencing (ID: 253 300 597):

Meeting URL
 https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting
  2. Announcements
  3. Graham's Project
  4. Coding Standards
  5. Ongoing Activities
    • Offsite Computing
      • Integration of IU BigRed into GlueX Production chain starting next week
      • NERSC
        • AY21 Request to be submitted this week
        • Job bundling to start mid-late Oct.
      • OSG
    • EVIO-6
    • JANA2
      • GlueX port
    • A.I.
    • SRO
      • ERSAP
      • Hall-B/D TriDAS
    • JLab Common Environment (CE) + SPACK
  6. GUI for Calorimeter calibration scripts (Hall-D Request)
  7. Publications
  8. AOT



Minutes:

Attendees: David L., Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.

Announcements

  • Word on the street is that the issue with GPU assignments on the sciml cluster machines is now (probably) resolved. At least the most egregious one.
  • Fortnightly paper/code review meetings are held on Teams. At some point an extra BlueJeans meeting was scheduled. The BlueJeans meeting was canceled, and David will send out an update to the Teams meeting to ensure it is on everyone's calendar.

CLAS12

  • Some rare exceptions being thrown in the CLAS12 recon software have been addressed. There was some confusion earlier on whether the bug was in CLARA or on the CLAS12 software side. Vardan tracked it down to an improperly handled exception in one (or more) of the CLAS12 engines. Veronique put in a fix, and the tests Vardan ran over the weekend did not reproduce the error. (It can be difficult to prove for certain that a rare error is fixed.)
  • Issues that slowed the transition from JDK-8 to JDK-11 (or now JDK-14) have been resolved. The problem was due to discrepancies in the matrix math package between code compiled with JDK-8 and code compiled with JDK-11. Relaxing the precision requirement on the matrix inversion brought the two into alignment, and the transition can now move forward.
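
As an illustration of the precision point above, here is a minimal sketch of comparing two matrix inversion results within a tolerance rather than requiring exact equality. It is not the CLAS12 code (which is Java); the function names, the 2x2 example matrix, and the tolerance values are all assumptions chosen for the example.

```python
import math

def invert_2x2(m):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))

def matrices_close(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """Element-wise comparison within a tolerance (hypothetical default values)."""
    return all(
        math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
        for row_x, row_y in zip(x, y)
        for p, q in zip(row_x, row_y)
    )

reference = invert_2x2(((4.0, 7.0), (2.0, 6.0)))

# Stand-in for a result from a different build: same values, perturbed
# at the level of the last few bits of the floating-point mantissa.
perturbed = tuple(tuple(v * (1 + 1e-15) for v in row) for row in reference)

exact_match = reference == perturbed                      # strict comparison
tolerant_match = matrices_close(reference, perturbed)     # relaxed comparison
```

A strict equality check treats the two results as different, while the tolerant check accepts them; how loose the tolerance can safely be depends on the physics use case.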

Offsite Computing

  • NERSC
    • NERSC allocation requests due Oct. 5th. The GlueX request from last year will be renewed.
    • It was noted that CLAS12 had an allocation the year before, which was donated to GlueX. There is a potential long-term plan to have a single allocation request encompass all JLab experiments in the future. David will need to ask the DOE NP program manager (George Fai) about this.
    • Work to create bundled jobs for NERSC will take place in the last half of October.
  • IU Big Red
    • David will begin work next week with Chris L. to integrate the IU Big Red system into the swif2 system so it can be used for GlueX production.
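
The minutes do not describe how the NERSC jobs will be bundled. As a rough sketch only, assuming that bundling means grouping many single-run jobs into fewer, larger submissions, the grouping step could look like the following. The function name, the bundle size, and the run labels are hypothetical and are not taken from swif2 or the GlueX production scripts.

```python
def bundle_jobs(job_ids, bundle_size):
    """Split a list of job identifiers into consecutive bundles of at most bundle_size."""
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    return [job_ids[i:i + bundle_size] for i in range(0, len(job_ids), bundle_size)]

# Hypothetical run labels, used only to exercise the function.
runs = [f"run_{n:06d}" for n in range(71000, 71010)]

# Ten single-run jobs grouped into bundles of up to four runs each.
bundles = bundle_jobs(runs, 4)
```

Each bundle would then be submitted as one batch job; the last bundle may be smaller than the others when the run count is not a multiple of the bundle size.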

OSG

  • Head nodes are set up and instructions passed to relevant parties
  • Four farm nodes have been temporarily dedicated to OSG development
  • MCWrapper
    • New crop of students that are documentation-phobic
    • Several issues brought up at last GlueX Software meeting. Thomas has addressed these in MCWrapper release 2.5.1

JANA2

  • GlueX port chugging along. 24/28 library modules now compile.
  • Potential issue with the existence of multiple locks (originally only expected one ROOT lock).
  • Need to start reporting on this at GlueX software meetings

AI

  • Official JLab webpage needs content maintenance.
    • Kishan has started looking at this and is putting together a proposal that will be ready sometime this week.
    • Need to sign Kishan up for Drupal Training
  • Hydra
    • Multi-GPU model compilation is working, but not all GPUs seem to be utilized despite being assigned to different processes. Kishan is looking into it.
    • Downsampling has now been shown to work successfully in training the Hydra CDC model. This reduces the training time from ~23hr to ~3hr with no reduction in accuracy.

SRO

  • ERSAP
    • ~300GB of Hall-B FT beam data collected with prototype ERSAP system
    • Required only ~1.5 cores with <1% loss of time slices and very low memory consumption
  • Vardan met with Dave A. and Ben R. last week to discuss Indra setup. They plan to complete the setup this week to use as a testbed. We are expected to be the primary customer.
  • Vardan is preparing a contribution to RT2020 that will focus on ERSAP design/testing
  • Hall-B/D TriDAS tests
    • Successful beam test to read out Hall-B FT under various threshold and trigger conditions
    • First successful test of CODA + TriDAS + JANA2 with event filtering provided by JANA2
    • Some memory leaks were observed, so individual runs were limited to ~10-20min. Early reports indicated this only happened when JANA was included, but the TriDAS group was less certain of this diagnosis at the meeting last week.

SPACK

  • Hope to start ramping work on this back up this week.

Sophia is Born!

  • Saturday saw the arrival of the newest addition to the Gyurjyan family with the birth of their granddaughter Sophia
  • Family = home and healthy
  • Pictures = adorable