EPSCI Group Meeting Sep. 14, 2020

The meeting time is 10:00am.

Connection Info:

You can connect using BlueJeans Video conferencing (ID: 253 300 597):

Meeting URL
 https://bluejeans.com/253300597?src=join_info

Meeting ID
253 300 597

Want to dial in from a phone?

Dial one of the following numbers:
+1.888.240.2560 (US Toll Free)
(see all numbers - https://www.bluejeans.com/premium-numbers)

Enter the meeting ID and passcode followed by #

Connecting from a room system?
Dial: bjn.vc or 199.48.152.152 and enter your meeting ID & passcode

Agenda:

  1. Previous meeting
  2. Announcements
  3. Graham's Project
  4. Ongoing Activities
    • JANA2
      • GlueX port
      • A.I. support
    • A.I.
    • EVIO-6
    • SRO
      • ERSAP
      • Hall-B/D TriDAS
    • Offsite Computing
      • NERSC/PSC GlueX meeting Tuesday
      • OSG
    • JLab Common Environment (CE) + SPACK
  5. GUI for Calorimeter calibration scripts (Hall-D Request)
  6. Publications
  7. AOT



Minutes:

Attendees: David L., Carl T., Nathan B., Thomas B., Vardan G., Kishan R., Graham H.

Announcements

  • Some issues with GPU assignments on sciml nodes. Also related to "Folding at Home" jobs.

JANA2

  • GlueX Port
    • ~2400 files and ~0.5M lines of code
    • Successfully ported ~214 files (JObjects)
    • Some issues with HDGeometry library
      • Dependency with hdds
    • JResourceManager has not been implemented in JANA2. Nathan now understands what it does and has a plan on how to implement it there.

A.I.

  • Kishan still finalizing system for saving models from Keras and loading from C++
  • Some issues with casting Tensors based on floats to ones based on double and vice versa
  • Hydra
    • (see announcements regarding sciml nodes)
    • Using Thomas' desktop at JLab for now, which has one Titan RTX card
    • Training CDC model took ~24hr and used the equivalent of 26k x 3 images
      • Using label weights to correct for natural imbalance between images with "good" labels vs. "bad"
      • Current training includes all "good" images, but some could be dropped to speed up training.
      • Recent test used 1.5k "bad" images David generated to mimic single HV board failures. 60% used in training
      • Training included random sample of "bad" images. May need to retrain where images of certain failure modes are excluded so test can be made for generalization.
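The label weighting mentioned above can be sketched as follows. This is a minimal illustration, assuming inverse-frequency ("balanced") weights; the minutes do not say which weighting scheme Hydra actually uses. The function name `balanced_label_weights` and the label counts are hypothetical.

```python
from collections import Counter


def balanced_label_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count).

    Rare labels get larger weights, so every label contributes the same
    total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label counts for illustration only (not the Hydra data set).
labels = ["good"] * 900 + ["bad"] * 100
weights = balanced_label_weights(labels)

# Each label now carries the same total weight: count * weight is equal.
totals = {label: labels.count(label) * w for label, w in weights.items()}
```

A dictionary of this shape, keyed by integer class index rather than by name, is the form that Keras `Model.fit` accepts through its `class_weight` argument.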


EVIO-6

  • Issue identified when reading from file
    • Simultaneous support of HIPPO and EVIO creates ambiguity in determining whether EVIO is the format being read (there is no place in the event data to record definitively that it is EVIO)
    • A plan has been formed to address this that will properly determine the format in almost all cases. It will rely on catching an exception for the rare cases where the determination is wrong.
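The "guess first, fall back on exception" approach can be sketched as below. This is not the EVIO-6 API: the two toy formats, the heuristic in `guess_format`, and all names are hypothetical and exist only to show the control flow.

```python
class FormatError(Exception):
    """Raised by a parser when the data is not in the format it expects."""


def parse_toy_a(data: bytes) -> bytes:
    # Toy format A (hypothetical): first byte holds the payload length.
    if not data or data[0] != len(data) - 1:
        raise FormatError("not toy format A")
    return data[1:]


def parse_toy_b(data: bytes) -> bytes:
    # Toy format B (hypothetical): payload followed by a 1-byte checksum.
    if not data or sum(data[:-1]) % 256 != data[-1]:
        raise FormatError("not toy format B")
    return data[:-1]


PARSERS = {"A": parse_toy_a, "B": parse_toy_b}


def guess_format(data: bytes) -> str:
    # Cheap heuristic that is right most of the time but can be wrong.
    return "A" if data and data[0] < 16 else "B"


def read_event(data: bytes):
    """Parse with the guessed format; on failure, retry with the other one."""
    first = guess_format(data)
    second = "B" if first == "A" else "A"
    try:
        return first, PARSERS[first](data)
    except FormatError:
        # Rare case: the heuristic guessed wrong, so fall back.
        return second, PARSERS[second](data)
```

If the fallback parser also raises, the exception propagates to the caller, which is the desired behavior for data that is in neither format.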

CLAS12

  • CODA/jcedit
    • Dave A. reported a bug to Vardan: jcedit does not save a properly formatted COOL config when making a connection between a ROC and a non-EB component.
    • Having this feature would be useful for non-standard test systems like sending data to "none" or some "debug" process
    • The issue has been fixed by Vardan
  • CLAS12 Reconstruction
    • JDK-11 was tested some time ago and seen to improve performance considerably in CLAS12 reconstruction
    • Most notably the Tracking code improved nearly 20% in single threaded operation over JDK-8
    • Recently, some discrepancies were noted in tracking results correlating with the change to JDK-11
    • Problem traced to matrix inversions done with a 3rd party package (JAMA) that was adopted years ago, but has apparently not been actively developed since 2012.
    • Vardan supplied a working example of an alternative package that can be used to replace it.
    • Vardan noted that locating the source of this bug was done relatively quickly due to the micro-services architecture CLAS12 is built on.

SRO

    • ERSAP Data Lake testing
    • In addition to redis, testing has now been done using a ring buffer system provided by Carl and implemented by Vardan.
    • Tests were done using the CLAS12 Forward Tagger with beam at a rate of 95.5 MB/s with a frame drop rate of ~<1%
    • Possibility to couple with the CLAS12 recon. software to produce a pi0 peak. This will require conversion to HIPPO, though, which is currently done offline using a dedicated, single-threaded program (not CLARA-based). It is not clear how long it would take to convert that to a service so the processing chain could be fully implemented online.
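For readers unfamiliar with the idea, a ring buffer with frame-drop accounting can be sketched as below. This is a single-threaded toy, not the system provided by Carl. The class name `FrameRing` and the policy of dropping the incoming frame when the buffer is full are assumptions made for illustration.

```python
class FrameRing:
    """Fixed-capacity FIFO ring buffer that counts dropped frames."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0      # index of the oldest stored frame
        self._size = 0      # number of frames currently stored
        self.offered = 0    # frames passed to put()
        self.dropped = 0    # frames rejected because the ring was full

    def put(self, frame) -> bool:
        """Store a frame; return False (and count a drop) if the ring is full."""
        self.offered += 1
        if self._size == len(self._slots):
            self.dropped += 1
            return False
        tail = (self._head + self._size) % len(self._slots)
        self._slots[tail] = frame
        self._size += 1
        return True

    def get(self):
        """Remove and return the oldest frame, or None if the ring is empty."""
        if self._size == 0:
            return None
        frame = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return frame

    def drop_rate(self) -> float:
        """Fraction of offered frames that were dropped."""
        return self.dropped / self.offered if self.offered else 0.0
```

With this bookkeeping, the drop rate is simply `dropped / offered` over the run.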

SPACK

  • Wouter D. gave presentation to Ops group in CST last week.
  • Presentation gave overview of SPACK including strengths and weaknesses
  • Thomas was at meeting, but will reach out to Wouter for his slides.

OSG

  • No OSG jobs run at JLab yet as far as Thomas knows.
  • Some minor details need to be finished by various people. No strict and immediate deadlines so work lingers a bit.