Computing and file services

Revision as of 15:27, 28 September 2012 by Kuhn


The Jefferson Lab Scientific Computing Group will introduce a significant upgrade to the existing mass storage and batch processing systems in early November this year. Beginning in early October, users will see warnings and informational messages about the pending upgrade when they execute existing commands. The Scientific Computing Group has invested considerable development effort to keep the syntax and behavior of user commands in the upgraded system similar to those of the existing commands, even though the underlying systems have changed substantially. The following list describes the noteworthy changes in the upgraded system:

1. User certificate system.

The current authentication system uses self-signed certificates with exceptionally long expiration dates. In accordance with industry-standard practices, and to improve the security of our system, new user certificates must be signed by the Jefferson Lab central signing authority, and will be valid for two years.

All users will be required to replace their certificates by running a command-line utility on any centrally managed Linux machine such as jlabl1. The availability of this tool will be announced initially through email and web postings, followed later by warning messages from existing commands.

2. New disk cache and volatile management systems.

The new cache and volatile systems are based on a Lustre global file system, which provides better performance and scalability. The volatile system is already in production, serving as a global scratch disk area with quota and reservation capabilities enabled. The cache system will serve as a front end to the tape library. The new cache system lets users explicitly pin cached files. Moreover, each file in a DST storage group will be assigned a relatively long pin time. Users can modify the pins, obviating the current practice of manually managing DST cache files.

3. Upgraded Jasmine system.

The upgraded back-end system is already in production. Notable improvements include: use of MD5 checksums to better ensure data integrity, opportunistic direct transfer between network file systems and the tape library to improve throughput and reduce network traffic, distributed load balancing and prioritization to provide fair access to the tape library, automatic background duplication of all raw data to avoid catastrophic data loss, automated error recovery to improve tape library availability, and improved auditing of the systems to provide better reporting capabilities.
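
As an illustration of the checksum idea only, the following Python sketch shows how a user could compute the MD5 digest of a local file and compare it with a reference value. It is not part of Jasmine; the function names md5_of_file and matches_checksum are hypothetical, and how a reference checksum would be obtained from the tape library is not specified here.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large data files do not have to
    fit in memory. (Hypothetical helper, not a Jasmine command.)
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 digest with a reference value.

    Surrounding whitespace and letter case in `expected_hex` are ignored.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch between the two values would indicate that the local copy differs from the data the reference checksum was computed on.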

The user tools have been re-written to deliver more useful information during operation as well as on system error. Please note that the output of jput and jget will change significantly. Consult the Scientific Computing wiki if you have scripts that depend upon the output of these commands.

4. Upgraded Auger system.

The Auger system has been re-engineered to deliver better utilization of the farm nodes, cache disks and the tape library system. All Auger commands retain the current syntax. However, the output of the new Auger commands will be in XML format.
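
Scripts that currently parse Auger output as plain text will probably need an XML parser instead. The Python sketch below shows the general approach using the standard library. The XML layout in SAMPLE_OUTPUT (a jobs element containing job elements with id and state attributes) and the function job_states are invented for illustration; the actual Auger schema is not described here and should be taken from the Scientific Computing wiki.

```python
import xml.etree.ElementTree as ET

# Hypothetical example only: the real Auger XML schema may differ.
SAMPLE_OUTPUT = """<jobs>
  <job id="101" state="RUNNING"/>
  <job id="102" state="PENDING"/>
</jobs>"""


def job_states(xml_text):
    """Map each job id to its state, given XML text in the assumed layout."""
    root = ET.fromstring(xml_text)
    return {job.get("id"): job.get("state") for job in root.iter("job")}


if __name__ == "__main__":
    for job_id, state in job_states(SAMPLE_OUTPUT).items():
        print(job_id, state)
```

Parsing by element and attribute name, rather than by column position, should make scripts less sensitive to later changes in output formatting.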

5. Scientific Computing Web Portal.

A new web portal will be deployed in the same time frame. It will use modern web technologies such as AJAX and jQuery, offer users a better web experience, and provide more data mining capabilities.

This upgrade aims to give users a better experience with the Jefferson Lab scientific computing facilities and to deliver the best possible utilization of those facilities. The underlying software has been either re-implemented or completely redesigned using current best-practice software techniques to improve the reliability, performance and maintainability of the whole software system.

More information about this upgrade can be found on the wiki under the "Upcoming Software Upgrade" entries. If you have any questions or suggestions for new features, please feel free to send email to