System Updates Archive

  • Omega Scheduled Downtime

    Monday, June 23, 2014 - 4:26pm

    The Omega cluster will be brought down for maintenance from 07-Jul-2014 through 09-Jul-2014. Work will be performed on the Lustre filesystems as well as on the InfiniBand storage network. Faulty storage components will be replaced and InfiniBand switch firmware will be upgraded. Access will be closed starting Monday morning July 7th. Please contact hpc@yale.edu with any questions or concerns.

  • Quota scripts on Grace

    Wednesday, August 20, 2014 - 11:17am

    Use the following scripts to check your disk usage on Grace:

    /usr/local/bin/myquota.sh
    /usr/local/bin/groupquota.sh
    
  • Scheduled Downtime for FileTek StorHouse

    Wednesday, August 27, 2014 - 1:29pm

    The near-line storage system FileTek StorHouse will be unavailable from 04-Sep-2014 through 05-Sep-2014 due to a backend storage upgrade. More details to follow.

  • Globus File Transfer Extended to Louise and HEP

    Tuesday, September 2, 2014 - 2:25pm

    The Globus File Transfer tool has been extended to include the Louise and HEP clusters.

  • Grace Scheduled Downtime Sep 2014

    Wednesday, September 24, 2014 - 4:07pm to Friday, September 26, 2014 - 8:41pm

    The HPC Team is planning a maintenance window on Grace from 24-Sep-2014 through 26-Sep-2014. This does not affect Omega. We will be making changes to allow for future expansion of the filesystem and to make it possible to export GPFS to other networks. The maintenance will begin at 8:00AM on Wednesday 24-Sep-2014. All jobs will be terminated and Grace will be unavailable until Friday 26-Sep-2014. Please plan accordingly.

  • Louise Scheduled Downtime Oct 2014

    Monday, October 6, 2014 - 12:00am to Wednesday, October 8, 2014 - 12:00am

    The HPC Team will be performing maintenance on the Louise cluster. The downtime will take place Monday 06-Oct-2014 through Wednesday 08-Oct-2014. All running jobs on Louise will be terminated and files will not be accessible during the downtime. Please plan accordingly.

    Update: Scheduled downtime is completed. The following work was performed:

    • cabling and network addressing to facilitate Hitachi -> Cloud Data Migrator
    • Hitachi SMU upgraded
    • Hitachi Hi-Track monitor software upgraded
    • Hitachi BlueArc heads upgraded
    • firmware updates applied to Hitachi HUS-130s and HUS-150
    • new Linux kernel applied to compute nodes and support servers
    • Torque and Moab updates applied
    • Dell firmware and out-of-band management updates applied
    • storage benchmarks performed before and after the work
    • scratch data consolidated to single filesystem
    • cluster confidence benchmarks performed

    Benchmarks

  • Reports

    Wednesday, October 1, 2014 - 2:15pm

    On 01-Oct-2014, faculty and PIs will begin receiving monthly reports containing storage and queue usage. A sample report can be viewed here.

  • Grace Cluster Nodes Rebooted

    Monday, June 9, 2014 - 2:38pm

    The compute nodes on the Grace cluster were inadvertently rebooted at approximately 11:50AM on 09-Jun-2014. The nodes came back up normally, but you will need to verify your jobs. We apologize for any inconvenience.

  • Power Outage at West Campus

    Monday, August 18, 2014 - 9:24am

    An unscheduled power outage occurred at ~9:15AM 18-Aug-2014 at West Campus. The A21 data center was affected and clusters Grace and Omega are currently unavailable. The compute nodes on Bulldog-N rebooted but the cluster appears to be available. We are in the process of restoring services to Grace/Omega.

    Update: Grace/Omega/BDN are available for use.

  • Power restored October 9th at 300 George St.

    Thursday, October 9, 2014 - 9:06pm

    Update: Services have been completely restored. The storage and head nodes on Louise did not experience a power failure; however, network connectivity to the head nodes was affected.

    We are experiencing a power outage at our data center on 300 George St. This may affect the Louise cluster. The outage began ~6:40pm on 09-Oct-2014. We do not yet know the extent of the outage. More info to follow.
