System Updates Archive

  • Grace Cluster and GPFS Filesystem Back Online

    Friday, May 8, 2015 - 5:30pm

    Maintenance to the Grace cluster and GPFS Filesystem is now complete.  The cluster is back online and available for access.  We apologize for any inconvenience resulting from the delay.  

    If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.

  • CONTINUED DELAY: Grace Cluster and GPFS Filesystem

    Friday, May 8, 2015 - 9:30am

    The return of the Grace Cluster and GPFS Filesystem to service has been delayed.  The cluster has been experiencing network problems, and so far the issue has been narrowed to the storage network.  There is no impact on the underlying filesystem or on stored data.

    The team will continue to address the issue over the weekend and will post updates to the website as more information becomes available.  At this time we are not expecting resolution before Monday.

    If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.

  • Power Outage at West Campus

    Thursday, May 7, 2015 - 1:00pm

    A brief power outage at West Campus took many Omega cluster nodes offline.  The cluster has been restored and is currently running; however, jobs may have been impacted.  If you are currently using the Omega cluster, we advise you to check your jobs to determine whether they need to be restarted.

    If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.

  • CONTINUED DELAY: Grace Cluster and GPFS Filesystem

    Thursday, May 7, 2015 - 4:00am

    The return of the Grace Cluster and GPFS Filesystem to service has been delayed.  There is currently no estimated time for resolution, but the team is working hard to complete the maintenance, and updates will be posted here.

  • Notice: Grace cluster maintenance - currently ongoing

    Maintenance is currently ongoing for the Grace cluster and GPFS filesystem as described below.  
     

    First Maintenance Window: 5/4/2015 – 5/6/2015

    We will be bringing the Grace cluster and the GPFS filesystem offline from Monday, May 4th at 8 am through Wednesday, May 6th.  Users of the Bulldog N and Louise clusters will be impacted if they are currently using /GPFS; otherwise, no impact should be expected.  All systems should be back online the morning of May 7th.

    UPDATE:  The cluster's return to service has been delayed until 5:30 pm 5/8.

    Second Maintenance Window: 5/20/2015 – 5/22/2015

    The second maintenance window for the Grace cluster is scheduled for Wednesday, May 20th through Friday, May 22nd.  Grace will be back online Saturday, May 23rd.  Users of other clusters will not be impacted unless they are using the /gpfs filesystem, which may be unavailable at times during the maintenance window.  Monitoring will continue over the holiday weekend to ensure the system is stable.

    If you have any questions, concerns or comments, please don’t hesitate to contact us at HPC@Yale.edu.
     

  • Power Outage at West Campus

    Monday, March 23, 2015 - 8:00pm to 11:45pm

    Update: Service to Grace and Omega was restored ~11:45pm 23-Mar-2015.

    An unscheduled power outage at West Campus has affected the Grace and Omega clusters. 

  • Omega and Grace Unscheduled Outage

    Monday, March 16, 2015 - 5:22pm to 8:45pm

    Update: Omega and Grace both fully restored by 8:45PM 16-Mar-2015.

     

    Update: The storage on Omega began experiencing intermittent outages starting at 5:22PM 16-Mar-2015. The issues were resolved on Omega by 7:20PM 16-Mar-2015.

     

    Omega and Grace are currently unavailable. The HPC staff are currently working to restore the services.

  • Globus Scheduled Downtime

    Thursday, March 12, 2015 - 12:00am

    The Globus file transfer service will be unavailable on Saturday, 21-Mar-2015, between 11:00AM and 3:00PM EDT for upgrades. Active file transfers will be suspended during this time and will resume when the service is restored.

  • Scheduled Maintenance for Omega/HEP

    Monday, February 2, 2015 - 12:00am

    The Omega/HEP clusters will be unavailable from Mon 16-Feb-2015 at 8:00AM thru Wed 18-Feb-2015 at 5:00PM EST for scheduled maintenance. Work includes the following:

    • InfiniBand network upgrades
    • glibc GHOST network security patch for Linux
    • storage hardware maintenance
    • cluster storage performance tuning
    • new “phitest” job queue created for testing Intel Xeon Phi accelerator hardware
    • access to Grace cluster storage from Omega
    • remote file transfer for Grace cluster storage is now available via globus.org
    • remote desktop pilot kicked off

  • Grace File System Interruption

    Thursday, January 15, 2015 - 11:20am to 11:35am

    The Grace file system suffered a brief outage at ~11:20am. The file system is back online.  You are encouraged to check your jobs, as some may have been lost due to the interruption.
