System Updates Archive
-
Grace Cluster and GPFS Filesystem Back Online
Friday, May 8, 2015 - 5:30pm
Maintenance to the Grace cluster and GPFS Filesystem is now complete. The cluster is back online and available for access. We apologize for any inconvenience resulting from the delay.
If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.
-
CONTINUED DELAY: Grace Cluster and GPFS Filesystem
Friday, May 8, 2015 - 9:30am
The return to service of the Grace Cluster and GPFS Filesystem has been delayed. The cluster has been experiencing network problems, and so far the issue has been narrowed to the storage network. There is no impact to the underlying filesystem or storage of data.
The team will continue to address the issue over the weekend and will post updates to the website as more information becomes available. At this time we are not expecting resolution before Monday.
If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.
-
Power Outage at West Campus
Thursday, May 7, 2015 - 1:00pm
A brief power outage at West Campus took many Omega Cluster nodes offline. The cluster has been restored and is currently running; however, jobs may have been impacted. If you are currently using the Omega cluster, we advise you to check your jobs to determine whether they need to be restarted.
If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.
-
CONTINUED DELAY: Grace Cluster and GPFS Filesystem
Thursday, May 7, 2015 - 4:00am
The return to service of the Grace Cluster and GPFS Filesystem has been delayed. Currently there is no estimated time for resolution, but the team is working hard to complete the maintenance, and updates will be posted here.
-
Notice: Grace cluster maintenance - currently ongoing
Maintenance is currently ongoing for the Grace cluster and GPFS filesystem as described below.
First Maintenance Window: 5/4/2015 – 5/6/2015
We will be taking the Grace cluster, along with the GPFS filesystem, offline from Monday, May 4th at 8 am through Wednesday, May 6th. Users of the Bulldog N and Louise clusters will be impacted only if they are currently using /GPFS; otherwise no impact should be expected. All systems should be back online the morning of May 7th.
UPDATE: The cluster's return to service has been delayed until 5:30 pm on 5/8.
Second Maintenance Window: 5/20/2015 – 5/22/2015
The second maintenance window for the Grace cluster is scheduled for Wednesday, May 20th through Friday, May 22nd. Grace will be back online Saturday, May 23rd. Users of other clusters will not be impacted unless they use the /gpfs filesystem, which may be unavailable at times during the maintenance window. Monitoring will continue over the holiday weekend to ensure the system is stable.
If you have any questions, concerns or comments, please don’t hesitate to contact us at HPC@Yale.edu.
-
Power Outage at West Campus
Monday, March 23, 2015 - 8:00pm to 11:45pm
Update: Service to Grace and Omega was restored at ~11:45pm 23-Mar-2015.
An unscheduled power outage at West Campus has affected the Grace and Omega clusters.
-
Omega and Grace Unscheduled Outage
Monday, March 16, 2015 - 5:22pm to 8:45pm
Update: Omega and Grace both fully restored by 8:45PM 16-Mar-2015.
Update: The storage on Omega began experiencing intermittent outages starting at 5:22PM 16-Mar-2015. The issues were resolved on Omega by 7:20PM 16-Mar-2015.
Omega and Grace are currently unavailable. The HPC staff are currently working to restore the services.
-
Globus Scheduled Downtime
Thursday, March 12, 2015 - 12:00am
The Globus file transfer service will be unavailable on Saturday, 21-Mar-2015, between 11:00AM and 3:00PM EST for upgrades. Active file transfers will be suspended during this time and will resume when the service is restored.
-
Scheduled Maintenance for Omega/HEP
Monday, February 2, 2015 - 12:00am
The Omega/HEP clusters will be unavailable from Mon 16-Feb-2015 at 8:00AM through Wed 18-Feb-2015 at 5:00PM EST for scheduled maintenance. Work includes the following:
- InfiniBand network upgrades
- glibc GHOST network security patch for Linux
- storage hardware maintenance
- cluster storage performance tuning
- new “phitest” job queue created for testing Intel Xeon Phi accelerator hardware
- access to Grace cluster storage from Omega
- remote file transfer for Grace cluster storage is now available via globus.org
- remote desktop pilot kicked off
-
Grace File System Interruption
Thursday, January 15, 2015 - 11:20am to 11:35am
The Grace file system suffered a brief outage at ~11:20am. The file system is back online. You are encouraged to check your jobs, as some may have been lost due to the interruption.