System Updates Archive
-
Omega Network Issues - Update
Monday, January 11, 2016 - 11:00am
At 3 pm Thursday, January 7, 2016, Omega experienced network issues. At this time, queuing is enabled and we believe the issues have been resolved. If you experience any problems, please email hpc@yale.edu.
We apologize for any inconvenience this may have caused.
-
BulldogN Maintenance
Wednesday, November 18, 2015 - 12:00pm
Additional maintenance to BulldogN's storage appliances is ongoing. All systems will be unavailable and logins will be disabled during the maintenance period. Any job running during this time will be terminated and storage will be unavailable. BulldogN is planned to be available again Thursday, November 19, in the afternoon.
Important:
- All file systems must remain under 90% utilization prior to the upgrade. Please remove any unnecessary files on /ycga-ba and refrain from adding to your net usage.
- If the file systems are above 90%, we will be required to move files to other locations.
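One way to check a file system against the 90% threshold is sketched below. This is a minimal illustration rather than a tool provided by the HPC team: the function names `fs_use_percent` and `under_limit` are hypothetical, and it assumes a POSIX shell with the standard `df` and `awk` utilities.

```shell
# Hypothetical helper: print the utilization (whole percent) of the
# file system that holds the given path.
fs_use_percent() {
    # df -P prints POSIX-format output; on the second line, the fifth
    # column is the capacity in use, for example "87%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed only if the file system holding the
# given path is below the limit (default 90, as in the notice above).
# Fails if usage is at or above the limit or could not be read.
under_limit() {
    limit=${2:-90}
    pct=$(fs_use_percent "$1")
    [ -n "$pct" ] && [ "$pct" -lt "$limit" ]
}
```

For example, `under_limit /ycga-ba || echo "/ycga-ba is at or above 90%"` would print a warning when the threshold is reached, and `du -sh` on a directory shows how much space it occupies.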
Your cooperation prior to the maintenance period is greatly appreciated.
Status updates will be posted here. If you have any questions, concerns or comments, please don't hesitate to contact us at hpc@yale.edu.
-
Power Outage - Grace / Omega
Sunday, December 27, 2015 - 10:54am
Just after midnight on December 27, 2015, a power outage occurred that affected the Omega and Grace clusters. Services were fully restored at 9:30am the same morning. Users are encouraged to check any of their jobs running at that time and report any issues to hpc@yale.edu.
-
BulldogN Maintenance
Monday, October 26, 2015 - 8:00am
An upgrade to storage appliances will be performed starting the morning of Monday, October 26. This upgrade will be performed outside BulldogN’s scheduled maintenance interval. All systems will be unavailable and logins will be disabled during the maintenance. Any job running during this time will be terminated. All systems are planned to be available again Tuesday, October 27, in the afternoon.
Status updates will be posted to this page. If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.
-
Scheduled Maintenance Window for Louise
Monday, October 12, 2015 - 8:00am
Starting the morning of Oct 12th we will have a maintenance window for the Louise cluster. All systems are planned to be available again on the morning of October 15th. During the maintenance period, logins will be disabled.
If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.
Yale Center for Research Computing
-
Omega Network Issue - Update
Monday, July 27, 2015 - 5:00pm
We are currently experiencing intermittent networking issues on Omega. To fix this issue, we will be restarting our network switches, which will take all nodes offline and disable logins. We recommend checking any jobs that are currently running once Omega is back in service.
We sincerely apologize for this inconvenience, especially in light of recent events.
Please refer to this page for the latest updates and email hpc@yale.edu with any questions, concerns or comments.
-
Omega Network Issue
Thursday, September 17, 2015 - 1:30pm
We are currently experiencing intermittent networking issues on Omega. Roughly 500 nodes are offline and many end users are unable to log in. The HPC team is investigating and working hard to bring the system back to regular service as soon as possible. Logins are currently enabled, but they may be disabled as we troubleshoot and fix the issue.
We sincerely apologize for this inconvenience, especially in light of recent events.
Please refer to this page for the latest updates and email hpc@yale.edu with any questions, concerns or comments.
Yale Center for Research Computing
-
Omega is Back in Service
Tuesday, September 15, 2015 - 3:00pm
Omega is now back in service. Files have been restored to /home from tape backup and a significant number of files (under 1 MB in size) have been recovered from /scratch. For instructions on discovering which files have been impacted by corruption, and guidance regarding the potential for recovery of additional files that may be of particular importance, please enter the following command after logging into Omega:
cat /scratch/maint-notes-sept-2015/README
We are keenly aware of the impact that this disruption may have on your research, and we sincerely apologize for the inconvenience. We will provide all possible assistance as you recover from this unfortunate event. Please email hpc@yale.edu to request help, or with any questions, concerns or comments.
-
Omega Update
Monday, September 14, 2015 - 10:30pm
As of today, September 14, the HPC team has completed restoration of files to /home from tape backup and has recovered a significant number of files (under 1 MB in size) from /scratch. The team has successfully run confidence tests and benchmarks on both the storage system and the InfiniBand network, with entirely satisfactory results. However, we are continuing to seek additional assurances from the Lustre file system development team and our vendors to confirm that we have taken all reasonable steps to ensure that the storage system is in the best possible condition. At this point, unless the Lustre developers or our vendors inform us of additional recommended actions, we plan to return Omega to service by tomorrow afternoon. At that time, we will provide users with both detailed information about files affected by the storage corruption, and guidance regarding the potential for recovery of additional files that may be of particular importance.
We are keenly aware of the impact that this disruption may have on your research, and we sincerely apologize for the inconvenience. We will provide all possible assistance as you recover from this unfortunate event. Please email hpc@yale.edu to request help, or with any questions, concerns or comments.
Please refer to this page for the latest updates and please note that logins will be disabled until Omega is brought back into service.
-
Omega Update
Sunday, September 13, 2015 - 7:00pm
As of today, September 13, the HPC team has been able to restore files to /home from tape backup and has recovered a significant number of files (under 1 MB in size) from /scratch. Confidence tests run against the storage have been positive. We have run into some network instability that we need to correct before allowing user logins. Networking tests are currently underway and will continue throughout this evening. We will send an additional status communication tomorrow, Monday, September 14, with the results of these networking confidence tests, which will indicate whether Omega is stable and ready to be brought into service.
We are deeply sympathetic to the impact of this issue on your research, and we sincerely apologize for the inconvenience. Once Omega is back in normal operation, we will provide any assistance we can to help you recover from this unfortunate disruption.