Status and Maintenance | Yale Center for Research Computing

Computing Systems Status

There are currently no known issues. Please reach out to research.computing@yale.edu with any questions or to report problems.

9/18/2025 9:00am: The issue with the Bouchet filesystem is now resolved. We apologize for the inconvenience and ask that you check your job outputs, and resubmit if necessary.

9/17/2025 9:30am: An issue with storage is currently preventing logins to the Bouchet cluster. We are actively working with the storage vendor to resolve the issue as quickly as possible.

Scheduled Maintenance

To perform critical updates and minimize downtime, regular maintenance will be performed on each cluster on a rotating schedule. During maintenance, logins will be disabled, jobs will not run, and cluster storage may be unavailable. Communication will be sent to users four weeks and one week before the maintenance period and in case of any changes.

All YCRC-managed clusters will be down for planned maintenance in late spring (dates for 2026 TBD).

This represents an updated approach to YCRC system maintenance. Until now, each cluster has had two full-downtime maintenance periods per year, each lasting three days. With the new approach, each cluster is still updated twice a year, but only one of the two annual maintenance periods is a full downtime. The other involves rolling updates to a live cluster. We are working toward a system in which the annual full-downtime maintenance will take place on all YCRC-managed clusters simultaneously. Approximately six months after this full downtime, the clusters will be patched with minor updates on a rolling basis with minimal disruption.

The new approach has several advantages. By consolidating the major cluster updates, YCRC can focus on preparing for and executing those updates once a year instead of nearly once a month, freeing more time for supporting researchers. Performing maintenance on all clusters within a data center simultaneously facilitates maintenance on subsystems that affect multiple clusters. All clusters are kept on the same major version of the image throughout the year, making for a more consistent and easier-to-support environment. For any given cluster, the number of days per year of planned total downtime is reduced, and the number of planned total-downtime periods drops from two to one. There is a second period of rolling updates each year, but it will entail only limited disruption.

If you have any questions, comments, or concerns, please contact us at hpc@yale.edu.

Upcoming maintenance on Grace, McCleary, Milgram, Misha, and Bouchet: December 17-18, 2025

Due to the limited updates needed at this time, the upcoming December maintenance will not be a full downtime, but will rather have limited disruptions. The clusters and storage will remain online and available throughout the maintenance period. Certain services will be unavailable for short periods during the maintenance window. There will be reduced availability of compute nodes at times, so users might experience temporary increases in wait times. The Bouchet Open OnDemand node and the Bouchet education partition nodes will remain available on December 17 and only be upgraded on December 18.

2025 Maintenance Schedule

Upcoming:

Hopper - December 10
Grace, Bouchet, Milgram, Misha, McCleary - December 17-18

Past:

Hopper - September 24
Bouchet and its associated storage - June 2-5
Grace, Milgram, and Misha and their associated storage - June 9-12
McCleary and its associated storage - June 10-12

System Status Updates

All YCRC-managed clusters will be down for planned maintenance in June 2025
May 19, 2025

Grace Scheduled Maintenance - Aug 15 to Aug 17
April 8, 2024

Milgram Scheduled Maintenance - Aug 22 to Aug 24
April 8, 2024

All clusters are operational
