Status and Maintenance

Computing Systems Status

There are currently no known issues. Please contact research.computing@yale.edu to report problems. 

Resolved past issues:

5/11/2026: A power drop at the West Campus Data Center on the morning of 5/11/2026 affected running jobs on McCleary, Grace, Misha, and Milgram. Please check your jobs to see how they were affected.

2/13/2026: Bouchet scheduler issues have been resolved. All systems are operational. 

Cluster Maintenance

To perform critical updates and minimize downtime, regular maintenance will be performed on each cluster on a rotating schedule. During maintenance, logins will be disabled, jobs will not run, and cluster storage may be unavailable. Users will be notified four weeks and one week before each maintenance period, and whenever the schedule changes.

In the spring of 2025, YCRC updated its approach to system maintenance. Until then, each cluster had two full-downtime maintenance periods per year, each lasting three days. Under the new approach, each cluster is still updated twice a year, but only one of the two annual maintenance periods is a full downtime. The other consists of rolling updates to a live cluster. We are working toward a system in which the annual full-downtime maintenance is performed on all YCRC-managed clusters simultaneously. Approximately six months after this full downtime, the clusters will be patched with minor updates on a rolling basis with minimal disruption.

The new approach has several advantages. By consolidating the major cluster updates, YCRC can focus on preparing for and executing those updates once a year instead of nearly once a month, freeing more time for supporting researchers. Performing maintenance on all clusters within a data center at the same time enables maintenance of subsystems that affect multiple clusters. All clusters are kept on the same major version of the image throughout the year, resulting in a more consistent, easier-to-support environment. For any given cluster, the number of days per year of planned total downtime is reduced, and the number of planned total-downtime periods drops from two to one. There is a second period of rolling updates each year, but it will entail limited disruption.

If you have any questions, comments, or concerns, please contact us at hpc@yale.edu.

System Status Updates

  • Bouchet and Hopper, along with their associated storage, will be down for maintenance from 2:30 pm Monday, June 15, 2026, through the end of the day on Thursday, June 18, 2026.
  • Grace, McCleary, Milgram, and Misha, along with their associated storage, will be down for maintenance from 8:00 am on Tuesday, June 16, 2026, through the end of the day on Wednesday, June 17, 2026.

Maintenance Schedule

Upcoming:

All YCRC-managed clusters will be down for planned maintenance in mid-June as follows:

  • Bouchet and Hopper, along with their associated storage, will be down for maintenance from 2:30 pm Monday, June 15, 2026, through the end of the day on Thursday, June 18, 2026.
  • Grace, McCleary, Milgram, and Misha, along with their associated storage, will be down for maintenance from 8:00 am on Tuesday, June 16, 2026, through the end of the day on Wednesday, June 17, 2026.

Please note that these timeframes are estimates. We recognize that maintenance is disruptive and will do our best to streamline and shorten the work to minimize the impact.
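
For users planning long-running work around these dates, the following is a minimal Python sketch that checks whether a job would overlap its cluster's maintenance window. It is illustrative only and not a YCRC tool: the names MAINTENANCE_WINDOWS and overlaps_maintenance are hypothetical, times are treated as naive local cluster times, and "end of the day" is assumed to mean midnight at the start of the following day. Because the published timeframes are estimates, treat the result as a planning aid rather than a guarantee.

```python
from datetime import datetime, timedelta

# Windows transcribed from the upcoming-maintenance schedule (naive local times).
# Assumption: "end of the day" is modeled as 00:00 on the following day.
_BOUCHET_HOPPER = (datetime(2026, 6, 15, 14, 30), datetime(2026, 6, 19, 0, 0))
_OTHER_CLUSTERS = (datetime(2026, 6, 16, 8, 0), datetime(2026, 6, 18, 0, 0))

MAINTENANCE_WINDOWS = {
    "bouchet": _BOUCHET_HOPPER,
    "hopper": _BOUCHET_HOPPER,
    "grace": _OTHER_CLUSTERS,
    "mccleary": _OTHER_CLUSTERS,
    "milgram": _OTHER_CLUSTERS,
    "misha": _OTHER_CLUSTERS,
}


def overlaps_maintenance(cluster: str, start: datetime, walltime: timedelta) -> bool:
    """Return True if a job starting at `start` and running for `walltime`
    would overlap the maintenance window of `cluster`."""
    window_start, window_end = MAINTENANCE_WINDOWS[cluster.lower()]
    job_end = start + walltime
    # Two intervals overlap when each one starts before the other ends.
    return start < window_end and job_end > window_start


if __name__ == "__main__":
    submit_time = datetime(2026, 6, 15, 12, 0)
    for hours in (12, 24):
        clash = overlaps_maintenance("Grace", submit_time, timedelta(hours=hours))
        print(f"Grace, {hours} h job from {submit_time}: overlaps maintenance = {clash}")
```

A job whose requested run time would extend into the window is the case this check is meant to flag, so that the run time can be shortened or the submission deferred until after maintenance.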

Past:

  • Hopper - March 10, 2026
  • Grace, Bouchet, Milgram, Misha, and McCleary - December 17-18
  • Bouchet and its associated storage - June 2-5
  • Grace, Milgram, and Misha and their associated storage - June 9-12
  • McCleary and its associated storage - June 10-12