Scheduled Maintenance on YCRC Clusters in June

All YCRC-managed clusters will be down for planned maintenance in early June as follows:

  • Bouchet and its associated storage will be down for maintenance from Monday, June 2, 2025 through Thursday, June 5, 2025.

  • Grace, Milgram and Misha and their associated storage will be down for maintenance from Monday, June 9, 2025 through Thursday, June 12, 2025.

  • McCleary and its associated storage will be down for maintenance from Tuesday, June 10, 2025 through Thursday, June 12, 2025.

Please note that the proposed timeframes are conservative; we recognize how disruptive downtime is and will do our best to streamline the work and shorten each maintenance window.

This represents the first instance of a new approach to YCRC system maintenance.  Until now, each cluster has had two full-downtime maintenance periods per year, each lasting three days.  With the new approach, each cluster will still be updated twice a year.  However, only one of these two annual maintenance periods will be a full downtime.  The other will involve rolling updates to a live cluster.  We are working toward a system in which the annual full-downtime maintenance will take place on all YCRC-managed clusters simultaneously.  This year, however, Bouchet maintenance will take place first, to coincide with the annual data-center downtime at Massachusetts Green High Performance Computing Center (MGHPCC), and then the remaining clusters, which are located at West Campus, will undergo maintenance the following week.  Approximately six months after this full downtime, the clusters will be patched with minor updates on a rolling basis with minimal disruption. 

The new approach has several advantages:

  • By consolidating the major cluster updates, YCRC will be able to focus on the preparation for and execution of those major updates once a year, instead of nearly once a month, freeing more time for supporting researchers.

  • Performing maintenance on all clusters within a data center simultaneously facilitates maintenance on subsystems that affect multiple clusters.

  • All clusters will be kept on the same major version of the image throughout the year, making for a more consistent and easier-to-support environment.

  • For any given cluster, the number of days per year of planned total downtime will be somewhat reduced.

  • For each cluster, the number of planned total-downtime periods will be reduced from two to one. The second annual update period will consist of rolling updates, which entail only limited disruption.

If you have any questions, comments, or concerns, please contact us at hpc@yale.edu.