System Updates Archive
-
Grace Scheduled Maintenance
As a reminder, scheduled maintenance will be performed on Grace beginning Tuesday, August 3, 2021, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 5, 2021.
During this time, logins will be disabled and connections via Globus will be unavailable. An email notification will be sent when the maintenance has been completed and the cluster is available.
-
Farnam: Gibbs inaccessible. Gibbs service restored on Grace and Ruddle
Friday, July 16, 2021 - 3:30pm
Gibbs access has been restored on Grace and Ruddle.
Farnam is unable to access /gpfs/gibbs due to a networking issue. We are working with the vendor to resolve the issue as quickly as possible. Any jobs attempting to read or write data to /gpfs/gibbs may have failed. We apologize for the inconvenience.
-
Network Issue: ScienceNet unable to reach broader internet
ITS is currently experiencing issues with ScienceNet, impacting cluster access to the broader internet.
Any attempts to access off-campus sites (e.g. git pull, certain licenses) will fail until ScienceNet access to the broader internet is restored. We apologize for the inconvenience.
-
Milgram Scheduled Maintenance
Dear Milgram Users,
Scheduled maintenance will begin Tuesday, June 8, 2021, at 8:00 am. We expect that the cluster will return to service by the end of the day on Thursday, June 10, 2021. During this time, logins to the cluster will be disabled. We ask that you save your work, close any interactive applications, and log off the system prior to the start of the maintenance. An email notification will be sent when the maintenance has been completed and the cluster is available.
As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on Tuesday, June 8, 2021). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail”. If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”. Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order. All running jobs will be terminated at the start of the maintenance period. Please plan accordingly.
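The rule described above can be checked before submitting. The following is a minimal sketch in Python, not a YCRC tool: the function names parse_slurm_time and fits_before_maintenance are hypothetical, and it assumes a job is allowed to start when its requested time limit ends at or before the start of maintenance. It accepts the time formats Slurm documents for “--time” (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds).

```python
from datetime import datetime, timedelta


def parse_slurm_time(limit: str) -> timedelta:
    """Convert a Slurm time limit string into a timedelta (hypothetical helper)."""
    days = 0
    if "-" in limit:
        # Forms with a day part: days-hours[:minutes[:seconds]]
        day_part, rest = limit.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in limit.split(":")]
        if len(parts) == 1:      # minutes
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        else:                    # hours:minutes:seconds
            hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(now: datetime, limit: str, maintenance_start: datetime) -> bool:
    """Return True if a job started at `now` with this limit ends by maintenance start."""
    return now + parse_slurm_time(limit) <= maintenance_start


# Maintenance start taken from the notice above: 8:00 am on Tuesday, June 8, 2021.
maintenance_start = datetime(2021, 6, 8, 8, 0)
submit_time = datetime(2021, 6, 7, 8, 0)  # illustrative submission time

print(fits_before_maintenance(submit_time, "1-00:00:00", maintenance_start))
print(fits_before_maintenance(submit_time, "2-00:00:00", maintenance_start))
```

With the illustrative submission time of 8:00 am on June 7, a one-day limit fits and a two-day limit does not, so the second job would wait until after maintenance.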
Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
Sincerely,
Paul Gluhosky
-
Milgram Unavailable
Monday, May 17, 2021 - 12:00pm
Milgram is currently unavailable from campus or the VPN due to a networking issue. We have an open ticket with ITS. We believe already submitted jobs are unaffected. We apologize for the inconvenience.
-
Grace Performance
14:15 - YCRC Staff are currently investigating degraded performance on Grace. Updates will be posted here as more information becomes available.
-
Grace and Milgram Unavailable
Thursday, May 6, 2021 - 12:30pm
Due to a power issue, Grace and Milgram are currently unavailable. We are working to restore the systems to service as soon as possible. We apologize for the inconvenience.
1pm: Systems have been restored to service.
-
Ruddle Scheduled Maintenance
Dear Ruddle Users,
Scheduled maintenance will begin Tuesday, May 4, 2021, at 8:00 am. We expect that the cluster will return to service by the end of the day on Thursday, May 6, 2021. During this time, logins to the cluster will be disabled. We ask that you save your work, close any interactive applications, and log off the system prior to the start of the maintenance. An email notification will be sent when the maintenance has been completed and the cluster is available.
As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on Tuesday, May 4, 2021). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail”. If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”. Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order. All running jobs will be terminated at the start of the maintenance period. Please plan accordingly.
Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
Sincerely,
Paul Gluhosky
-
Resolved: Unscheduled outage on Ruddle
Tuesday, April 6, 2021 - 8:15am to 11:15am
Network issues impacted Ruddle from about 8:15am to 11:15am on Tuesday, April 6. These issues are now resolved. Jobs (or job array elements) that started during this time most likely failed. Please check the status of your jobs.
-
Farnam Scheduled Maintenance
Tuesday, April 6, 2021 - 12:00am to Thursday, April 8, 2021 - 11:59pm
Dear Farnam Users,
As a reminder, we will perform scheduled maintenance on Farnam starting on Tuesday, April 6, 2021, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, April 8, 2021.
During this time, logins will be disabled and connections via Globus will be unavailable. Farnam storage (/gpfs/ysm and /gpfs/slayman) will remain available on the Grace cluster. We ask that you save your work, close any interactive applications, and log off the system prior to the start of the maintenance. An email notification will be sent when the maintenance has been completed and the cluster is available.
As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on April 6, 2021). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail”. (If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”.) You can run the command “htnm” (short for “hours_to_next_maintenance”) to get the number of hours until the next maintenance period, which can aid in submitting jobs that will run before maintenance begins. Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
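As an illustration of how an hours-until-maintenance figure can be turned into a time limit, here is a minimal Python sketch. It is not part of htnm: the function name time_limit_from_hours and the one-hour safety margin are assumptions made for this example, and the hour count is passed in by hand rather than read from the command, since the exact output format of htnm is not described here.

```python
def time_limit_from_hours(hours_remaining: int, margin_hours: int = 1) -> str:
    """Build a Slurm days-hours:minutes:seconds limit that ends before maintenance.

    hours_remaining: whole hours until maintenance begins (entered manually here).
    margin_hours: hypothetical safety buffer to allow for time spent waiting in the queue.
    """
    usable = hours_remaining - margin_hours
    if usable <= 0:
        raise ValueError("not enough time left before maintenance")
    days, hours = divmod(usable, 24)
    return f"{days}-{hours:02d}:00:00"


# Example: 50 hours remain, so request at most 49 hours.
print(time_limit_from_hours(50))  # 2-01:00:00
```

The resulting string can be passed to “-t” or “--time” when submitting, provided the job can actually finish within that limit.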
Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
Sincerely,
Paul Gluhosky