Dear Milgram Users,
Please be aware that scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023.
Multifactor authentication via Duo will be required for SSH for all users on Milgram after the maintenance. For most usage, this additional step is minimally invasive and makes our clusters much more secure. Users of graphical transfer tools such as Cyberduck should see our MFA transfer documentation.
During the maintenance, logins to the cluster will be disabled. We ask that you save your work, close interactive applications, and log off the system prior to the start of the maintenance. An email notification will be sent when the maintenance has been completed and the cluster is available.
As the maintenance window approaches, the Slurm scheduler will not start any job whose requested wallclock time extends past the start of the maintenance period (8:00 am on August 22, 2023). If you run squeue, such jobs will show as pending with the reason “ReqNodeNotAvail”. You can run the command “htnm” (short for “hours_to_next_maintenance”) to get the number of hours until the next maintenance period, which can help you submit jobs that will finish before maintenance begins. (If your job can actually be completed in less time than you requested, you may be able to avoid the hold by requesting an appropriate time limit using “-t” or “--time”.) Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order. All running jobs will be terminated at the start of the maintenance period. Please plan accordingly.
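As a rough illustration of choosing a time limit that ends before the maintenance window, the sketch below computes the longest value you could pass to “-t” or “--time” from a given submission time. The function name slurm_time_limit and the 10-minute safety margin are assumptions made for this example; they are not part of Slurm or of the cluster's tooling.

```python
from datetime import datetime, timedelta


def slurm_time_limit(now, maintenance_start, margin_minutes=10):
    """Return the longest D-HH:MM:SS limit that ends before maintenance.

    Hypothetical helper for illustration only. Returns None when the
    window (minus the safety margin) has already closed.
    """
    remaining = maintenance_start - now - timedelta(minutes=margin_minutes)
    total_seconds = int(remaining.total_seconds())
    if total_seconds <= 0:
        return None
    # Break the remaining seconds into Slurm's days-hours:minutes:seconds form.
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Maintenance start taken from this announcement (cluster local time).
maintenance = datetime(2023, 8, 22, 8, 0)
# Example submission time, chosen arbitrarily for the illustration.
example_now = datetime(2023, 8, 21, 14, 30)
print(slurm_time_limit(example_now, maintenance))  # prints 0-17:20:00
```

The printed value could then be supplied as the time limit when submitting, provided the job can actually finish within it.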
Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
Sincerely,
Paul Gluhosky