System Updates Archive

  • Gibbs Storage Maintenance - March 13

    Saturday, March 13, 2021 - 6:00pm

    On the advice of our storage vendor, we will be performing maintenance on the Gibbs storage starting immediately. The maintenance is expected to be completed by Monday, March 15, 2021, at 12:00 pm. This action is necessary now to reduce the possibility of file corruption and data loss.

    During this time, files on Gibbs (/gpfs/gibbs) will not be available from any cluster. If you are accessing files on Gibbs in any interactive applications, please save your work, close the applications, and log off the system as soon as possible. Please do not run batch jobs that access files on Gibbs during the maintenance. An email notification will be sent when the maintenance has been completed and the storage is available.

  • Grace Scheduled Maintenance


    As a reminder, scheduled maintenance will be performed on Grace beginning Tuesday, February 2, 2021, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, February 4, 2021. 

    During this time, logins will be disabled and connections via Globus will be unavailable. The Loomis storage will not be available on the Farnam cluster. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed and the cluster is available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on February 2, 2021). If you run squeue, such jobs will show as pending with the reason “ReqNodeNotAvail”. (If your job can actually complete in less time than you requested, you may be able to avoid this by requesting an appropriate time limit with “-t” or “--time”.) Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
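
    For example, if you submit a job the evening before the window opens, the job can still start as long as its requested time limit fits within the hours remaining before 8:00 am on February 2. A minimal sketch (the script name and the 12-hour value are hypothetical):

        # Request a 12-hour wallclock limit so the job finishes before the cutoff
        sbatch --time=12:00:00 my_job.sh

        # Equivalently, set the limit inside the batch script itself:
        #SBATCH --time=12:00:00

    Interactive jobs started with salloc or srun accept the same “-t”/“--time” option.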

    Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky

  • Winter Recess

    Wednesday, December 23, 2020 - 9:00am to Monday, January 4, 2021 - 9:00am

    During the winter recess (Dec 23 to Jan 3), YCRC staff will monitor the HPC clusters and user tickets. We will do our best to address critical situations, but most issues will be addressed once we return on Jan 4.

  • Ruddle Scheduled Maintenance

    Dear Ruddle Users,

    Scheduled maintenance will begin Monday, December 7, 2020, at 7:00 am. We expect that the cluster will return to service by the end of the day on Wednesday, December 9, 2020. During this time, logins to the cluster will be disabled. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed and the cluster is available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (7:00 am on Monday, December 7, 2020). If you run squeue, such jobs will show as pending with the reason “ReqNodeNotAvail”. If your job can actually complete in less time than you requested, you may be able to avoid this by requesting an appropriate time limit with “-t” or “--time”. Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order. All running jobs will be terminated at the start of the maintenance period, so please plan accordingly.

    Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky

  • Milgram Scheduled Maintenance

    Dear Milgram Users,

    Scheduled maintenance will begin Monday, November 2, 2020, at 6:30 am. We expect that the cluster will return to service by the end of the day on Wednesday, November 4, 2020. During this time, logins to the cluster will be disabled. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed and the cluster is available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (6:30 am on Monday, November 2, 2020). If you run squeue, such jobs will show as pending with the reason “ReqNodeNotAvail”. If your job can actually complete in less time than you requested, you may be able to avoid this by requesting an appropriate time limit with “-t” or “--time”. Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order. All running jobs will be terminated at the start of the maintenance period, so please plan accordingly.

    Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky

  • Farnam Scheduled Maintenance

    Dear Farnam Users,

    As a reminder, we will perform scheduled maintenance on Farnam starting on Monday, October 5, 2020, at 8:00 am. Maintenance is expected to be completed by the end of day, Wednesday, October 7, 2020.

    During this time, logins will be disabled and connections via Globus will be unavailable. Farnam storage (/gpfs/ysm and /gpfs/slayman) will remain available on the Grace cluster. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed and the clusters are available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on October 5, 2020). If you run squeue, such jobs will show as pending with the reason “ReqNodeNotAvail”. (If your job can actually complete in less time than you requested, you may be able to avoid this by requesting an appropriate time limit with “-t” or “--time”.) You can run the command “htnm” (short for “hours_to_next_maintenance”) to get the number of hours until the next maintenance period, which can help you submit jobs that will finish before the maintenance begins, as shown in the sketch below. Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
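
    For example, the number of hours that “htnm” reports can be used directly as a job’s time limit, so the job is guaranteed to fit before the window opens. A minimal sketch, assuming “htnm” prints a whole number of hours (the script name is hypothetical):

        # Cap the requested wallclock time at the hours remaining before maintenance
        sbatch --time=$(htnm):00:00 my_job.sh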

  • Aug 4th Power Disruption

    Tuesday, August 4 - Due to extreme weather conditions, there was a power interruption involving the clusters sometime between approximately 2:00 pm and 2:30 pm. Some compute nodes were affected, which would have caused running jobs to fail. Please check the status of any jobs that were running during that time, and contact the YCRC staff with any additional questions.


  • Scheduled Maintenance on Grace

    Dear Grace and Farnam Users,

    Scheduled maintenance will be performed on Grace beginning Monday, August 3, 2020, at 8:00 am. Maintenance is expected to be completed by the end of day, Wednesday, August 5, 2020.

    During this time, logins will be disabled and connections via Globus will be unavailable. The Loomis storage will not be available on the Farnam cluster. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed and the clusters are available.

    Please visit the status page at research.computing.yale.edu/system-status for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

  • Ruddle Scheduled Maintenance

    Tuesday, June 9, 2020 - 8:00am

    The Ruddle cluster will be unavailable due to scheduled maintenance until the end of day, Thursday, June 11, 2020. A communication will be sent when the cluster is available. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

  • RESOLVED Grace / Farnam Storage Performance

    RESOLVED - 17:00

    5/20/20 - 16:15 - YCRC staff are currently investigating an issue that is affecting storage performance on Grace and Farnam. 
