System Updates Archive

  • Issue with Milgram Connections on VPN

    3/28/24 - ITS is currently working to resolve an issue that prevents connections to Milgram over the access.its.yale.edu VPN.  Until this issue is resolved, users are asked to point their VPN clients to vpn3.its.yale.edu, vpn5.its.yale.edu, or vpn6.its.yale.edu.

    Please contact hpc@yale.edu if you have any questions.

  • Scheduled Maintenance on Milgram

    Milgram Scheduled Maintenance
     
    Dear Milgram Users,
     
    We write to remind you that scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, February 6, 2024, at 8:00 am.  Maintenance is expected to be completed by the end of day, Thursday, February 8, 2024.
     
    During the maintenance, logins to the cluster will be disabled. We ask that you save your work, close interactive applications, and log off the system prior to the start of the maintenance.  An email notification will be sent when the maintenance has been completed and the cluster is available.
     
    Upgrade to Red Hat 8
    Milgram’s current operating system, Red Hat Enterprise Linux (RHEL) 7, will officially reach end of life in 2024 and will no longer receive security patches from the vendor.  Therefore, Milgram will be upgraded to RHEL 8 during this maintenance.
     
    Changes to Interactive Partitions and Jobs
    We are making two changes to interactive jobs during the upcoming maintenance.  
     
    The ‘interactive’ and ‘psych_interactive’ partitions will be renamed to ‘devel’ and ‘psych_devel’, respectively, to bring Milgram into alignment with the other clusters.  This change has already been made on the other clusters in recognition that interactive-style jobs (such as OnDemand and ‘salloc’ jobs) are commonly run outside of the ‘interactive’ partition.  Please adjust your workflows accordingly after the maintenance.
     
    Additionally, all users will be limited to 4 interactive app instances (of any type) at one time.  Additional instances will be rejected until you delete older open instances.  For OnDemand jobs, closing the window does not terminate the interactive app job.  To terminate the job, click the “Delete” button on the “My Interactive Apps” page in the web portal.
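
    If you prefer the command line, OnDemand interactive apps run as ordinary Slurm jobs and can also be listed and cancelled there; a minimal sketch (the job ID comes from the squeue output, and the web portal card should update shortly after the job ends):

        # list all of your current jobs, including OnDemand interactive app jobs
        squeue -u $USER
        # cancel a specific job by its job ID to free up an instance slot
        scancel <jobid>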
     
    Please visit the status page at research.computing.yale.edu/system-status for the latest updates.  If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
     
    Sincerely,
     
    Paul Gluhosky
     
  • Problem with Grace cluster

    Thursday, January 25, 2024 - 3:20pm
    1/25 3:20pm The Grace cluster is currently experiencing a problem with home directories, causing logins to fail and any processes or jobs that require home directory access to hang. We are aware of the issue and are working with the storage vendor to restore access as quickly as possible. We will post here if and when we have further updates. We apologize for the inconvenience.
  • Gibbs Available

    Wednesday, January 3, 2024 - 4:37pm
    1/3 4:37pm The Gibbs filesystem (pi and project storage) is now back online and accessible. We understand the disruptions this has caused and sincerely apologize for any inconvenience.
     
    Gibbs is now functioning normally. However, we are still actively investigating the root cause of the outage to prevent similar incidents from occurring in the future.
  • Gibbs Unavailable

    Wednesday, January 3, 2024 - 11:35am
    1/3 11:35am The Gibbs filesystem (pi and project storage) is currently unresponsive, resulting in errors such as “Stale File Handle”. We are aware of the issue and working to restore access as quickly as possible. We will post here if and when we have further updates. We apologize for the inconvenience.
  • Scheduled Maintenance on Grace

    Grace Scheduled Maintenance

    Dear Grace Users,

    Scheduled maintenance will be performed on the Grace cluster starting on Tuesday, December 5, 2023, at 8:00 am.  Maintenance is expected to be completed by the end of day, Thursday, December 7, 2023.

    During this time, logins will be disabled, running jobs will be terminated, and connections via Globus will be unavailable.

    Multifactor Authentication (MFA)

    Multifactor authentication via Duo will be required for ssh for all users on Grace after the maintenance.  For most usage, this additional step is minimally invasive and makes our clusters much more secure.
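
    For those who open many ssh sessions, one common way to reduce repeated Duo prompts is OpenSSH connection sharing, which reuses a single authenticated connection. A minimal sketch, assuming an OpenSSH client (the hostname and netid below are placeholders; substitute your own login details):

        # the first connection authenticates with Duo and keeps a master
        # connection open in the background for 2 hours; later ssh/scp sessions
        # to the same host reuse it without prompting again
        ssh -o ControlMaster=auto \
            -o ControlPath=~/.ssh/cm-%r@%h:%p \
            -o ControlPersist=2h \
            netid@grace.hpc.yale.edu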

    Please visit the status page at research.computing.yale.edu/system-status for the latest updates.  If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky

     

  • McCleary Scheduled Maintenance 10/3-10/5

    McCleary Scheduled Maintenance

    Scheduled maintenance will be performed on the McCleary cluster, starting on Tuesday, October 3, 2023, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, October 5, 2023.

    During the maintenance, logins to the cluster will be disabled.  Storage will remain available.  We ask that you save your work, close interactive applications, and log off the system prior to the start of the maintenance. An email notification will be sent when the maintenance has been completed and the cluster is available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on October 3, 2023). You can run the command “htnm” (short for “hours_to_next_maintenance”) to determine the number of hours until the next maintenance period, which can aid in submitting jobs that will run before maintenance begins. If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail.” (If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”.) Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.  All running jobs will be terminated at the start of the maintenance period.  Please plan accordingly.
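
    For example, a minimal sketch of checking the window and submitting with an explicit time limit (the 12-hour limit and the script name batch.sh are placeholders; adjust them to your workload):

        # print the number of hours until the next maintenance period
        htnm
        # request a wallclock limit that finishes before maintenance starts
        sbatch --time=12:00:00 batch.sh
        # pending jobs that cannot finish in time show the ReqNodeNotAvail reason
        squeue -u $USER -o "%.10i %.20j %.10T %.30r"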

    Please visit the status page at research.computing.yale.edu/system-status for the latest updates.  If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

     

  • Milgram Scheduled Maintenance - Aug 22 to Aug 24

    Dear Milgram Users,
     
    Please be aware that scheduled maintenance will be performed on the Milgram cluster starting on Tuesday, August 22, 2023, at 8:00 am.  Maintenance is expected to be completed by the end of day, Thursday, August 24, 2023. 
     
    Multifactor authentication via Duo will be required for ssh for all users on Milgram after the maintenance.  For most usage, this additional step is minimally invasive and makes our clusters much more secure. However, if you use graphical transfer tools such as Cyberduck, please see our MFA transfer documentation.
     
    During the maintenance, logins to the cluster will be disabled. We ask that you save your work, close interactive applications, and log off the system prior to the start of the maintenance.  An email notification will be sent when the maintenance has been completed and the cluster is available.
     
    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on August 22, 2023).  You can run the command “htnm” (short for “hours_to_next_maintenance”) to get the number of hours until the next maintenance period, which can aid in submitting jobs that will run before maintenance begins.  If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail.” (If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”.)  Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.  All running jobs will be terminated at the start of the maintenance period.  Please plan accordingly.
     
    Please visit the status page at research.computing.yale.edu/system-status for the latest updates.  If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
     
    Sincerely,
     
    Paul Gluhosky
     
     
  • Grace Scheduled Maintenance - Aug 15 to Aug 17

     
    Dear Grace Users,
     
    Please be aware that scheduled maintenance will be performed on the Grace cluster starting on Tuesday, August 15, 2023, at 8:00 am.  Maintenance is expected to be completed by the end of day, Thursday, August 17, 2023. 
     
    During this time, logins will be disabled, running jobs will be terminated, and connections via Globus will be unavailable.  We ask that you save your work, close interactive applications, and log off the system prior to the start of the maintenance.  An email notification will be sent when the maintenance has been completed and the cluster is available.
     
    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wallclock time extends past the start of the maintenance period (8:00 am on August 15, 2023).  You can run the command “htnm” (short for “hours_to_next_maintenance”) to determine the number of hours until the next maintenance period, which can aid in submitting jobs that will run before maintenance begins.  If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail.”  (If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”.)  Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
     
    Jobs submitted prior to the maintenance will be held and will run under RHEL 8 after the cluster is returned to service.  If you are concerned that your jobs will not run properly on RHEL 8, please cancel your pending jobs before the maintenance.
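
    If you do need to clear your queue first, a minimal sketch using standard Slurm commands (review the squeue output before cancelling, since the scancel line removes every pending job you own):

        # show your pending jobs
        squeue -u $USER -t PENDING
        # cancel all of your pending jobs
        scancel -u $USER -t PENDING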
     
    Upgrade to Red Hat 8
     
    As part of this maintenance, the operating system on Grace will be upgraded to Red Hat 8.  A new unified software tree will be created that will be shared with the McCleary cluster. 
     
    Please visit the status page at research.computing.yale.edu/system-status for the latest updates.  If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
     
    Sincerely,
     
    Paul Gluhosky
  • HPC network connection issues

    6/27/23 9:00am - We are currently experiencing network connection issues into and out of the HPC clusters.  YCRC is working with ITS to resolve the issue as quickly as possible.
