System Updates Archive

  • HIPAA VPN issues for Milgram

    Tuesday, June 18, 2019 - 9:00am

    We are currently experiencing an issue where many users have lost access to the HIPAA VPN required to access Milgram. We are working with ITS to resolve the issue as quickly as possible. Sorry for the inconvenience.

    12pm - This issue has been resolved. All Milgram users should be able to access the HIPAA VPN again.

  • Scheduler Issues on Grace

    Tuesday, June 18, 2019 - 10:30am

    We are currently experiencing an issue with the Slurm scheduler that is affecting job submission and Slurm commands such as squeue. We are aware of the issue and are working to resolve it as quickly as possible. Sorry for the inconvenience.
     
    2:30pm - The issue has been resolved and the scheduler is running normally again.

  • Scheduled Maintenance on Milgram

    As a reminder, scheduled maintenance will be performed on Milgram beginning Monday, June 3, 2019, at 8:00 am. Maintenance is expected to be completed by the end of day, Wednesday, June 5, 2019. During this time, logins will be disabled, and Milgram’s storage will not be available. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed, and the cluster is available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wall clock time extends past the start of the maintenance period (8:00 am on June 3, 2019). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail”. (If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”.) Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
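
    For illustration, a job that needs only six hours could request that limit explicitly at submission time; the script name below is just a placeholder:

    sbatch --time=06:00:00 my_job.sh

    Or, equivalently, inside the batch script itself:

    #SBATCH --time=06:00:00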

    If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky

  • Storage@Yale Scheduled Maintenance

    Thursday, March 28, 2019 - 9:00am
    The S@Y Archive tier will not be available from 9 AM to 3 PM on Thursday, March 28, 2019. During this maintenance window, the S@Y Archive tier's physical hardware is being moved to a new data center, which should improve overall service reliability.

  • Farnam Scheduled Maintenance

     
    Dear Farnam Users,

    Scheduled maintenance will be performed on Farnam beginning Monday, March 11, 2019, at 8:00 am. Maintenance is expected to be completed by the end of day, Thursday, March 14, 2019. During this time, logins will be disabled and connections via Globus will not be available. 

    The YSM and Slayman GPFS storage will not be available on the Grace cluster. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. An email notification will be sent when the maintenance has been completed, and the cluster is available.

    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wall clock time extends past the start of the maintenance period (8:00 am on March 11, 2019). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail.” If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time.” Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
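
    One way to check why a job of yours is still pending is to ask squeue for the reason field directly; the format string below is only an illustration:

    squeue -u $USER --format="%.10i %.9P %.20j %.8T %R"

    Jobs held for the maintenance window will show ReqNodeNotAvail in the last column.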

    Please visit the status page at research.computing.yale.edu for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky

  • Scheduled Maintenance on Grace Complete

    We have returned the cluster to service; however, we are still experiencing issues with certain MPI programs that run across multiple nodes, where they either fail almost immediately or hang. If you experience such an issue, we would appreciate it if you could send us a report at hpc@yale.edu to assist us in resolving it as quickly as possible. We apologize for any inconvenience.

  • Grace Maintenance - UPDATE 2/8/2019

    2/8/2019 - 9:30am
     
    Dear Grace Users,
     
    We are continuing to work on the scheduled maintenance for Grace but some tasks are taking longer than expected. 
     
    Thank you for your patience. A further email notification will be sent once the cluster is available.
     
    If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
     
  • Grace Maintenance

    Scheduled maintenance will be performed on Grace beginning Monday, February 4, 2019, at 8:00 am. Maintenance is expected to be completed by the end of day, Wednesday, February 6, 2019. During this time, logins will be disabled and connections via Globus will not be available. We ask that you log off the system prior to the start of the maintenance, after saving your work and closing any interactive applications. The Loomis GPFS storage will remain available and accessible on the Omega and Farnam clusters. An email notification will be sent when the maintenance has been completed, and the cluster is available.
    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wall clock time extends past the start of the maintenance period (8:00 am on February 4, 2019). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail.” If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time.” Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.

    Aside from the system and security updates we perform during maintenance, we want you to know about the following change:

    Changes to Home Quotas: We will be changing the quotas on the Grace home directories from group limits to user limits during the maintenance. The new limits will be 100GB and 200,000 files per user. You can inspect your current usage by running “myquota” anywhere on the cluster. If you are currently using more than 100GB or 200,000 files, we recommend reducing your usage before the maintenance, since you may not be able to run jobs after the cluster returns until you are back under the limits.
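
    For example, one rough way to gauge your current footprint before the maintenance is to combine myquota with standard tools; the commands below are only an illustration:

    myquota
    du -sh $HOME
    find $HOME -type f | wc -l

    The last command counts files and may take a while on large directories.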

    Please visit the status page at research.computing.yale.edu for the latest updates. If you have questions, comments, or concerns, please contact us at hpc@yale.edu.

    Sincerely,

    Paul Gluhosky
     

  • Slow response time on Farnam

    Tuesday, March 5, 2019 - 9:00am

    Users of the Farnam cluster may be experiencing slow response times when logging in and executing interactive commands. The support team is investigating and engaging with the storage vendor to resolve the problem. There is no ETA at this time, but further information will be posted here when it is available.

    If you are experiencing a different issue on Farnam or an issue with a different cluster, please email hpc@yale.edu.

  • Milgram - Scheduled Maintenance

    Scheduled maintenance will be performed on Milgram beginning Monday, December 10, 2018, at 8:00 am. Maintenance is expected to be completed by the end of day, Wednesday, December 12, 2018. During this time, logins will be disabled, and Milgram’s storage will not be available. An email notification will be sent when the maintenance has been completed, and the cluster is available. Aside from the system and security updates we perform during maintenance, we want you to know about the following changes:
     
    Software Changes on Milgram: The software available via the modules system on Milgram is being upgraded to be more consistent with our other clusters. During the December maintenance, we will change the default module list to a new module collection. When the cluster returns, please let us know if any software you need is missing. The old installations will remain for the time being, but all new software will be installed into the new collection. The old installations can be accessed by running the following:
     
    source /apps/bin/old_modules.sh
     
    More information about this transition is available on our website at https://research.computing.yale.edu/support/hpc/user-manual/software-collection-upgrade
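
    As a quick check after running the source command above, the standard module tools can confirm which collection you are seeing (shown here only as an illustration):

    module avail
    module list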
     
    Scratch Changes on Milgram: The “scratch” space on Milgram will be officially renamed to “scratch60” to be consistent with the other clusters. The path “/gpfs/milgram/scratch” will become “/gpfs/milgram/scratch60”. To prevent immediate job failures, we will create a symlink to the old path. We will deprecate the symlink at a later date, so after the maintenance window please update your paths accordingly.
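
    If your job scripts reference the old path explicitly, a simple search-and-replace is one way to update them after the maintenance; for example, assuming a script named my_job.sh (a placeholder name):

    grep -n '/gpfs/milgram/scratch/' my_job.sh
    sed -i 's|/gpfs/milgram/scratch/|/gpfs/milgram/scratch60/|g' my_job.sh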
     
    As the maintenance window approaches, the Slurm scheduler will not start any job if the job’s requested wall clock time extends past the start of the maintenance period (8:00 am on December 10, 2018). If you run squeue, such jobs will show as pending jobs with the reason “ReqNodeNotAvail”. (If your job can actually be completed in less time than you requested, you may be able to avoid this by making sure that you request the appropriate time limit using “-t” or “--time”.) Held jobs will automatically return to active status after the maintenance period, at which time they will run in normal priority order.
     
    If you have questions, comments, or concerns, please contact us at hpc@yale.edu.
