January 2022
Performance Issues on Grace Storage (Loomis)
Over the past week, we have experienced performance problems with the Loomis storage system, which holds the application tree, home directories, scratch, and most project directories for Grace. At first, this caused only intermittent slow logins. On Tuesday afternoon (01/25/2022), however, the situation became more severe, and we were forced to reboot the entire storage system. Many jobs running at that time failed, so please check the status of your jobs. At that point, we disabled the Slurm partitions on Grace to prevent further job failures.
Grace Update: Partitions Now Open
We have made significant progress on the storage issue and have re-opened all partitions on Grace. Please let us know if you see any issues. Jobs that were running on Tuesday afternoon (01/25/2022) likely failed, so please check the status of your jobs.
Loomis Storage Is Unavailable; Grace Job Partitions Disabled
We are currently addressing an issue with the Loomis storage system. All job partitions on Grace are disabled while we work to resolve the issue.
Intermittent Issues with Grace Cluster Responsiveness
YCRC staff are investigating intermittent responsiveness issues on the Grace cluster, including logins that take a long time to complete. We are working to resolve the problem as quickly as possible.