Grace is a shared-use resource for the Faculty of Arts and Sciences (FAS). The cluster is named for the computer scientist and United States Navy Rear Admiral Grace Murray Hopper, who received her Ph.D. in Mathematics from Yale in 1934.
If you are a first-time user, make sure to read the pertinent links from our user guide about using SSH. Once you have submitted a copy of your public key to us, you should be able to ssh to grace.hpc.yale.edu. As with the other Yale clusters, there are two login nodes; you will be randomly placed on one of them.
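For example, once your public key is on file, a login might look like the following (the username is a placeholder for your own NetID):

```shell
# Connect to Grace with your Yale NetID (shown here as a placeholder)
ssh netid@grace.hpc.yale.edu

# If your private key is not in the default location, point ssh at it explicitly
ssh -i ~/.ssh/id_rsa netid@grace.hpc.yale.edu
```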
Grace now uses a different scheduler from the old Grace, called Slurm, so you will need to translate any existing submission scripts from LSF to Slurm. See our Slurm documentation for more information on using the scheduler. As on the old Grace, all nodes are shared: multiple jobs will run on a single node if each requests fewer than the total number of cores on that node.
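As a sketch of the translation, here is a minimal Slurm batch script with the rough LSF equivalents it replaces shown in comments; the job name, partition, resource values, and program name are all placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=myjob          # LSF: #BSUB -J myjob
#SBATCH --partition=interactive   # LSF: #BSUB -q <queue>  (partition name is a placeholder)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # LSF: #BSUB -n 4
#SBATCH --time=02:00:00           # LSF: #BSUB -W 2:00
#SBATCH --mem-per-cpu=4G          # LSF: #BSUB -R "rusage[mem=4096]"

./my_program                      # replace with your actual command
```

Save the script (e.g. as job.sh) and submit it with `sbatch job.sh` rather than LSF's `bsub < job.sh`.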
| name | max cores/job | max walltime/job | description |
| --- | --- | --- | --- |
| interactive | 4 | 6 hours | compiling/debugging/testing programs |
| bigmem | 40 | 24 hours | 1.5 TB of RAM per node |
| scavenge | 20 | 24 hours | 1 node limit per job |
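For example, an interactive shell within the interactive partition's 4-core, 6-hour limits could be requested like this (the choice of 4 cores and 6 hours here simply matches the table's maximums):

```shell
# Request an interactive 4-core shell for up to 6 hours,
# e.g. for compiling, debugging, or testing programs
srun --pty -p interactive -c 4 -t 6:00:00 bash
```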
The scavenge partition is new on Grace. It allows you to run jobs outside of your normal fairshare restriction, making use of any unutilized cores, including cores that are otherwise unavailable via the public partitions. However, note that any job on the scavenge partition is subject to preemption if a node it is using is required by a job on that node's normal private partition. This means that your job will be killed immediately, so only run jobs on the scavenge partition that either checkpoint well or can otherwise be restarted with minimal loss of progress. For this reason, keep in mind that not all jobs are a good fit for the scavenge partition, such as jobs with a long start-up time or jobs that go a long time between checkpoints.
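One common pattern, sketched below, is to combine scavenge with Slurm's `--requeue` option so a preempted job is automatically put back in the queue; this assumes your program writes its own periodic checkpoints and resumes from the latest one when restarted (the program name and flag are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --time=24:00:00
#SBATCH --requeue     # return the job to the queue if it is preempted

# Assumed: the program checkpoints periodically and can resume from
# its latest checkpoint when the requeued job starts again.
./my_simulation --resume-from-latest-checkpoint
```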
Grace uses the modules system for managing software and its dependencies. See our documentation on modules here.
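The day-to-day module commands look like the following; the module names shown are placeholders, since exact names and versions vary by cluster:

```shell
module avail            # list all available software modules
module avail python     # search for modules matching a name
module load Python      # load a module (exact name/version is a placeholder)
module list             # show currently loaded modules
module unload Python    # unload a single module
module purge            # unload all loaded modules
```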
| Node Type | Processor (--constraint tag) | Speed | Cores | RAM |
| --- | --- | --- | --- | --- |
| IBM NeXtScale nx360 M4 | Intel Xeon E5-2660 v2 (ivybridge) | 2.20 GHz | 20 | 128 GB |
| Lenovo NeXtScale nx360 M5 | Intel Xeon E5-2660 v3 (haswell) | 2.60 GHz | 20 | 128 GB |
| Lenovo NeXtScale nx360 M5 | Intel Xeon E5-2660 v4 (broadwell) | 2.00 GHz | 28 | 256 GB |
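The constraint tags in the table above can be used to pin a job to a particular processor generation, for example:

```shell
# Run only on Haswell nodes (tag from the table above)
sbatch --constraint=haswell job.sh

# Accept either Haswell or Broadwell nodes
sbatch --constraint="haswell|broadwell" job.sh
```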
Grace has 6 nodes on the gpu partition with 2 Nvidia Tesla K80s each; each K80 contains 2 GPUs, for a total of 4 GPUs per node. See our GPU guide for instructions on requesting GPUs for your job.
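A minimal GPU request might look like the sketch below; the exact `--gres` syntax accepted on Grace (including whether a GPU type string is required) is covered in the GPU guide:

```shell
# Request one GPU on the gpu partition (syntax is a sketch; see the GPU guide)
sbatch -p gpu --gres=gpu:1 job.sh
```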
File System: 2 PB of GPFS storage via FDR InfiniBand
By default, each group has 300 GB and 1 TB storage quotas on the home and project file systems, respectively. A group's usage can be monitored using the groupquota.sh script available at:
Each PI group is provided with storage space for research data on the HPC clusters. The storage is separated into three tiers: home, project, and temporary scratch.
Home storage is designed for reliability, rather than performance. Do not use this space for routine computation. Use this space to store your scripts, notes, etc. Home storage is backed up daily.
In general, project storage is intended to be the primary storage location for HPC research data in active use. Project storage is not backed up.
60-Day Scratch
This temporary storage should typically give you the best performance. Files older than 60 days are deleted automatically. This space is not backed up, and you may be asked to delete files younger than 60 days if the space fills up.
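To see which of your scratch files are approaching the 60-day purge, a standard `find` by modification time works; the scratch path below is a placeholder to adjust for your own scratch space:

```shell
# Path is a placeholder; point this at your actual scratch directory
SCRATCH_DIR=${SCRATCH_DIR:-$HOME/scratch60}

# List files not modified in the last 50 days, i.e. nearing the 60-day purge
find "$SCRATCH_DIR" -type f -mtime +50 -ls
```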
Other Storage Options
If you or your group find that these quotas don't accommodate your needs, contact us at firstname.lastname@example.org.
You can also mount Storage@Yale (S@Y), a service offered by Yale ITS to University members. Note that an S@Y share mounted on a cluster will not be available to mount elsewhere. To request an S@Y mount on the clusters, fill out our S@Y Request Form.