Milgram

During the upcoming December maintenance, we will be upgrading the software collection on Milgram. This upgrade will enable us to better support your software needs, but may require you to update any "module load" commands in your scripts. See this page for more information.

About

Stanley Milgram

Milgram is a HIPAA-aligned Department of Psychology cluster intended for use on projects that may involve sensitive data. This restriction applies to both storage and computation. If you have any questions about this policy, please contact us.

Milgram is named for Dr. Stanley Milgram, a psychologist who researched the behavioral motivations behind social awareness in individuals and obedience to authority figures. He conducted several famous experiments during his professorship at Yale University including the lost-letter experiment, the small-world experiment, and the Milgram experiment on obedience to authority figures.

Training Video

We have a training video available to orient you to Milgram and how best to use the cluster.

Logging in

If you are a first-time user, please read the pertinent sections of our user guide about using SSH. You will also need to make sure you have a public SSH key uploaded to the cluster.

You will then need to log in to the Yale HIPAA VPN. Use the Cisco AnyConnect VPN client to connect to access.yale.edu/hipaa. Note that you will need to use Duo Multi-factor Authentication (MFA) to log in to the VPN just as you would the standard VPN.

Once on the VPN, you should be able to ssh to milgram.hpc.yale.edu. As with the other Yale clusters, there are two login nodes; you will be randomly placed on one of them. If your group has a workstation (see list), you can connect using one of those.
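For example, once connected to the VPN, logging in from a terminal looks like the following, where netid is a placeholder for your own Yale username:

ssh netid@milgram.hpc.yale.edu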

Partitions and Scheduler

Milgram uses the Slurm job scheduler. Unless users request exclusive node access when submitting jobs, multiple jobs from different users may run on a single node. To facilitate this, the scheduler will strictly enforce memory limits to ensure that all jobs have access to the memory requested for them. To see more details about how jobs are scheduled see our Job Scheduling documentation.
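As a sketch of the exclusive-access option mentioned above, the standard Slurm --exclusive flag asks for a whole node rather than a shared one:

# request an entire node so no other jobs share it (standard Slurm flag)
srun --exclusive --pty bash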

All partitions on Milgram have a default walltime limit of 1 hour. Use the -t HH:MM:SS flag to request additional time up to the limits listed below. Similarly, they have a default memory limit of 5GB per requested core. If you run into insufficient memory errors, use the --mem-per-cpu flag to increase your job's memory limit. See our Slurm documentation for more details on requesting computing resources and submitting jobs.
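As an illustration, a minimal batch script that overrides the default walltime and per-core memory might look like the sketch below; the partition, time, and memory values are examples only, and my_analysis stands in for your own program:

#!/bin/bash
#SBATCH --partition=short      # one of the partitions listed in the table below
#SBATCH -t 4:00:00             # walltime in HH:MM:SS, up to the partition's limit
#SBATCH --cpus-per-task=4      # number of cores
#SBATCH --mem-per-cpu=10G      # per-core memory, raised from the 5GB default

./my_analysis                  # placeholder for your own program

Submit it with sbatch, e.g. sbatch my_job.sh (where my_job.sh is whatever you name the script).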

Name           Max walltime/job   Core count*   Notes
interactive**  6 hours            20            1 job/user, 4 cores/job
short          6 hours            all           772 cores/user, 1158 cores/group
long           2 days             1188
verylong       7 days             792
scavenge       no limit           all

*Nodes are shared across all partitions on Milgram. However, some partitions cap the total number of cores that can be in use at one time across all users. Each core limit also has an equivalent memory limit of 5GB multiplied by the core limit.

Scavenge Partition

Submitting jobs to the scavenge partition allows you to run outside of your normal fairshare restriction by making use of unutilized cores that are available in other partitions on the cluster. However, any job running in the scavenge partition is subject to preemption if a node in use by the job is required for a job in the node's normal partition. You should only run jobs in the scavenge partition that either have good checkpoint capabilities or that can be restarted with minimal loss of progress. Some jobs are not good fits for the scavenge partition, such as jobs with long startup times or jobs that run a long time between checkpoint operations.
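If your program can save and resume its own progress, a scavenge submission might look like the sketch below. --requeue is a standard Slurm option that puts a preempted job back in the queue; the checkpointing itself is up to your application, and my_restartable_job is a hypothetical program name:

#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH -t 2-00:00:00          # requested walltime (days-hours:minutes:seconds)
#SBATCH --requeue              # automatically re-queue the job if it is preempted

./my_restartable_job           # hypothetical program that checkpoints and resumes its own work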

Software

Milgram uses the modules system for managing software and its dependencies. See our documentation on modules here.
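For example, a typical workflow is to search for a package and then load it. The names below only illustrate the syntax; the exact module names and versions come from what module avail reports on Milgram:

module avail python        # search available modules for ones matching "python"
module load Python         # load a module by the name/version that module avail lists
module list                # confirm which modules are currently loaded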

Compute Hardware

Compute Node     Count   Processor               Features*                                      Cores   RAM**
Lenovo nx360 M5  48      Intel Xeon E5-2660 v4   broadwell, v4, sse4_2, avx, avx2, E5-2660_v4   28      250GB
Dell R730        12      Intel Xeon E5-2660 v3   haswell, v3, sse4_2, avx, avx2, E5-2660_v3     20      121GB

*For more info on how to use features, please see the Slurm documentation.

**The RAM listed here is the amount available for allocation to jobs. We reserve a small amount for system and administrative services.
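As a sketch, the node features listed above can be requested with Slurm's standard constraint option, for example to restrict a job to the Broadwell nodes:

#SBATCH --constraint=broadwell   # only run on nodes tagged with the broadwell feature

The same option can be given on the command line as sbatch -C broadwell my_job.sh, where my_job.sh is your own batch script.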

Storage

Each PI group is provided with storage space for research data on the HPC clusters. The storage is separated into three tiers: home, project, and temporary scratch. A group's usage can be monitored using the groupquota script available at:

/apps/bin/groupquota

Home

Home storage is designed for reliability, rather than performance. Do not use this space for routine computation. Use this space to store your scripts, notes, etc. Home storage is backed up daily.

Project

In general, project storage is intended to be the primary storage location for HPC research data in active use. Project storage is backed up daily.

60-Day Scratch (scratch)

This temporary storage should typically give you the best performance. Files older than 60 days will be deleted automatically. This space is not backed up, and you may be asked to delete files younger than 60 days old if this space fills up.
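If you want to see which of your files are approaching the 60-day cutoff, a standard find command is one rough way to check. The path below is a placeholder for your group's scratch directory, and the automatic cleanup may key off a different timestamp than modification time, so treat this only as an estimate:

# list files under a placeholder scratch path that have not been modified in 50+ days
find /path/to/your/scratch -type f -mtime +50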