Ruddle

About

Ruddle is intended for use only on projects related to the Yale Center for Genome Analysis. This applies to both storage and computation. Please do not use the facility for other projects. If you have any questions about this policy, please contact us.

Ruddle is named for Dr. Frank Ruddle, a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics.

Logging in

If you are a first-time user, make sure to read the pertinent sections of our user guide about using ssh. Once you have submitted a copy of your public key to us, you should be able to ssh to ruddle.hpc.yale.edu. You will be prompted to authenticate with Duo, Yale's multifactor authentication service. As with the other Yale clusters, there are two login nodes; you will be randomly placed on one of them.
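
For example, a login from a terminal looks like this (replace "netid" with your own Yale NetID):

$ ssh netid@ruddle.hpc.yale.edu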

Partitions and Scheduler

The Ruddle cluster uses Slurm as its job scheduler. Most users should use either the default or the interactive partition (partitions are the Slurm equivalent of Torque's queues). Jobs are submitted to the "default" partition unless you specify another.

The scavenge partition allows access to unused nodes from other partitions. However, jobs running here will be forcefully aborted if other partitions require nodes and the request can only be satisfied by killing a scavenge job. We recommend that all jobs run on scavenge be either short-lived or capable of checkpointing their state.

Partition      Access rules                                       Walltime                  nx360 (c13n01-c25n12)  x3850  Total nodes  Total cores
default        batch use only; 300 total cores/user               default 7d, max 30d       156                    -      156          3012 (shared with interactive)
interactive    interactive use only; 20 total cores, 256GB/user   default and max 24 hours  156                    -      156          3012 (shared with default)
scavenge       can be preempted; 300 total cores/user             default 7d, max 30d       156                    -      156          3012 (shared with default)
bigmem         by permission                                      default 7d, max 30d       -                      2      2            64
Total          -                                                  -                         156                    2      158          3076

The memory you request for a job is enforced; you will encounter errors if you attempt to use more. Use --mem or --mem-per-cpu when submitting jobs to avoid running into problems. The default value is 1024MB per cpu.
You can request a particular node type (e.g. nx360 or m915) by using -C type.
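
As a rough sketch (the script name, core count, and memory value below are only illustrative), these options can be given on the command line or as #SBATCH directives in the submission script:

$ sbatch -p default -c 4 --mem-per-cpu=4096 -C nx360 myjob.sh

or, equivalently, inside myjob.sh:

#!/bin/bash
#SBATCH -p default            # partition to run in
#SBATCH -c 4                  # cores for this job
#SBATCH --mem-per-cpu=4096    # memory per core, in MB
#SBATCH -C nx360              # restrict the job to nx360 nodes

./my_analysis                 # placeholder for your actual command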

You can monitor your jobs and the nodes they're running on via this site.

Software

Ruddle uses the module system for managing software packages; see our documentation on modules for details. If you'd like something installed that isn't available, please contact us. All new software on Ruddle will be made available as a module.
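
The usual module commands apply (the package name below is purely illustrative; use module avail to see what is actually installed):

$ module avail                # list available modules
$ module load SAMtools        # load a module
$ module list                 # show currently loaded modules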

Genomes

We've installed commonly used genomes, built in a variety of formats, here: /home/bioinfo/genomes
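
For example, to browse what is available (the subdirectory layout will vary by genome and format):

$ ls /home/bioinfo/genomes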

Please contact us if you'd like us to install additional genomes or formats.

Compute Hardware

Node Type Processor Speed Cores RAM
Lenovo nx360 Intel Xeon E5-2660 2.6GHz 20 128GB
Lenovo x3850 Intel Xeon E7-4809 v3 2.0GHz 32 1.5TB
Dell m915 (4) AMD Opteron 6276 2.3GHz 32 512GB

Note: Some caution is necessary when running on the m915 nodes, since they have an older instruction set than the nx360 and x3850 nodes. The software modules are compiled on the newer nodes, so some programs may fail with illegal-instruction errors on the m915s. You may also run into this when compiling your own software on Ruddle. The easiest solution is to avoid the m915s unless you specifically compile your program on them.
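
For example, assuming the node types listed above are exposed as -C features (see the -C option described in the Partitions section), you could restrict a batch job to the nx360 nodes (myjob.sh is a placeholder for your own script):

$ sbatch -C nx360 myjob.sh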

Accessing Sequencing Data

To avoid duplication of data and to save space that counts against your quotas, we suggest that you make soft links to your sequencing data rather than copying them. To do this, use ln:

$ ln -s /path/to/something /path/to/link

If you do, be aware that if the original is deleted pursuant to the YCGA retention policy, your data will be lost unless you have made a true copy elsewhere.

To find the location of sequence files on the storage system, look at the URL you were sent. If the path in the URL begins with:

gpfs_illumina/sequencer* -> the true path is /ycga-gpfs/illumina/sequencer*
ba_sequencers# -> the true path is /ycga-ba/ba_sequencers#
sequencers# -> the true path is /ycga-gpfs/sequencers/panfs/sequencers#

Below that, follow the path in the URL. For example, if the sample link you received is:

http://sysg1.cs.yale.edu:2011/gen?fullPath=sequencers2/sequencerV/runs/131107_D00306_0096_AH7PWEADXX/Data/Intensities/BaseCalls/Aligned/Project_Rdb9/Sample_3

The path on the cluster to the data would be:

/ycga-gpfs/sequencers/panfs/sequencers2/sequencerV/runs/131107_D00306_0096_AH7PWEADXX/Data/Intensities/BaseCalls/Aligned/Project_Rdb9/Sample_3
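
Following the advice above, you might then soft-link that directory into your own space rather than copy it (the destination shown here is hypothetical; adjust it to wherever you keep your work):

$ ln -s /ycga-gpfs/sequencers/panfs/sequencers2/sequencerV/runs/131107_D00306_0096_AH7PWEADXX/Data/Intensities/BaseCalls/Aligned/Project_Rdb9/Sample_3 ~/project/Sample_3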

Storage

Ruddle has two parallel storage systems, each with a number of partitions. Paths beginning with /ycga-ba are on Hitachi storage, while paths beginning with /ycga-gpfs are on the Lenovo GPFS system.

Partition                                     Usage                        Quota                   Notes
/ycga-gpfs/home                               user home dirs               125GB/user, 500K files  use /home in hard-coded paths
/ycga-ba/data*                                group-specific data          -                       -
/ycga-gpfs/sequencers/panfs/sequencers[1234]  old /panfs sequencer output  -                       Files here have been compressed with quip
/ycga-ba/ba_sequencers[12356]                 sequencer output             -                       -
/ycga-ba/ba_sequencers[123]/scratch           scratch space                -                       -
/ycga-gpfs/project                            group project space          3+TB/group              quota dependent on sequencing volume
/ycga-gpfs/scratch60                          group scratch space          10TB/group              Files older than 60 days are automatically purged

Group project and scratch space

Each group has its own dedicated project and scratch60 directory. Each member of the group has a user-specific subdirectory within it, plus a link in their home directory that points to that subdirectory. A single group quota applies to the entire group's usage of these directories.
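
A minimal sketch of how to see this layout from your home directory (the link name ~/project used below is an assumption; your group's links may be named differently):

$ ls -l ~                     # the group links show up as symlinks in your home directory
$ readlink -f ~/project       # show where a link actually points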

Checking Quotas

To check quotas, run:

$ /home/software/bin/myquota.sh
$ /home/software/bin/groupquota.sh

Note: The numbers reported by the above commands are updated daily, so changes in your usage won't be reflected until the next day.

For current information on the GPFS filesystems, run:

$ /ycga-gpfs/bin/my_quota.sh