Ruddle is intended for use only on projects related to the Yale Center for Genome Analysis. This applies to both storage and computation. Please do not use the facility for other projects. If you have any questions about this policy, please contact us.
Ruddle is named for Dr. Frank Ruddle, a Yale geneticist who was a pioneer in genetic engineering and the study of developmental genetics.
If you are a first-time user, make sure to read the pertinent links from our user guide about using ssh. Once you have submitted a copy of your public key to us, you should be able to ssh to
ruddle.hpc.yale.edu. You will be prompted to authenticate with Duo, Yale's multifactor authentication service. As with the other Yale clusters, there are two login nodes; you will be placed on one of them at random.
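Assuming your public key has been registered, logging in looks like the following (NETID is a placeholder for your Yale NetID):

```shell
# Replace NETID with your Yale NetID (placeholder).
ssh NETID@ruddle.hpc.yale.edu
# After the key exchange you will be prompted to approve the login via Duo.
```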
Partitions and Scheduler
The Ruddle cluster uses Slurm as its job scheduler. Most users should use either the default or interactive partition (partitions are analogous to Torque's queues). "default" is the default partition.
The scavenge partition allows access to unused nodes from other partitions. However, jobs running here will be forcefully aborted if other partitions require nodes and the request can only be satisfied by killing a scavenge job. We recommend that all jobs run on scavenge be either short-lived or capable of checkpointing their state.
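A minimal batch script for the scavenge partition might look like the sketch below. The --requeue flag asks Slurm to resubmit the job if it is preempted; the script name is hypothetical, and a real job should checkpoint its state so a requeued run can resume where it left off.

```shell
#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --time=2:00:00    # keep scavenge jobs short-lived
#SBATCH --requeue         # ask Slurm to resubmit the job if it is preempted

# myanalysis.sh is a hypothetical example; it should periodically save
# its state so a requeued run can pick up from the last checkpoint.
./myanalysis.sh
```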
| Partition | Access rules | Walltime | nx360 | x3850 | Total nodes | Total cores |
|---|---|---|---|---|---|---|
| default | batch use only; 300 total cores/user | default 7d, max 30d | 156 | | 156 | 3012 (shared with interactive) |
| interactive | interactive use only; 20 total cores, 256GB/user | default and max 24 hours | 156 | | 156 | 3012 (shared with default) |
| scavenge | can be preempted; 300 total cores/user | default 7d, max 30d | 156 | | 156 | 3012 (shared with default) |
| bigmem | by permission | default 7d, max 30d | | 2 | 2 | 64 |
The memory you request for a job is enforced; you will encounter errors if you attempt to use more. You should use --mem or --mem-per-cpu when submitting jobs to avoid running into problems. The default value is 1024MB/cpu.
You can specify a particular node type (e.g. nx360, m915, etc.) by using -C type.
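For example, the two options above can be combined at submission time like this (job.sh is a hypothetical script name):

```shell
# Request 4GB of memory per cpu and constrain the job to nx360 nodes.
# Exceeding the requested memory will cause the job to fail.
sbatch --mem-per-cpu=4096 -C nx360 job.sh
```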
You can monitor your jobs and the nodes they're running on via this site.
Ruddle uses the module system for managing software packages. See our documentation on modules here. If you'd like something installed that isn't available, please contact us. All new software on Ruddle will be available as a module.
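A typical module session looks like the following; the bowtie2 module name is a hypothetical example:

```shell
module avail          # list the software modules available on the cluster
module load bowtie2   # load a module (bowtie2 is a hypothetical example)
module list           # show the modules currently loaded in your session
```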
We've installed commonly used genomes, built in a variety of formats, here:
Please contact us if you'd like us to install additional genomes or formats.
Hardware
| Node type | Processor | Clock | Cores/node | RAM |
|---|---|---|---|---|
| Lenovo nx360 | Intel Xeon E5-2660 | 2.6GHz | 20 | 128GB |
| Lenovo x3850 | Intel Xeon E7-4809 v3 | 2.0GHz | 32 | 1.5TB |
| Dell m915 | (4) AMD Opteron 6276 | 2.3GHz | 32 | 512GB |
Note: Some caution is necessary when running on the m915 nodes, since they have an older instruction set than the nx360 and x3850 nodes. We compiled the modules on the newer nodes, so some programs may fail with illegal instruction errors on the m915s. You may also run into this when compiling your own software on Ruddle.
The easiest solution is to avoid the m915s unless you specifically compile your programs on them.
Accessing Sequencing Data
To avoid duplication of data and to save space that counts against your quotas, we suggest that you make soft links to your sequencing data rather than copying them. To do this, use
$ ln -s /path/to/something /path/to/link
If you do, be aware that if the original is deleted pursuant to the YCGA retention policy, your data will be lost unless you have made a true copy elsewhere.
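If you later decide you need a true copy rather than a link, cp -L dereferences the symlink and copies the underlying file. The self-contained demo below uses a temporary directory and hypothetical file names:

```shell
# Demo in a temporary directory: create a file, link to it, then
# materialize a true copy with cp -L (which follows the symlink).
tmp=$(mktemp -d)
echo "reads" > "$tmp/sample.fastq"
ln -s "$tmp/sample.fastq" "$tmp/link.fastq"
cp -L "$tmp/link.fastq" "$tmp/copy.fastq"
rm "$tmp/sample.fastq"     # even after the original is deleted...
cat "$tmp/copy.fastq"      # ...the true copy still holds the data
```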
To find the location of the sequence files on the storage, look at the URL that you were sent. If the path in the URL begins with:
- gpfs_illumina/sequencer* -> the true path is /ycga-gpfs/illumina/sequencer*
- ba_sequencers# -> the true path is /ycga-ba/ba_sequencers#
- sequencers# -> the true path is /ycga-gpfs/sequencers/panfs/sequencers#
Below that, follow the path in the URL. For example, if the sample link you received is:
The path on the cluster to the data would be:
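The prefix rules above can be captured in a small helper function; this is a sketch, and the run names in the example are hypothetical:

```shell
# Translate a URL path prefix into the on-cluster path, following the
# mapping rules described above.
url_to_path() {
  case "$1" in
    gpfs_illumina/*) echo "/ycga-gpfs/illumina/${1#gpfs_illumina/}" ;;
    ba_sequencers*)  echo "/ycga-ba/$1" ;;
    sequencers*)     echo "/ycga-gpfs/sequencers/panfs/$1" ;;
    *)               echo "unrecognized prefix: $1" >&2; return 1 ;;
  esac
}

# Hypothetical example run name:
url_to_path "ba_sequencers2/sequencerV/runXYZ"
# -> /ycga-ba/ba_sequencers2/sequencerV/runXYZ
```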
Ruddle has two parallel storage systems with a number of partitions. Paths beginning with /ycga-ba are on Hitachi storage, while paths beginning with /ycga-gpfs are on the Lenovo GPFS system.
| Path | Contents | Quota | Notes |
|---|---|---|---|
| /ycga-gpfs/home | user home dirs | 125GB/user, 500K files | use |
| /ycga-ba/data* | group specific data | | |
| /ycga-gpfs/sequencers/panfs/sequencers | old /panfs sequencer output | | files here have been compressed with quip |
| /ycga-gpfs/project | group project space | 3+TB/group | quota dependent on sequencing volume |
| /ycga-gpfs/scratch60 | group scratch space | 10TB/group | files older than 60 days are automatically purged |
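To see which of your scratch files are approaching the purge threshold, something like the following may help; GROUP is a placeholder for your group's directory name, and the 50-day cutoff is an arbitrary early-warning margin:

```shell
# List files not modified in over 50 days, i.e. close to the 60-day purge.
# /ycga-gpfs/scratch60/GROUP is a placeholder for your group's scratch space.
find /ycga-gpfs/scratch60/GROUP -type f -mtime +50
```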
Group Project and Scratch Space
Each group has its own dedicated project and scratch directories. Each member of the group has a user-specific subdirectory within each, plus a link in their home directory pointing to it. The group quota applies to the group's combined usage of these directories.
To check quotas, run:
Note: The numbers reported by the above commands are updated daily, so changes to your usage won't be reflected until the next day.
For the current info on the gpfs filesystems, run: