Omega

About

Omega is a shared-use resource for the Faculty of Arts and Sciences (FAS).

Logging in

If you are a first-time user, make sure to read the pertinent sections of our user guide about using ssh. Once you have submitted a copy of your public key to us, you should be able to ssh to omega.hpc.yale.edu. As with the other Yale clusters, there are two login nodes; you will be placed on one of them at random.
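For example, once your public key has been added, a typical login looks like the following (the username "netid" is only a placeholder for your own account name):

    # Log in to one of the two login nodes (you are placed on one at random)
    ssh netid@omega.hpc.yale.edu

    # If you keep several keys, you can point ssh at the right one explicitly
    ssh -i ~/.ssh/id_rsa netid@omega.hpc.yale.edu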

Queues and Scheduler

Omega uses Torque for resource management and Moab to manage and schedule jobs. The job queue/scheduler policy combines a long-term accounting policy with a short-term "fair-share" component to give all users access to cluster resources. See the table below for detailed information on the queues:

name            max nodes/job   max walltime/job   base priority   description
fas_devel       4               4 hours            very high       for compiling/testing programs; limit 1 job per user
fas_high        64              24 hours           normal          same as fas_normal (exists for historical reasons)
fas_normal      64              24 hours           normal
fas_low         64              24 hours           low
fas_long        32              3 days             normal          restricted to 256 cores max
fas_very_long   4               28 days            normal          restricted to 128 cores max & 4 running jobs per group
fas_bg          64              24 hours           very low
gputest *       1               24 hours           normal

* There are 4 nodes available on the gputest queue on Omega, each with 1 Tesla M2090 and 2 Tesla M2070 GPUs.

Each Omega node has 8 cores and 30GB of usable memory.
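As a rough sketch of how these limits translate into a Torque submission script, the example below requests whole nodes in the fas_normal queue. The job name, node count, and executable are placeholders; the walltime simply has to stay under the queue's 24-hour limit:

    #!/bin/bash
    #PBS -N example_job          # job name (placeholder)
    #PBS -q fas_normal           # one of the queues from the table above
    #PBS -l nodes=2:ppn=8        # 2 whole nodes, 8 cores each
    #PBS -l walltime=12:00:00    # must be below the queue's 24-hour limit

    cd $PBS_O_WORKDIR            # start in the directory the job was submitted from
    ./my_program                 # hypothetical executable

Submit the script with qsub (e.g. "qsub job.sh") and check its status with "qstat -u $USER" or Moab's "showq -u $USER".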

Restrictions

  • Fairshare is a mechanism that allows historical resource utilization information to be incorporated into job feasibility and priority decisions. Users that have historically utilized large amounts of cluster resources will have lower job priority than new or occasional cluster users. On a timescale of days, fairshare adjusts job priorities up or down from the queue's base priority.
  • Each compute node will run only ONE job at a time (i.e. exclusive access). A compute node will NOT run many small jobs from a user; each small job will be queued for its own compute node.
  • Users who run many small compute jobs should use SimpleQueue to schedule them efficiently; otherwise your jobs will spend a long time in the queue waiting to run. (A sketch of the underlying idea follows this list.)
  • Each FAS group may use up to 50% of the compute nodes at any given time.
  • The scheduler calculates processor-equivalent values when applying these restrictions.
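The sketch below illustrates the idea behind packing small tasks: because a node is always allocated exclusively, it is better to run several single-core tasks inside one 8-core job than to submit each task as its own job. SimpleQueue automates this; the script here is only a minimal hand-rolled illustration, and task.sh and its inputs are hypothetical:

    #!/bin/bash
    #PBS -q fas_normal
    #PBS -l nodes=1:ppn=8        # one whole node; the job owns all 8 cores
    #PBS -l walltime=04:00:00

    cd $PBS_O_WORKDIR

    # Run 8 independent single-core tasks on the same node instead of
    # submitting 8 separate jobs that would each occupy a whole node.
    for i in $(seq 1 8); do
        ./task.sh input_$i > output_$i &   # hypothetical per-task command
    done
    wait                                   # block until all tasks finish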

Software

Omega uses modules for managing software and its dependencies. See our documentation on modules for details.
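Typical module usage looks like the following; the module name used here is only an example, so run "module avail" to see what is actually installed on Omega:

    module avail                # list software available through modules
    module load Apps/Matlab     # load a package (example name only)
    module list                 # show currently loaded modules
    module unload Apps/Matlab   # unload it when you are done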

Compute Hardware

Compute Nodes

(44) chassis, each with (16) HP ProLiant BL460c G6 blades - 704 nodes
(8) chassis, each with (8) HP ProLiant SL230s blades - 64 nodes
(1) chassis with (1) HP ProLiant SL230s blade and (3) SL250s blades - all 4 nodes have GPUs

Storage

Omega has two types of storage: limited-capacity home directory space, and larger (but still limited) high-performance storage.

Each type of storage has advantages and disadvantages. For example, high-performance storage has the advantage of being fast, but the disadvantages of being expensive and impractical to back up due to its high rate of change. As such, high-performance storage is typically used for data being actively processed by the cluster, not for long-term archiving.

Below is a table summarizing the filesystem types typically available on our clusters. (Individual cluster configurations may vary.)

Filesystem   Description                    Recommended Uses                                                   Backup Method
/home        User home directory            Scripts and files that are not temporary or constantly changing   Automatic to tape
/scratch     High-performance filesystem    Temporary calculation data, read/write files                       NOT BACKED UP
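A common pattern, sketched below, is to stage working data on /scratch while a job runs and then copy anything worth keeping back to /home, since /scratch is not backed up. The per-user directory layout under /scratch and the program name are assumptions:

    SCRATCHDIR=/scratch/$USER/myrun         # hypothetical per-user scratch area
    mkdir -p $SCRATCHDIR

    cp input.dat $SCRATCHDIR/               # stage input onto the fast filesystem
    cd $SCRATCHDIR
    ./my_program input.dat > results.out    # hypothetical executable

    cp results.out $HOME/results/           # copy results worth keeping back to /home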

Other Storage Options

Other file storage options are offered by Yale ITS to University members. These options address different levels of security, information privacy, and shared access.