Submitting Jobs using TORQUE

TORQUE is an open source batch queuing system that is very similar to PBS. Most PBS commands will work without any change. TORQUE is maintained by Adaptive Computing.

Choosing a Queue

PBS (Portable Batch System) is used to manage jobs that are submitted to the cluster. The utilities you’ll need are installed in /opt/torque/bin/. Jobs are submitted from the appropriate login node to a specified queue. Queues, along with their limits and purposes, are listed on the individual cluster pages.

Interactive Jobs

Interactive jobs can be used for testing and troubleshooting code. By requesting an interactive job, you get a shell on the requested node(s) for your exclusive use.

$ qsub -q <queue> -I

This will assign a free node to you, and put you within a shell on that node. You can run any number of commands within that shell. To free the allocated node, exit from the shell. The environment variable $PBS_NODEFILE is set to the name of a file containing the names of the node(s) you were allocated.
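
Inside the interactive shell, the node file can be inspected directly. The sketch below is a small, hypothetical helper (the name summarize_nodefile is not part of TORQUE) that counts the distinct nodes and processor slots in an allocation. It assumes the usual TORQUE layout, in which each allocated processor contributes one line to the file, so a node requested with ppn=8 appears eight times.

```shell
# Hypothetical helper: summarize the allocation described by a PBS node file.
# Assumes one line per allocated processor slot.
summarize_nodefile() {
    nodefile="$1"
    # Distinct host names = number of nodes.
    nodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')
    # Total lines = number of processor slots.
    slots=$(wc -l < "$nodefile" | tr -d ' ')
    echo "nodes=$nodes slots=$slots"
}

# Inside a job TORQUE sets PBS_NODEFILE; outside a job this does nothing.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

Run outside a job, the snippet only defines the function, so it is safe to paste into a login shell before requesting the allocation.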

When using an interactive shell under PBS, your job is vulnerable to being killed if you lose your network connection. We recommend that all long-duration interactive PBS shells be run under screen. If you do use screen, please be sure to keep track of your allocations and free those no longer needed!

Submitting a Batch Job

To submit a job via TORQUE, you first write a simple shell script that wraps your job, called a "submission script". A submission script has two parts: the "directives", which tell the scheduler how to set up the computational resources for your job, and the actual "script" portion, which contains the commands you want executed during your job.

Each directive consists of "#PBS" followed by a TORQUE flag. The most commonly used flags, which should appear in just about every submission script, include:

-N job_name               Custom job name
-q queue_name             Queue
-l nodes=<N>:ppn=8        Total number of nodes (N), with 8 processors per node (ppn)
-l walltime=<HH:MM:SS>    Walltime of the job in hours:minutes:seconds
-l mem=<M>gb              Memory requested for the job. A node on Omegas has 35GB of usable memory, so M=35*N

Special use flags include:

-m abe -M <email address>   Sends you job status reports when your job starts (b), aborts (a), and/or finishes (e)
-o file_name                Specify file name for standard output from the job
-e file_name                Specify file name for standard error from the job

Additional flags can be found in the official qsub documentation.
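
The memory rule in the flag list (M=35*N) is easy to get wrong when the node count changes. As a minimal sketch, the hypothetical helper below (resource_list is an illustrative name, not a TORQUE command) prints a matching resource string for N whole nodes, assuming 8 processors and 35 GB of usable memory per node as stated above.

```shell
# Hypothetical helper: print a "-l" resource list for N whole nodes.
# Assumes 8 processors and 35 GB of usable memory per node.
resource_list() {
    n="$1"
    ppn=8
    mem_per_node_gb=35
    echo "nodes=${n}:ppn=${ppn},mem=$((n * mem_per_node_gb))gb"
}

resource_list 2   # prints: nodes=2:ppn=8,mem=70gb
```

The result can be pasted after "#PBS -l" in a submission script, or passed on the command line as qsub -l "$(resource_list 2)" script.pbs.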

Here is an example script.pbs that runs a sequential job:

#PBS -q general
#PBS -N my_job
#PBS -l nodes=1:ppn=8,mem=35gb
#PBS -l walltime=12:00:00
#PBS -m abe -M me@yale.edu

cd $PBS_O_WORKDIR
./myprog arg1 arg2 arg3 ...

This script runs myprog on a node chosen by TORQUE from the general queue, after changing to the directory from which the job was submitted (by default, jobs start in the home directory). In this case, the job will run on a fully allocated node: the node will not be shared with any other users, so your program will, for example, have access to all available memory. You can put any number of PBS directives in the script, followed by the commands to be run.

To actually submit the job, do:

$ qsub script.pbs
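
qsub prints the identifier of the new job on standard output, so a wrapper can capture it for later use with qstat or qdel. The following is a minimal sketch, assuming qsub is on the PATH of the login node; the function name submit_job is hypothetical.

```shell
# Hypothetical wrapper: submit a script and print the job identifier.
submit_job() {
    script="$1"
    # Fail early when run somewhere without the TORQUE client tools.
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a login node" >&2
        return 1
    fi
    # qsub writes the new job's identifier to standard output.
    job_id=$(qsub "$script") || return 1
    echo "$job_id"
}

# Usage on a login node:
#   job_id=$(submit_job script.pbs) && echo "submitted $job_id"
```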

We recommend that all jobs run on a queue be configured to send email notifications, so that you will know if they are aborted. To do this, use the -m and -M flags:

-m abe -M me@yale.edu

A Yale email address must be used; non-Yale emails will silently fail.

You can specify any flag either in the script or on the command line; when a flag appears in both places, the command line takes precedence. For example, the queue for the previous script could be given on the command line, without changing the script, by doing:

$ qsub -q general script.pbs

Monitoring Job Status

To check on the status of your jobs, do:

$ qstat -u YourNetID

To kill a running job, do:

$ qdel <job_id>

To check on the status of a queue (to see how many nodes are free, for example), do:

$ qstat -Q -f <queue>

Output is normally buffered in a spool directory and returned to you only after the job completes, in files called scriptname.ojobid and scriptname.ejobid for standard output and standard error, respectively. It is generally a good idea to redirect standard output and standard error explicitly in the submission script, so that you can track the progress of the run by glancing at these files:

$ ./myprog arg1 arg2 arg3 ... > myprog.out 2> myprog.err
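
The redirection can be tried outside the queue first. In the sketch below, a shell function named myprog_demo stands in for ./myprog (both the function and the temporary directory are illustrative assumptions); it writes one line to each stream so the effect of the two redirections is visible.

```shell
# Stand-in for ./myprog: one line to stdout, one line to stderr.
myprog_demo() {
    echo "step 1 done"
    echo "warning: low precision" >&2
}

# Keep the demo files out of the current directory.
outdir=$(mktemp -d)

# Same redirection as in the submission script: stdout and stderr go to
# separate files that can be read while the program is still running.
myprog_demo > "$outdir/myprog.out" 2> "$outdir/myprog.err"

cat "$outdir/myprog.out"   # step 1 done
cat "$outdir/myprog.err"   # warning: low precision
```

While a real job is running, tail -f myprog.out follows the output file as it grows.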

Passing Arguments to a TORQUE Script

There is no way to pass command line arguments to a TORQUE script, surprising as that seems. However, the same effect can be achieved with environment variables. The -v flag to qsub causes the specified environment variable to be exported from your shell into the script’s environment. Here is an example script, which we will save as "env.pbs":

#PBS -v SLEEPTIME
echo "SLEEPTIME IS $SLEEPTIME"
sleep $SLEEPTIME

To run it, first set the environment variable in your shell:

$ export SLEEPTIME=10

Then submit the script using qsub:

$ qsub env.pbs
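
Because #PBS lines are ordinary comments to the shell, the body of env.pbs can be checked locally before submitting. The sketch below recreates the script in a temporary directory and runs it with sh, setting SLEEPTIME for that one command only. This is a local dry run under those assumptions, not a TORQUE submission.

```shell
workdir=$(mktemp -d)

# Recreate env.pbs from the example above. The quoted 'EOF' keeps
# $SLEEPTIME from being expanded while the file is written.
cat > "$workdir/env.pbs" <<'EOF'
#PBS -v SLEEPTIME
echo "SLEEPTIME IS $SLEEPTIME"
sleep $SLEEPTIME
EOF

# Local dry run: the variable is set for this one command only.
dry_run_output=$(SLEEPTIME=1 sh "$workdir/env.pbs")
echo "$dry_run_output"   # SLEEPTIME IS 1
```

qsub also accepts the value inline, as in qsub -v SLEEPTIME=10 env.pbs, which avoids exporting the variable in the login shell first.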

Additional Tips

For more documentation on TORQUE: