Choosing a Queue
PBS (Portable Batch System) manages the jobs that are submitted to the cluster. The utilities you’ll need are installed in /opt/torque/bin/. Use PBS to submit jobs from the appropriate login node to a specified queue. Queues, their limits, and their purposes are listed on the individual cluster pages.
Interactive jobs can be used for testing and troubleshooting code. By requesting an interactive job, you will get a shell on the requested nodes to yourself.
$ qsub -q <queue> -I
This will assign a free node to you, and put you within a shell on that node. You can run any number of commands within that shell. To free the allocated node, exit from the shell. The environment variable $PBS_NODEFILE is set to the name of a file containing the names of the node(s) you were allocated.
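As a small illustration of how $PBS_NODEFILE might be used from inside the shell, the sketch below counts the distinct nodes and the total processor slots in an allocation. The helper names count_nodes and count_slots are hypothetical, and the sketch assumes the node file lists one hostname per line, repeated once per allocated slot, which is the usual TORQUE layout.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting the node file that PBS provides.
# Assumption: one hostname per line, repeated once per allocated slot.

# Print the number of distinct nodes listed in the given node file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Print the total number of lines (processor slots) in the node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Only report when running under PBS, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Nodes: $(count_nodes "$PBS_NODEFILE")"
    echo "Slots: $(count_slots "$PBS_NODEFILE")"
    sort -u "$PBS_NODEFILE"    # list each allocated hostname once
fi
```

Outside a PBS shell the guarded section does nothing, so the snippet is harmless to run on a login node.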
When using an interactive shell under PBS, your job is vulnerable to being killed if you lose your network connection. We recommend that all long-duration interactive PBS shells be run under screen. If you do use screen, please be sure to keep track of your allocations and free those no longer needed!
Submitting a Batch Job
To submit a job via TORQUE, you first write a simple shell script, called a "submission script", that wraps your job. A submission script consists of two parts: the "directives", which tell the scheduler how to set up the computational resources for your job, and the actual "script" portion, which contains the commands you want executed during your job.
Each directive consists of "#PBS" followed by a Torque flag. The most commonly used flags (which should be in just about every submission script) include:
|-N job_name||Custom job name|
|-l nodes=<N>:ppn=8||Number of nodes (N), with 8 processors per node (ppn)|
|-l walltime=<HH:MM:SS>||Walltime of the job in Hours:Minutes:seconds|
|-l mem=<M>gb||Memory requested for job. A node on Omegas has 35GB of usable memory, so M=35*N|
Special use flags include:
|-m abe -M <email address>||Sends you job status reports when your job starts (b), aborts (a), and/or finishes (e)|
|-o file_name||Specify file name for standard output from job|
|-e file_name||Specify file name for standard error from job|
Additional flags can be found in the official qsub documentation.
Here is an example script.pbs that runs a sequential job:
#PBS -q general
#PBS -N my_job
#PBS -l nodes=1:ppn=8,mem=35gb
#PBS -l walltime=12:00:00
#PBS -m abe -M firstname.lastname@example.org
cd $PBS_O_WORKDIR
./myprog arg1 arg2 arg3 ...
This script runs myprog on a node chosen by TORQUE from the general queue, after changing directory to where the user did the submission (the default behavior is to run in the home directory). In this case, the job will be run on a fully allocated node (the node will not be shared with any other users, so, for example, your program will have access to all available memory). You can put any number of PBS directives in the script, followed by commands to be run.
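The memory rule from the flag table (M = 35*N) is easy to get wrong when the node count changes, so one option is to generate the directives rather than edit them by hand. The following is only a sketch: make_submission_script is a hypothetical helper, and it assumes the figures quoted above (8 processors and 35 GB of usable memory per node) and the "general" queue.

```shell
#!/bin/bash
# Hypothetical helper: print a submission script for N full nodes.
# Assumptions: 8 processors and 35 GB usable memory per node, and a
# queue named "general", as in the example above.
make_submission_script() {
    local nodes="$1" walltime="$2" job_name="$3"
    local mem_gb=$(( 35 * nodes ))    # M = 35 * N
    cat <<EOF
#PBS -q general
#PBS -N ${job_name}
#PBS -l nodes=${nodes}:ppn=8,mem=${mem_gb}gb
#PBS -l walltime=${walltime}
cd \$PBS_O_WORKDIR
./myprog arg1 arg2 arg3
EOF
}

# Example: print the script for a two-node, 12-hour job.
make_submission_script 2 12:00:00 my_job
```

Redirecting the output to a file (for example script.pbs) gives a script that can be submitted in the usual way.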
To actually submit the job, do:
$ qsub script.pbs
We recommend that all jobs run on a queue be configured to send email notifications, so that you will know if they are aborted. To do this, use the -m and -M flags:
-m abe -M email@example.com
A Yale email address must be used; non-Yale emails will silently fail.
You can specify all flags either in the script or on the command line, with the command line taking precedence. For example, the previous script could be submitted to a "general" queue, without change, by doing:
$ qsub -q general script.pbs
Monitoring Job Status
To check on the status of your jobs, do:
$ qstat -u YourNetID # -u limits the listing to jobs owned by that user
To kill a running job, do:
$ qdel <job_id>
To check on the status of a queue (to see how many nodes are free, for example), do:
$ qstat -Q -f <queue>
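The job ID that qstat and qdel expect is the identifier qsub prints when a job is submitted, so it can be convenient to capture it at submission time. The sketch below assumes the identifier has the form <number>.<server>; job_number and submit_job are hypothetical helper names, and the final section is guarded so that nothing is submitted on a machine without TORQUE or without a script.pbs in the current directory.

```shell
#!/bin/bash
# Hypothetical helper: strip the server suffix from a job identifier,
# e.g. "12345.headnode" -> "12345".
# Assumption: qsub prints an identifier of the form <number>.<server>.
job_number() {
    echo "${1%%.*}"
}

# Submit a script and print the numeric part of its job ID.
submit_job() {
    local full_id
    full_id=$(qsub "$1") || return 1
    job_number "$full_id"
}

# Guarded so the sketch does nothing where TORQUE is not installed.
if command -v qsub >/dev/null 2>&1 && [ -f script.pbs ]; then
    id=$(submit_job script.pbs)
    qstat "$id"      # show the status of the job just submitted
    # qdel "$id"     # uncomment to kill the job
fi
```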
By default, output is spooled on the compute node and only returned to you after the job completes, in files called scriptname.ojobid and scriptname.ejobid for standard output and standard error, respectively. It is generally a good idea to explicitly redirect standard output and standard error inside your script so you can track the progress of the run by glancing at these files:
$ ./myprog arg1 arg2 arg3 ... > myprog.out 2> myprog.err
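One possible refinement of the redirection above is to tag the file names with the job ID, so that repeated runs do not overwrite each other. This is a sketch only: the myprog function is a stand-in for the real program, and the "local" fallback tag is an assumption added so the snippet also runs outside the batch system. $PBS_JOBID and $PBS_O_WORKDIR are set by TORQUE inside a job.

```shell
#!/bin/bash
# Stand-in for the real program (hypothetical): one line on standard
# output, one on standard error.
myprog() {
    echo "step 1 done"
    echo "warning: example message" >&2
}

# Under TORQUE, start in the submission directory; otherwise stay put.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Tag the file names with the job ID so repeated runs do not overwrite
# each other. "local" is a fallback for runs outside the batch system.
tag="${PBS_JOBID:-local}"
out_file="myprog.${tag}.out"
err_file="myprog.${tag}.err"

# Send each stream to its own file so progress is visible during the run.
myprog > "$out_file" 2> "$err_file"
```

Assuming the working directory is on a filesystem shared with the login node, the output file can then be followed with tail -f while the job is still running.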
Passing Arguments to a Torque Script
There is no way to pass command line arguments to a Torque script, surprising as that seems. However, there is another way to accomplish this using environment variables. The -v flag to qsub causes the specified environment variable to be set in the script’s environment. Here is an example script we will save as “env.pbs”:
#PBS -v SLEEPTIME
echo "SLEEPTIME IS $SLEEPTIME"
sleep $SLEEPTIME
To run it, first set the environment variable in your shell:
$ export SLEEPTIME=10
Then submit the script using qsub:
$ qsub env.pbs
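The -v flag also accepts a NAME=value pair, which avoids exporting the variable first and makes it straightforward to submit the same script with several values. The loop below is a minimal sketch of that idea; DRY_RUN and submit_with_sleeptime are hypothetical names introduced here, and with DRY_RUN left at its default of 1 the qsub commands are only printed, not run.

```shell
#!/bin/bash
# Sketch: submit env.pbs once per sleep time, passing the value with
# "-v NAME=value" instead of exporting it beforehand.
# Assumption: DRY_RUN is a switch of this sketch only; when it is 1
# (the default here) the qsub commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

submit_with_sleeptime() {
    local seconds="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "qsub -v SLEEPTIME=${seconds} env.pbs"
    else
        qsub -v "SLEEPTIME=${seconds}" env.pbs
    fi
}

# Submit (or, in a dry run, print) one job per value.
for t in 5 10 20; do
    submit_with_sleeptime "$t"
done
```

Running the snippet with DRY_RUN=0 on a login node would submit three jobs, each seeing a different $SLEEPTIME.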
For more documentation on TORQUE: