Submitting Jobs with LSF

Basic LSF Commands

To submit a submission script for a batch job (see below for details):

$ bsub < submit.sh

To list your jobs:

$ bjobs

To get detailed information on a single job or determine why a job isn't running:

$ bjobs -l 2925

To kill a specific job:

$ bkill 2925

To kill all of your jobs:

$ bkill 0

Submitting interactive jobs

Interactive jobs can be used for testing and troubleshooting code. When you request an interactive job, you get a command prompt on the requested cores, reserved for your use alone.

To run an interactive shell session on a compute node use the -Is flag:

$ bsub -Is -q interactive bash

To run an interactive session that can open windows on your X11 server, add the -XF flag:

$ bsub -Is -XF -q interactive bash

Note that for X11 forwarding to work, you need to have your local machine set up properly. Instructions for this setup can be found in each of the Connection Guides under "How to Get a Graphical Interface": SSH for Microsoft Windows, SSH on macOS and Linux.

By default, an interactive job requests a single core for 1 hour. See below for the flags used to request more cores or a longer walltime. To request more than 20 cores or more than 4 hours, submit your job to the appropriate queue instead of the interactive queue. Queues, their limits and purposes are listed on the individual cluster pages.

Submitting batch jobs

Jobs can also be submitted to the scheduler for unattended runs. To do this, you construct a "submission script" and then submit that script to the scheduler using the bsub command.

A submission script consists of two parts: the "directives" that tell the scheduler how to set up the computational resources for your job, and the actual "script" portion, which contains the commands you want executed during your job.

Each directive consists of "#BSUB" followed by an LSF flag. The most commonly used flags (which should appear in just about every submission script) include:

-n N Total number of cores (N)
-J job_name Custom job name
-q queue_name Queue
-W HH:MM Walltime of the job in hours:minutes

Special use flags include:

-R "span[hosts=N]" Restrict number of nodes to run on
-R "span[ptile=N]" Specify core layout, such that N cores are assigned per node
-o file_name Specify file name for standard output from job
-e file_name Specify file name for standard error from job
-B Sends you an email alert when your job begins
-N Sends a job report to you via email when you job completes (default behavior when -o is not specified)

Queues, their limits and purposes are listed on the individual cluster pages. Note: all these flags can be used with interactive jobs as well by inserting them between "bsub" and "bash" in your interactive job submission command.
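For example, to request an interactive session with 4 cores on a single node for 2 hours (the core count and walltime here are illustrative, and stay within the interactive queue limits noted above):

$ bsub -Is -n 4 -R "span[hosts=1]" -W 2:00 -q interactive bash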

Here is a script named "mparr.sh" that executes R in parallel across multiple nodes. It requests 40 cores for 4 hours on the shared queue:

#BSUB -J example_job_1
#BSUB -n 40
#BSUB -q shared
#BSUB -W 4:00


module load Apps/R/3.0.3
module load Rpkgs/DOMPI
mpirun R --slave -f mparr.R

To submit "mparr.sh":

$ bsub < mparr.sh

Here is a more complicated example: a script named "parr.sh" that executes R in parallel on a single node with 10 cores.

#BSUB -J example_job_2
#BSUB -n 10
#BSUB -R "span[hosts=1]"  # this script must run on only one node
#BSUB -q shared
#BSUB -W 4:00
#BSUB -o outputs/example_job_2.%J
#BSUB -N

echo "Slot count: $LSB_DJOB_NUMPROC"
module load Apps/R/3.0.3
module load Rpkgs/DOPARALLEL
R --slave -f parr.R

To submit "parr.sh", use the bsub command:

$ bsub < parr.sh

Redirecting Output from a Job

By default, LSF will send you a job report via email when your job completes, and this email will also contain output (up to a limit) from your job. To redirect that output to files, use the -o and -e directives in your submission script. However, adding these flags disables the automated email. To continue to receive an email when your job completes, add the -N flag as well. Example:

#BSUB -o outputs/example_job_1.o%J
#BSUB -e outputs/example_job_1.e%J
#BSUB -N

Here %J is a variable that will be substituted at runtime with the job id. This is very useful if you run the job more than once, because you will get a unique file for every submission. If you only specify the -o flag, both stdout and stderr will be directed to that file.

Note: If an existing file is specified with "-o" or "-e", output will be appended to that file. The "-oo" and "-eo" options will instead force an overwrite of existing files.
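For example, to overwrite the output and error files on each run rather than appending to them (the file names here are illustrative):

#BSUB -oo outputs/example_job_1.out
#BSUB -eo outputs/example_job_1.err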

Requesting Memory

Requesting memory for a job in LSF is a bit subtle. By default, we assign a job 5 GB of memory per process (core) requested. To alter this default, use the "bsub -M P" option, which specifies the memory limit, P, for each process. However, there is also a '-R "rusage[mem=N]"' option that specifies the memory, N, to reserve for this job on each node. Memory with -M and '-R "rusage[mem=N]"' is requested in units of MB, where 1024 MB equals 1 GB.
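For example, to raise the per-process memory limit to 10 GB and reserve 20 GB on each node for the job (the values here are illustrative), add these directives to your submission script:

#BSUB -M 10240
#BSUB -R "rusage[mem=20480]"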

Requesting Specific Architecture

On Grace we have 2 different versions of the Intel Xeon compute nodes: Ivy Bridge (Intel Xeon E5-2660 v2) and Haswell (Intel Xeon E5-2660 v3). By default, your job will be assigned to one of the two sets based on availability. If you would like to ensure you are running on a specific architecture, add the following flag to your submission command or as a directive in your script.

-m <node_type>

replacing <node_type> with nx360m4 or nx360m5 for the Ivy Bridge or Haswell nodes, respectively.
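For example, to require the Haswell nodes, add the flag on the command line:

$ bsub -m nx360m5 < parr.sh

or add it as a directive in the script itself:

#BSUB -m nx360m5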

Modifying a Submitted Job

Sometimes you make a mistake when submitting a job, e.g. requesting the wrong queue, walltime or number of cores. Submitted jobs can be modified using the bmod command followed by an LSF flag and the job id (2925 in these examples).

To change the requested number of slots (before the job starts):

$ bmod -n 20 2925

To remove the requested "span" options (before the job starts):

$ bmod -R "span[]" 2925

To change the requested wall time limit (after the job starts):

$ bmod -W 24:00 2925
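To move a pending job to a different queue (before the job starts), for example to the shared queue used elsewhere on this page:

$ bmod -q shared 2925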

Troubleshooting Pending jobs

To get detailed information on a job and determine why it isn't running:

$ bjobs -l 2925

For example, if you try to start more than 20 processes on one node, the job status will be set to "PEND" since there are no nodes on Grace that have more than 20 cores. You'll need to cancel the job using the "bkill" command, or fix it using "bmod":

$ bsub -n 40 -R span[hosts=1] < parr.sh
Job <2873> is submitted to default queue <shared>.
$ bjobs
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
2873   sw464  PEND  shared  grace0                parr      May 21 10:15

$ bmod -n 20 2873
Parameters of job <2873> are being changed

Other useful commands

To run Matlab desktop interactively on a compute node:

$ module load Apps/Matlab/R2013b
$ bsub -XF matlab -desktop

To run Matlab interactively on one compute node with 20 cores allocated for eight hours:

$ module load Apps/Matlab/R2013b
$ bsub -XF -n 20 -R "span[hosts=1]" -W 8:00 matlab -desktop

To run Stata interactively on one compute node with 20 cores allocated for eight hours:

$ module load Apps/Stata/13.1
$ bsub -XF -n 20 -R "span[hosts=1]" -W 8:00 xstata-mp

General Notes on LSF

The environment of a job is initialized differently in LSF than in Torque/Moab. In particular:

  • The job inherits its environment from the submission environment.
  • The user’s shell init scripts are not used by default for non-interactive jobs.

If you want to initialize the job environment using your ~/.bashrc file:

  • use the bsub "-L /bin/bash" option
  • put "source ~/.bashrc" in your job script (as in the sketch below)
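For example, here is a minimal sketch of a job script using the second approach (the echo command is just a placeholder for your actual workload):

#BSUB -J env_example
#BSUB -n 1
#BSUB -W 1:00

source ~/.bashrc
echo "PATH is now: $PATH"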

The bsub command can run arbitrary commands with optional arguments. This is particularly useful for executing applications with graphical interfaces, such as Matlab and Mathematica. Commands must either be in your PATH or be specified with an absolute or relative path.

LSF uses the "#BSUB" directive to allow bsub options to be embedded in job scripts in the same way that Torque uses "#PBS". Note that "#BSUB" directives are ignored in shell scripts that are specified as commands. "#BSUB" directives only work in job scripts specified via input redirection.
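For example, using the "parr.sh" script from above:

$ bsub < parr.sh    # "#BSUB" directives in the script are honored
$ bsub parr.sh      # "#BSUB" directives are ignored; parr.sh runs as a plain command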

Jobs submitted with bsub inherit the submitter's current working directory, so it isn't necessary to use a command such as

$ cd $PBS_O_WORKDIR

in LSF job scripts. This is particularly useful when submitting binary executables via bsub.
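For example, to submit a hypothetical compiled program named "my_program" from the current directory (the resource flags here are illustrative):

$ bsub -q shared -n 1 -W 1:00 ./my_program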