How can I tell how much memory my processes are using?

Running Jobs

The easiest way to check this is to get another shell on the same compute node (either by ssh'ing into it, or, if it is an interactive qsub, by suspending your application, e.g. R, with CTRL-Z to get back to the prompt) and then running either ps or top.

$ ps auxww | grep <your_program_name>
$ top

The column you are interested in with top is RES, the resident (physical) memory in use. For example, if your YEPNEE.exe processes each show a RES of ~600m, they are each consuming roughly 600 MB of memory.
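If you only need the figure for one known process, ps can report it directly without an interactive top session. A minimal sketch (using the current shell's PID as a stand-in for your own process):

```shell
# Print the PID and resident set size (RSS, in kB) of the current shell.
# Substitute the PID of the process you care about for $$.
ps -o pid=,rss= -p $$
```

The trailing `=` in `-o pid=,rss=` suppresses the header line, which is handy when feeding the value to a script.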

The memory available on each compute node varies; you can run "free -m" to check the total physical memory. In the example below, this compute node has 36 GB of physical memory:

$ free -m
             total       used       free     shared    buffers     cached
Mem:         36134       4396      31738          0        293       2473
-/+ buffers/cache:       1629      34505
Swap:        16383         38      16345

Alternatively, you could also run the following:

$ cat /proc/meminfo | grep MemTotal
MemTotal:       37002052 kB
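Since MemTotal is reported in kB, a one-line awk conversion turns it into GB. Shown here against the sample value above; on a live node you would point awk at /proc/meminfo itself:

```shell
# Convert MemTotal from kB to GB. On a compute node you would run:
#   awk '/MemTotal/ {printf "%.1f GB\n", $2/1048576}' /proc/meminfo
# Here the same conversion is applied to the sample line above.
echo "MemTotal:       37002052 kB" |
  awk '/MemTotal/ {printf "%.1f GB\n", $2/1048576}'
```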

If you launch your program under "/usr/bin/time -v", it will report statistics about the resources the job used. For example:

$ /usr/bin/time -v echo "test"
	Command being timed: "echo test"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2400
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 196
	Voluntary context switches: 1
	Involuntary context switches: 1
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
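The report is plain text (written to stderr), so you can pull out just the peak-memory figure with awk. A sketch, demonstrated against a canned line from the report above:

```shell
# Extract the peak RSS (in kB) from a /usr/bin/time -v report.
# In real use you would capture stderr, e.g.:
#   /usr/bin/time -v ./my_program 2> report.txt
# Here we parse a sample line from the report shown above.
report='	Maximum resident set size (kbytes): 2400'
peak_kb=$(printf '%s\n' "$report" | awk -F': ' '/Maximum resident set size/ {print $2}')
echo "peak RSS: ${peak_kb} kB"
```

Note that `/usr/bin/time` (the full path) is required: the bash builtin `time` does not support `-v`.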

Completed Jobs

Slurm keeps track of the memory footprint of each job. After the job completes, you can run sacct to retrieve that information. Unfortunately, the default output from sacct is not very useful, and sacct -l is very verbose. We recommend setting this environment variable to customize the output:

export SACCT_FORMAT="JobID%-20,JobName,User,Partition,NodeList,Elapsed,State,ExitCode,MaxRSS,AllocTRES%32"

[rdb9@farnam2 slurm]$ sacct -j 3477706_5125
               JobID    JobName      User  Partition        NodeList    Elapsed      State ExitCode     MaxRSS                        AllocTRES
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- --------------------------------
3477706_5125           vor_HQ_1      zm73    general          c18n02   00:15:12  COMPLETED      0:0                      cpu=1,mem=6400M,node=1 
3477706_5125.batch        batch                               c18n02   00:15:12  COMPLETED      0:0      5508K           cpu=1,mem=6400M,node=1 
3477706_5125.extern      extern                               c18n02   00:15:12  COMPLETED      0:0       824K           cpu=1,mem=6400M,node=1 

Look at the MaxRSS column: it reports the peak resident memory of each job step. Here the .batch step peaked at 5508K, about 5.4 MB.
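sacct prints MaxRSS with a unit suffix (K, M, or G), which can be awkward to compare across jobs. A small helper sketch to normalize such a value to MB; the sample input is the .batch value from the output above:

```shell
# Normalize a sacct MaxRSS value (e.g. "5508K", "2G") to whole megabytes.
# A hypothetical helper for illustration; integer division truncates.
maxrss_to_mb() {
  case "$1" in
    *K) echo "$(( ${1%K} / 1024 )) MB" ;;
    *M) echo "${1%M} MB" ;;
    *G) echo "$(( ${1%G} * 1024 )) MB" ;;
  esac
}
maxrss_to_mb 5508K
```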