Running Job Arrays on barkla
1. Introduction
2. Simplified Job Submission Files
3. Job Submission Attributes for all Types of Jobs
4. Command Line Options
5. MATLAB Applications
6. R Applications
7. Generic Applications
1. Introduction
Although the barkla cluster is primarily intended for parallel (e.g. MPI) applications, it can also be used to submit batches of serial jobs which differ only in their input files - these are termed job arrays. Array jobs are useful in applications such as parameter space exploration, Monte Carlo analysis and statistical modelling where the same processing is applied to different input data (in the case of Monte Carlo methods, individual jobs may differ only in their random number generator seeds).
In each case the input data for each job must be stored in a separate input file, in such a way that each job can select the correct one. The most convenient way of doing this is to number the input files, for example:
input0 input1 input2 ... input<N-1>
where there are N jobs. We'll call these the indexed input files here and the integers [0..N-1] are the index values. Although UNIX does not support file extensions in the same sense as Windows, the index value can also be inserted between the file's "basename" and "extension" e.g.
input0.txt input1.txt input2.txt ... input<N-1>.txt
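If your data is produced one case at a time, a short shell loop can be used to create the numbered files. This is only an illustrative sketch and the per-case filenames (case_*.dat) are hypothetical:

# create input0.txt ... input9.txt from ten hypothetical per-case data files
for i in $(seq 0 9); do
    cp "case_${i}.dat" "input${i}.txt"
done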
Corresponding to the indexed input files, there will also be a collection of numbered output files produced by the jobs which we will call indexed output files e.g.
output0 output1 output2 ... output<N-1>
or
output0.txt output1.txt output2.txt ... output<N-1>.txt
where the jth output file corresponds to the jth input file, viz: inputj -> outputj, inputj.txt -> outputj.txt. In some applications there may also be input data which is common to all jobs; this can be stored in common input files.
Strictly speaking, each individual job in an array job is referred to as a job task in the SLURM scheduler and the term job is used for the entire array. This is an example of an array job consisting of 10 tasks which is waiting to run on barkla:
$ squeue -u smithic
JOBID      PARTITION NAME     USER    ST TIME NODES NODELIST(REASON)
3326_[0-9] nodes     r_app.sh smithic PD 0:00 1     (Priority)
The job-ID in this case is 3326 and the job tasks are numbered with indices [0-9]. Once the individual tasks start to run you can see the individual task-IDs e.g.:
$ squeue -u smithic
JOBID  PARTITION NAME     USER    ST TIME NODES NODELIST(REASON)
3326_0 cooper    r_app.sh smithic R  0:01 1     node003
3326_1 cooper    r_app.sh smithic R  0:01 1     node003
3326_2 cooper    r_app.sh smithic R  0:01 1     node003
3326_3 cooper    r_app.sh smithic R  0:01 1     node003
3326_4 cooper    r_app.sh smithic R  0:01 1     node003
3326_5 cooper    r_app.sh smithic R  0:01 1     node003
3326_6 cooper    r_app.sh smithic R  0:01 1     node003
3326_7 cooper    r_app.sh smithic R  0:01 1     node003
3326_8 cooper    r_app.sh smithic R  0:01 1     node003
3326_9 cooper    r_app.sh smithic R  0:01 1     node003
Users can prepare job arrays themselves and submit them using the standard SLURM commands; however, a number of tools have been developed to make the process easier. These tools ensure that all of the input and output file "indexing" is done "behind the scenes" so that users do not have to change their executable code/scripts.
To make this a bit clearer, imagine that your code loads data from input.txt, processes it and writes the output to results.txt e.g.
input_data = load(input.txt)
....
.... process input_data to give output_data
....
save(results.txt, output_data)
With the job submission tools, you can keep exactly the same code and just specify the indexed input files as input.txt and the indexed output files as results.txt. Obviously you will need to create N indexed input files numbered [0..N-1] but everything else is done for you. The tools can be used to submit MATLAB job arrays, job arrays for R applications and other "generic" applications where you have your own or third-party executables.
2. Simplified Job Submission Files
Key to the job array tools is something we will call a job submission file, which contains information on input and output files as well as things like job run times, memory requirements etc. Each file contains a number of attribute=value pairs, one pair per line. Blank spaces and blank lines are ignored and comments can be added by prefixing the text with a hash '#' character e.g.
# a comment - blanks are OK anywhere
input_files = common1,subfunc.m
indexed_input_files = iia.txt,iib
indexed_output_files = outa.txt,outb.txt

# blank lines are OK
cores_per_job = 2
runtime = 1h    # one hour - end of line comment
memory_gb = 8
M_file = hello.m
total_jobs = 2
Some attributes are specific to certain applications (for example the M_file attribute used here is for MATLAB) but the following are common to all applications.
3. Job Submission Attributes for all Types of Jobs
indexed_input_files
A single filename or list of filenames for input files that are different for each job task. For example, for a single batch of input files you would use something like:
indexed_input_files = input.txt
and this will refer to N input files input0.txt, input1.txt ... input<N-1>.txt. Multiple filenames should be separated by commas e.g.
indexed_input_files = input_a.txt,input_b
This would correspond to two sets of input files: input_a0.txt, input_a1.txt ... input_a<N-1>.txt and input_b0, input_b1 ... input_b<N-1>. Your code can just read the input files without modification e.g.
input_a_data = load(input_a.txt)
input_b_data = load(input_b)
indexed_output_files
A single filename or list of filenames for output files that will be different for each job task. For example, for a single batch of output files you would use something like:
indexed_output_files = output.txt
and this will refer to N output files output0.txt, output1.txt ... output<N-1>.txt. Multiple filenames should be separated by commas e.g.
indexed_output_files = output_a.txt,output_b
This would correspond to two sets of output files: output_a0.txt, output_a1.txt ... output_a<N-1>.txt and output_b0, output_b1 ... output_b<N-1>. Your code can just write the output files without modification e.g.
save(output_a.txt, output_a_data)
save(output_b, output_b_data)
common_input_files
A single filename or list of filenames for input files that are the same for each job task. Multiple filenames should be separated by commas e.g.
common_input_files = common_data_a.txt, common_data_b.txt
cores_per_job
The number of cores allocated to each job task (default 1). Using this option can be useful if your application uses multi-threading to speed up execution. In this case, set cores_per_job to be the same as the (maximum) number of threads your application employs. If the number of threads exceeds the value of cores_per_job, the performance of your jobs may suffer, as may that of other users' jobs which happen to be running on the same node. On barkla each of the "ordinary" compute nodes has 40 cores.
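For example, if your application runs with at most four threads (say, an OpenMP code where the thread count is controlled by OMP_NUM_THREADS - the exact mechanism depends on your application), a matching setting would be:

cores_per_job = 4    # matches the (assumed) maximum of four threads used by the application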
memory_gb
The amount of memory in GB allocated to each job task. If your application is particularly memory-hungry then it is important to set this value to the maximum amount of memory used by your code, as SLURM will terminate any jobs that exceed the default memory limits (~ 9.6 GB/core => ~ 380 GB per node on the ordinary compute nodes).
runtime
Maximum time the job tasks will run for, expressed in hours (e.g. 48h) or days (e.g. 2d). Although this attribute is optional, it should be used for relatively short jobs as SLURM will prioritise these over longer-running jobs (other things being equal), so your jobs will spend less time queueing. However this will only happen if you specify the runtime explicitly. Note that time limits are enforced and jobs will be terminated if they exceed them, so it is best to err on the side of caution (or at least start with a long runtime and work downwards).
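As an illustration, a job task expected to peak at around 30 GB of memory and to finish comfortably within two days might use (the values here are purely indicative):

memory_gb = 32
runtime = 2d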
stdout
Specifies a file where the combined standard output and standard error for all job tasks are directed. Note that SLURM, unlike some other schedulers, merges the standard output and error streams together by default.
indexed_stdout
Specifies "indexed" output files for standard output and error similar to indexed_output_files. This allows the merged standard output/error for each individual job task to be written to a separate file.
total_jobs
Total number of job tasks to be run (this must match the number of indexed input files).
scratch
Can be used to specify the temporary storage area where the job tasks will run. This may well speed up execution as local storage is considerably faster than the home filestore where job files are usually stored long term. The following storage areas can be specified:
name | location | size | speed |
---|---|---|---|
localscratch | ~/localscratch | 750 GB | fastest |
sharedscratch | ~/sharedscratch | 347 TB | fast |
volatile | ~/volatile | 100 TB | slowest |
The default is localscratch and the value none can also be used to indicate that jobs should be run on the same filesystem that they were submitted from (not usually a good idea if this is your home filestore).
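For example, if each job task writes more data than will comfortably fit in the local scratch area, the larger shared scratch area could be selected instead (an illustrative choice):

scratch = sharedscratch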
4. Command Line Options
Although there are slight differences between the job submission tools for different applications, they all have the same command line format namely:
$ command_name [options] job_submission_file
This will create a job script file which is the one actually passed to the SLURM scheduler. It may be worth taking a look inside this and possibly using it as a template for your own applications. The job script filename will be the same as the job submission file with a .sh "extension". To get a list of the options available, use the -h option:
$ command_name -h
for example:
$ array_submit -h
Usage: array_submit [options] job_submission_file

Options:
  -h, --help            show this help message and exit
  -c INTEGER, --cores_per_job=INTEGER
                        number of cores to run each job task on
  -m MEMORY, --memory_gb=MEMORY
                        amount of memory to allocate to each job task in GB
  -r TIME, --runtime=TIME
                        maximum runtime in hours (e.g. 48h) or days (e.g. 2d)
  -e FILE, --executable=FILE
                        executable to run
  -b FILE, --script=FILE
                        (bash) script to run
  -f FILE(s), --input_files=FILE(s)
                        common input file (or files as comma-separated list
                        e.g.: file1,file2,file3)
  -i FILE(s), --indexed_input_files=FILE(s)
                        indexed input file (or files as comma-separated list
                        e.g.: file1,file2,file3)
  -p FILE(s), --indexed_output_files=FILE(s)
                        indexed output file (or files as comma-separated list
                        e.g.: file1,file2,file3)
  -o FILENAME, --stdout=FILENAME
                        merged standard output and error from job tasks
  -s FILENAME, --indexed_stdout=FILENAME
                        indexed merged standard output and error from job tasks
  -t INTEGER, --total_jobs=INTEGER
                        number of job tasks to run
  -a DIRECTORY, --scratch=DIRECTORY
                        scratch storage to run jobs in:
                        localscratch|sharedscratch|volatile|none
                        (default localscratch)

All [options] are...optional ! Command line options take precedence over job submission file attributes.
As stated above, command line options will override any attribute values set in the job submission file. This is useful for making small changes e.g. to the runtime or memory values. The job submission file is mandatory but can be blank if all the options are specified on the command line.
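For example, to re-submit the same job array with a longer run time and more memory without editing the submission file (my_jobs.sub is a placeholder for your own submission file and the values are illustrative):

$ array_submit -r 72h -m 16 my_jobs.sub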
5. MATLAB Applications
Array jobs which make use of MATLAB scripts (M-files) can easily be submitted using the matlab_submit command. To get a complete list of command line options (or the equivalent job submission file attributes) use:
$ matlab_submit -h
The M_file option/attribute (note the underscore) is used to specify the main M-file to be run and any other M-files needed should be given as common input files using the input_files option/attribute.
An example job submission file for a MATLAB application is:
$ cat matlab_example.sub
M_file = main_script.m
indexed_input_files = input.mat
input_files = subfunc1.m, subfunc2.m
indexed_output_files = output.mat
cores_per_job = 4
runtime = 1h
total_jobs = 10
Here main_script.m contains the main MATLAB function and calls functions in subfunc1.m and subfunc2.m. The input data is read from the files input*.mat and the results written to output*.mat. Ten job tasks will be created which will each run on 4 cores with a maximum run time of 1 hour. This would be submitted using:
$ matlab_submit matlab_example.sub
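Before submitting this (hypothetical) example, the working directory would contain the M-files and the ten indexed input files, e.g.:

$ ls
main_script.m  subfunc1.m  subfunc2.m
input0.mat  input1.mat  input2.mat  ...  input9.mat

and after the tasks complete the results will appear in output0.mat ... output9.mat.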
It is also possible to submit jobs based on pre-built MATLAB standalone executables using the executable option/attribute. To create a MATLAB standalone executable, use the command
$ matlab_build M_file
where M_file is the main M-file to be used. This will build the executable on a compute node as a SLURM job. The executable will have the same name as the M-file minus any "extension" (e.g. .m). Where multiple M-files containing functions called by the main M-file are used, place these in a directory called dependencies below the current working directory. Pre-compiled executables may perform better than M-files for some codes; however, the author has not seen any noticeable speed-up on his codes.
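As an illustration, for the MATLAB example above the layout and build step might look like this (the directory contents are hypothetical):

$ ls
main_script.m  dependencies/
$ ls dependencies/
subfunc1.m  subfunc2.m
$ matlab_build main_script.m    # builds, as a SLURM job, an executable called main_script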
6. R Applications
Array jobs which use the R statistics language can be submitted using the r_submit command. To get a complete list of command line options (or the equivalent job submission file attributes) use:
$ r_submit -h
The R_script option/attribute is used to specify the main R script to be run and any other R scripts that are needed should be given as common input files using the input_files option/attribute.
An example job submission file for an R application is:
$ cat r_example.sub
R_script = main_script.R
indexed_input_files = input.RData
input_files = subfunc1.R, subfunc2.R
indexed_output_files = output.RData
memory_gb = 64
runtime = 1d
total_jobs = 20
Here main_script.R contains the main R code and calls functions in subfunc1.R and subfunc2.R. The input data is read from the files input*.RData and the results written to output*.RData. Twenty job tasks will be created which will each be allocated 64 GB of memory with a maximum run time of 1 day. This would be submitted using:
$ r_submit r_example.sub
7. Generic Applications
If you have an executable, built perhaps from your own source code, it can be used in a job array via the array_submit command. To get a complete list of command line options (or the equivalent job submission file attributes) use:
$ array_submit -h
The executable option can be used to specify which binary executable to run (this can include a pathname if necessary). Shell scripts can also be run in this way. The script option can be used to specify a script which will be included in the job script submitted to SLURM (the executable option, on the other hand, just runs the shell script without including it), as in the sketch below.
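As a sketch of how the script option might be used (everything here is hypothetical - the module name, script name and submission file are placeholders), a small wrapper script could set up the environment before running the binary and then be embedded in the generated job script:

$ cat run_wrapper.sh
#!/bin/bash
# hypothetical wrapper: set up the environment, then run the binary
module load apps/my_dependencies    # placeholder for whatever your binary needs
./my_application

$ array_submit --script=run_wrapper.sh my_jobs.sub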
An example job submission file for use with your own executable code is:
$ cat array_example.sub
executable = my_application
indexed_input_files = input.txt
indexed_output_files = output.txt
cores_per_job = 4
memory_gb = 32
runtime = 36h
total_jobs = 5
Here my_application contains the binary executable for your own (presumably multi-threaded) application. The input data will be read from input*.txt and the results written to output*.txt. Five job tasks will be created which will be allocated 32 GB of memory and four cores each with a maximum run time of 36 hours. This would be submitted using:
$ array_submit array_example.sub