High Throughput Computing using Condor

Simplified Job Submission

A number of tools have been developed locally to make Condor job submission more user-friendly, and they are described here. One difficulty users face with Condor is creating job submission files, whose syntax can be fairly obscure, and editing them with the UNIX system editors, which are not exactly known for their ease of use. These tools do not do away with job submission files altogether but aim to make them easier to create and use.

General Purpose Tools

The mws_submit command can be used instead of condor_submit to submit jobs described by simplified job submission files, e.g.:

$ mws_submit simplified_description_file
(Where simplified_description_file is the name of the "user-friendly" job submission file).

A typical job submission file might contain:
executable = myapp.exe
input_files = myinput_common, other_input_common
indexed_input_files = input_data, other_input_data
indexed_output_files = output_data
total_jobs = 10
All of the attributes are optional apart from executable, which must be specified (total_jobs defaults to a single job). The executable attribute specifies the main executable file to be run on a Condor pool PC; this will generally be a .bat file or a .exe file. The input_files attribute lists the input files common to all jobs, whilst indexed_input_files lists input files which differ for each individual job. In this example, each job will get its own input_data file from the set of input files input_data0 ... input_data9 (the same is true for other_input_data).

The indexed_output_files attribute ensures that the output files are retrieved following the same indexing as the input files (i.e. output_data0 ... output_data9). The internal indexing/unindexing is taken care of for you, so there is no need to manipulate the index value inside your executable - just use the generic name (e.g. input_data).
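The naming rule described above can be illustrated with a short Python sketch. It is not the code used by mws_submit; indexed_name and indexed_names are hypothetical helpers, and placing the index before any file extension is an assumption based on the input.mat / input0.mat examples given later for MATLAB jobs.

```python
import os


def indexed_name(generic: str, index: int) -> str:
    """Per-job name for a generic file name (hypothetical helper).

    Assumption: the index goes before any extension (input.mat -> input3.mat)
    and is simply appended when there is none (input_data -> input_data3).
    """
    stem, ext = os.path.splitext(generic)
    return f"{stem}{index}{ext}"


def indexed_names(generic: str, total_jobs: int) -> list[str]:
    """Names for jobs 0 .. total_jobs-1; Condor's $(PROCESS) counts from 0."""
    return [indexed_name(generic, i) for i in range(total_jobs)]


print(indexed_names("input_data", 3))  # ['input_data0', 'input_data1', 'input_data2']
print(indexed_name("output.mat", 9))   # output9.mat
```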

All of the values given in the job description may be temporarily overridden from the command line (although the job description file is left unchanged). For example to change the number of submitted jobs from ten to five:
$ mws_submit simplified_description_file --total_jobs=5
and to also use a different executable:
$ mws_submit simplified_description_file --total_jobs=5 --executable=otherapp.exe
This makes it easy to make small changes without the need to edit the job description file. To see all of the options just use the -h option with the simplified job submission tool e.g.
$ mws_submit simplified_description_file -h
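The precedence rule (command-line values win, and the file is left untouched) can be sketched in Python as follows. parse_description and apply_overrides are hypothetical names, and the simple "name = value" parsing is an assumption about the file format rather than a description of how mws_submit reads it. Leading dashes are stripped so that --total_jobs=5 and -total_jobs=5 are treated alike, since both spellings appear in this guide.

```python
def parse_description(text: str) -> dict[str, str]:
    """Parse 'name = value' lines from a simplified job description.

    Assumption: blank lines and lines without '=' are ignored, and everything
    after the first '=' is the value.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        attrs[name.strip()] = value.strip()
    return attrs


def apply_overrides(attrs: dict[str, str], argv: list[str]) -> dict[str, str]:
    """Return a copy of attrs with name=value options from argv applied on top.

    The original dictionary (like the file on disk) is not modified.
    """
    merged = dict(attrs)
    for arg in argv:
        if arg.startswith("-") and "=" in arg:
            name, value = arg.lstrip("-").split("=", 1)
            merged[name] = value
    return merged
```

For example, applying ["--total_jobs=5", "--executable=otherapp.exe"] to the parsed description {"executable": "myapp.exe", "total_jobs": "10"} gives {'executable': 'otherapp.exe', 'total_jobs': '5'}, while the parsed original stays as it was.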

The mws_submit command creates the job description file used by Condor, which will have the same name as the simplified job description file but with a .sub extension. The Condor job description file corresponding to the above example is:

universe = vanilla
executable = myapp.exe 
transfer_input_files = myinput_common, other_input_common, input_data$(PROCESS), other_input_data$(PROCESS)
transfer_output_files = output_data$(PROCESS)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
requirements = ( Arch=="X86_64") && ( OpSys=="WINDOWS" )
notification = never
queue 10
Clearly this is a good deal more complicated. Several other attributes can be specified in the simplified job description and all of these are detailed in a later section.
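As a rough illustration of what this translation involves, the Python sketch below produces the submit file above from the simplified attributes. It is a hypothetical reimplementation (split_list, with_process and to_condor_submit are invented names) that handles only the attributes in this example and hard-codes the fixed lines copied from the generated file; the real mws_submit may behave differently.

```python
import os


def split_list(value: str) -> list[str]:
    """Split a comma-separated attribute value, dropping surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def with_process(name: str) -> str:
    """Insert Condor's $(PROCESS) macro before any extension: input.mat -> input$(PROCESS).mat."""
    stem, ext = os.path.splitext(name)
    return f"{stem}$(PROCESS){ext}"


def to_condor_submit(desc: dict[str, str]) -> str:
    """Build a vanilla-universe submit file from simplified attributes (hypothetical)."""
    inputs = split_list(desc.get("input_files", ""))
    inputs += [with_process(n) for n in split_list(desc.get("indexed_input_files", ""))]
    outputs = [with_process(n) for n in split_list(desc.get("indexed_output_files", ""))]
    lines = ["universe = vanilla", f"executable = {desc['executable']}"]
    if inputs:
        lines.append("transfer_input_files = " + ", ".join(inputs))
    if outputs:
        lines.append("transfer_output_files = " + ", ".join(outputs))
    # Fixed settings copied from the generated file shown above.
    lines += [
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        'requirements = ( Arch=="X86_64") && ( OpSys=="WINDOWS" )',
        "notification = never",
        f"queue {int(desc.get('total_jobs', '1'))}",  # default: a single job
    ]
    return "\n".join(lines) + "\n"
```

Called with the five attributes from the example (as strings), it returns the same nine lines as the generated file, apart from the trailing space after myapp.exe.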


Tools for Submitting MATLAB Jobs

Running MATLAB jobs on the Condor pool is made more difficult by the need to build standalone executables to be run as jobs. The tools described below are designed to assist in testing, building and running MATLAB applications.

Since the Condor server has MATLAB installed, it is possible to test M-files on it. The command matlab_run can be used to pass an M-file to the MATLAB interpreter without starting the graphical interface (so it is suitable for use with PuTTY or other terminal emulators), for example:
$ matlab_run product.m
Here product.m would need to contain a MATLAB function called product and be able to run without input from the user. MATLAB on the server should be used sparingly and not for M-files which are likely to require significant CPU use over long periods as this can impact badly on the performance of Condor.

In the past it was not possible to run M-files directly on the Condor PCs; however, this can now be achieved by using a special job description file, e.g.
M_file = product.m
indexed_input_files = input.mat
indexed_output_files = output.mat
total_jobs = 10
This can be submitted to Condor using the command m_file_submit e.g.
$ m_file_submit product
(Where product is the name of the job description file. Note that the total number of jobs is limited to 10 to avoid taking up too many licenses).

The command will return the job ID of the M-file job, and on completion the output files output*.mat will have been created.
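Once the jobs have finished, a quick way to confirm that every job produced its output is to compare the expected indexed names with the files actually present. The sketch below is a hypothetical helper, not part of m_file_submit; it assumes the index is placed before the .mat extension, as in output0.mat ... output9.mat.

```python
import os
from collections.abc import Iterable


def missing_outputs(generic: str, total_jobs: int, present: Iterable[str]) -> list[str]:
    """Return the indexed output names (jobs 0 .. total_jobs-1) not found in present.

    Hypothetical helper; assumes the index goes before the extension,
    e.g. output.mat -> output0.mat.
    """
    stem, ext = os.path.splitext(generic)
    found = set(present)
    expected = [f"{stem}{i}{ext}" for i in range(total_jobs)]
    return [name for name in expected if name not in found]
```

For example, missing_outputs("output.mat", 10, os.listdir(".")) lists any outputs not yet retrieved into the current directory; an empty list means all ten arrived.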

Once the M-file is found to work properly, the standalone application can be built directly on a pool PC, without the need to build it locally and then upload it. This is accomplished by using the command matlab_build, e.g.

$ matlab_build product.m
The command will return a job ID and on completion, the standalone executable product.exe should have been created. The file build.log can be used to track the progress of the job which should only take a few minutes to run.

Note that if the M-file contains any syntax errors, the MATLAB compiler will not catch these and will blindly compile the code into an executable which will fail when run under Condor. These errors are extremely difficult to locate later on, so please always check that the M-file works correctly before compiling it.

MATLAB standalone applications can be submitted to the Condor pool using a simplified job submission file and the command matlab_submit e.g.

$ matlab_submit simplified_job_description_file
The job description file can make use of the same attributes as mws_submit, for example:
indexed_input_files = input.mat
indexed_output_files = output.mat
executable = product.exe
indexed_log = logfile
total_jobs = 10

Another important feature is that matlab_submit automatically creates a manifest file so that MATLAB can locate the required run-time libraries, so the user does not need to worry about this. In this example the manifest file would be product.exe.manifest. The actual job submission file passed to Condor in this example is:

universe = vanilla
executable = product.bat 
arguments = product.exe $(PROCESS) input output
transfer_input_files = product.exe.manifest, product.exe, input$(PROCESS).mat
transfer_output_files = output$(PROCESS).mat
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
requirements = ( Arch=="X86_64") && ( OpSys=="WINDOWS" )
notification = never
queue 10
which again is a good deal more complicated than the simplified job description file.
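The contents of product.bat are not shown here, but its arguments (product.exe $(PROCESS) input output) suggest why the executable can use generic file names. The Python sketch below is a guess at that renaming step, not the actual wrapper: run_indexed is a hypothetical name, and the program itself is passed in as a callable so the renaming logic can be seen on its own.

```python
import os
from typing import Callable


def run_indexed(index: int, in_stem: str, out_stem: str, run: Callable[[], None],
                ext: str = ".mat", workdir: str = ".") -> None:
    """Hypothetical wrapper step: un-index the input, run, re-index the output.

    e.g. index=3: input3.mat -> input.mat, run(), output.mat -> output3.mat
    """
    def path(name: str) -> str:
        return os.path.join(workdir, name)

    # Give the program the generic input name it expects.
    os.replace(path(f"{in_stem}{index}{ext}"), path(f"{in_stem}{ext}"))
    run()
    # Rename the generic output so that Condor transfers back the indexed file.
    os.replace(path(f"{out_stem}{ext}"), path(f"{out_stem}{index}{ext}"))
```

On a pool PC the callable might be lambda: subprocess.run(["product.exe"], check=True), so that job 3 turns input3.mat into input.mat, runs the program, and hands output.mat back as output3.mat.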


Summary of Job Description Attributes

A complete list of attributes which can be used in simplified job description files for use with mws_submit and matlab_submit is given below. For attributes with multiple values, a comma-separated list is used; this may contain spaces in a job description file, but spaces may not be used on the command line. For example

$ mws_submit -indexed_input_files=input1,input2
will work but
$ mws_submit -indexed_input_files=input1, input2
will not. For any of the simplified job submission tools, use the -h option to get a complete list of options. Attributes can be specified in a job submission file and/or on the command line with the latter taking precedence.
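Spaces are allowed in the file but not on the command line because the shell splits arguments at spaces before the tool ever sees them. Python's shlex.split, which follows POSIX shell splitting rules, shows the effect; split_list is a hypothetical helper showing how a comma-separated value read from a file can be split with the surrounding spaces discarded.

```python
import shlex


def split_list(value: str) -> list[str]:
    """Split a comma-separated attribute value, ignoring spaces around commas."""
    return [item.strip() for item in value.split(",") if item.strip()]


# In a description file the space is harmless once the value is split on commas:
print(split_list("input1, input2"))  # ['input1', 'input2']

# On the command line the shell splits at the space first, so the option only
# receives 'input1,' and 'input2' becomes a separate, unrelated argument.
print(shlex.split("mws_submit -indexed_input_files=input1, input2"))
# ['mws_submit', '-indexed_input_files=input1,', 'input2']
```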
indexed_input_files
A comma-separated list of file names for input files unique to each job. If, for example, input.mat is given as an indexed input file, this would correspond to the set of files input0.mat ... input(n-1).mat, where n is the number of jobs (total_jobs), with job i receiving input<i>.mat as an input file.
indexed_log
Similar to the log attribute (below) but different log files are used for each job making it easier to track down information.
indexed_output_files
A comma-separated list of file names for output files unique to each job. If, for example, output.mat is given as an indexed output file, this would correspond to the set of files output0.mat ... output(n-1).mat, where n is the number of jobs (total_jobs), with job i producing output<i>.mat as an output file.
indexed_stdout
File to which each individual job's standard output is to be redirected. The file names will be indexed in a similar manner to the indexed input/output files so that the standard output of each individual job can be seen. This can sometimes be useful in determining where things have gone wrong.
indexed_stderr
File to which each individual job's standard error stream is to be redirected. The file names will be indexed in a similar manner to the indexed input/output files so that the standard error of each individual job can be seen. This can sometimes be useful in determining where things have gone wrong.
input_files
Comma-separated list of input files common to all jobs.
log
File to which Condor logs information about the progress of jobs. For multiple jobs, all of the log information is merged into one file and a better choice may be to use indexed_log. This can be useful in determining where and for how long jobs ran.
max_run_time
The maximum time (in minutes) that a job will be allowed to run for. After this time has elapsed, the job will be held then released causing it to go back into the Condor queue. This is useful to prevent jobs getting "stuck".
memory
This attribute can be used to ensure that jobs run only on machines with at least a given amount of memory. The memory size is specified in GB, so memory = 1 would ensure that jobs run only on PCs with at least 1 GB of memory in total (not per core).
stdout
File to which the job's standard output is to be redirected. This is only really useful for single jobs - for multiple parallel jobs use indexed_stdout.
stderr
File to which the job's standard error stream is to be redirected. This is only really useful for single jobs - for multiple parallel jobs use indexed_stderr.
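Because every attribute name must be spelled exactly, a small pre-submission check can catch typos such as total_job. The Python sketch below is hypothetical and not part of mws_submit or matlab_submit: the set of names comes from the list above plus executable and total_jobs, and treating total_jobs and max_run_time as positive whole numbers and memory as a positive number of GB are assumptions.

```python
# Attribute names taken from this guide (executable and total_jobs plus the list above).
KNOWN_ATTRIBUTES = {
    "executable", "total_jobs", "input_files", "indexed_input_files",
    "indexed_output_files", "indexed_log", "indexed_stdout", "indexed_stderr",
    "log", "max_run_time", "memory", "stdout", "stderr",
}


def _positive(text: str, integer: bool) -> bool:
    """True if text parses as a positive int (or as a positive float when integer is False)."""
    try:
        value = int(text) if integer else float(text)
    except ValueError:
        return False
    return value > 0


def check_description(attrs: dict[str, str]) -> list[str]:
    """Return a list of problems found in a parsed simplified job description."""
    problems = []
    if "executable" not in attrs:
        problems.append("executable must be specified")
    for name in attrs:
        if name not in KNOWN_ATTRIBUTES:
            problems.append(f"unknown attribute: {name}")
    # Assumption: job counts and run times are whole numbers; memory (GB) may be fractional.
    for name in ("total_jobs", "max_run_time"):
        if name in attrs and not _positive(attrs[name], integer=True):
            problems.append(f"{name} must be a positive whole number")
    if "memory" in attrs and not _positive(attrs["memory"], integer=False):
        problems.append("memory must be a positive number of GB")
    return problems
```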

Summary of Commands

matlab_build M-file
Builds a standalone executable by compiling the M-file on a PC in the Condor pool.
matlab_run M-file
Runs an M-file using MATLAB on the Condor server without starting the graphical interface.
matlab_submit simplified_job_description_file
Submits a standalone MATLAB executable to the pool. The executable does not need to manipulate the input and output filenames to give the correct indexes.
m_file_submit simplified_job_description_file
Uses a Condor pool PC to run the specified M-file. A job description needs to be supplied which contains (at a minimum) the name of the M-file and the input file it reads.
mws_submit simplified_job_description_file
Submits a generic Condor job to the pool using a simplified job description. Users' applications must ensure that all file indexing is taken care of.