High Throughput Computing using HTCondor

Simplified Job Submission

A number of tools have been developed locally with the aim of making HTCondor job submission more user-friendly and are described here. One difficulty users face with HTCondor is creating HTCondor submit description files (whose syntax can be fairly obscure) and editing these files using the UNIX system editors (which are not exactly known for their ease of use). These tools do not do away with the need for submit description files altogether but strive to make them easier to create and use.

General Purpose Tools

The mws_submit command can be used instead of condor_submit to submit jobs using simplified submit description files i.e.:

$ mws_submit simplified_description_file
(Where simplified_description_file is the name of the "user-friendly" submit description file).

A typical submit description file might contain:
executable = myapp.exe
input_files = myinput_common, other_input_common
indexed_input_files = input_data, other_input_data
indexed_output_files = output_data
total_jobs = 10
All of the attributes are optional apart from executable which must be specified (the default for total_jobs is a single job). The executable attribute specifies the main executable file to be run on a HTCondor pool PC - this will generally be a .bat file or a .exe file. The input_files attribute lists which input files are common to all jobs whilst indexed_input_files lists input files which are different for each individual job. In this example, each job will get its own input_data file from the set of input files input_data0 ... input_data9 (the same is true for other_input_data).

The indexed_output_files attribute ensures that the output files are retrieved following the same indexing as the input files (i.e. output_data0 ... output_data9). The internal indexing/unindexing is taken care of for you so there is no need to manipulate the index value inside your executable - just use the generic name (e.g. input_data).
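As an illustration of the naming scheme, the indexed input files could be prepared before submission with a short script along the following lines. This is only a sketch: the helper names indexed_name and write_indexed_inputs are hypothetical and are not part of the locally developed tools. It assumes that the index is appended to names without an extension (input_data0) and placed before the extension otherwise (input0.mat), as in the examples on this page.

```python
import os
import tempfile


def indexed_name(generic_name, index):
    """Return the per-job file name for a generic name.

    Assumed convention: input_data -> input_data3, input.mat -> input3.mat.
    """
    stem, ext = os.path.splitext(generic_name)
    return f"{stem}{index}{ext}"


def write_indexed_inputs(directory, generic_name, contents):
    """Write one input file per job; job i receives contents[i]."""
    paths = []
    for i, text in enumerate(contents):
        path = os.path.join(directory, indexed_name(generic_name, i))
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demonstration in a throwaway directory: ten jobs, ten input files.
    with tempfile.TemporaryDirectory() as workdir:
        job_inputs = [f"parameter = {i}\n" for i in range(10)]
        created = write_indexed_inputs(workdir, "input_data", job_inputs)
        print([os.path.basename(p) for p in created])
```

The number of files written should match the total_jobs value in the job description file.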

All of the values given in the job description may be temporarily overridden from the command line (although the job description file is left unchanged). For example to change the number of submitted jobs from ten to five:
$ mws_submit simplified_description_file --total_jobs=5
and to also use a different executable:
$ mws_submit simplified_description_file --total_jobs=5 --executable=otherapp.exe
This makes it easy to make small changes without the need to edit the job description file. To see all of the options just use the -h option with the simplified submit description tool e.g.
$ mws_submit -h
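The precedence rule (command line over job description file) can be pictured with the following minimal sketch. It is not the actual mws_submit code; the function apply_overrides is hypothetical and only handles options written in the --name=value form used above.

```python
def apply_overrides(file_attrs, argv):
    """Return a copy of file_attrs with --name=value options applied on top.

    The original dictionary (standing in for the job description file)
    is left unchanged, mirroring the behaviour described in the text.
    """
    merged = dict(file_attrs)
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            merged[name] = value
    return merged


if __name__ == "__main__":
    from_file = {"executable": "myapp.exe", "total_jobs": "10"}
    options = ["--total_jobs=5", "--executable=otherapp.exe"]
    print(apply_overrides(from_file, options))
    print(from_file)  # the values read from the file are untouched
```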

The mws_submit command creates the job description file used by HTCondor which will have the same name as the simplified job description file but with a .sub extension. The HTCondor job description file corresponding to the above example is:

universe = vanilla
executable = myapp.exe 
transfer_input_files = myinput_common, other_input_common, input_data$(PROCESS), other_input_data$(PROCESS)
transfer_output_files = output_data$(PROCESS)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
requirements = ( Arch=="X86_64") && ( OpSys=="WINDOWS" )
notification = never
queue 10
Clearly this is a good deal more complicated. Several other attributes can be specified in the simplified job description and all of these are detailed in a later section.
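To make the correspondence between the two files concrete, the sketch below reproduces the mapping for the attributes used in this example. It is an illustration only and should not be taken as how mws_submit is actually implemented: the function names are hypothetical, only five attributes are handled, and the fixed lines (universe, requirements and so on) are simply copied from the example above.

```python
import os


def parse_simplified(text):
    """Parse 'name = value' lines into a dictionary of strings."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        attrs[name] = value
    return attrs


def split_list(value):
    """Split a comma-separated attribute value, ignoring surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def with_process_macro(name):
    """input_data -> input_data$(PROCESS), input.mat -> input$(PROCESS).mat."""
    stem, ext = os.path.splitext(name)
    return f"{stem}$(PROCESS){ext}"


def to_htcondor_submit(attrs):
    """Build submit description text for the attributes shown in the example."""
    if "executable" not in attrs:
        raise ValueError("executable must be specified")
    total_jobs = int(attrs.get("total_jobs", 1))  # default is a single job
    inputs = split_list(attrs.get("input_files", ""))
    inputs += [with_process_macro(n) for n in split_list(attrs.get("indexed_input_files", ""))]
    outputs = [with_process_macro(n) for n in split_list(attrs.get("indexed_output_files", ""))]

    lines = ["universe = vanilla", f"executable = {attrs['executable']}"]
    if inputs:
        lines.append("transfer_input_files = " + ", ".join(inputs))
    if outputs:
        lines.append("transfer_output_files = " + ", ".join(outputs))
    lines += [
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        'requirements = ( Arch=="X86_64") && ( OpSys=="WINDOWS" )',
        "notification = never",
        f"queue {total_jobs}",
    ]
    return "\n".join(lines)
```

Passing the five-line simplified description from the example through parse_simplified and then to_htcondor_submit yields the same attribute lines as the generated file shown above.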


Tools for Submitting MATLAB Jobs

Running MATLAB jobs on the HTCondor pool is made more difficult by the need to build standalone executables to be run as jobs. The tools described below are designed to assist in testing, building and running MATLAB applications.

Since the HTCondor server has MATLAB installed, it is possible to test out M-files on it. The command matlab_run can be used to pass the M-file to the MATLAB interpreter without the need for the graphical interface to be started (it is therefore suitable for use with PuTTY or other terminal emulators), for example:
$ matlab_run product.m
Here product.m would need to contain a MATLAB function called product and be able to run without input from the user. MATLAB on the server should be used sparingly and not for M-files which are likely to require significant CPU use over long periods as this can impact badly on the performance of HTCondor.

In the past it was not possible to run M-files directly on the HTCondor PCs; however, this can now be achieved by using a special job description file e.g.
M_file = product.m
indexed_input_files = input.mat
indexed_output_files = output.mat
total_jobs = 10
This can be submitted to HTCondor using the command m_file_submit e.g.
$ m_file_submit product
(Where product is the name of the job description file. Note that the total number of jobs is limited to 10 to avoid taking up too many licenses).

The command will return the job ID of the M-file job and, on completion, the output files output*.mat will have been created.

Once the M-file is found to work properly it should be compiled into a standalone executable using the current MATLAB version on a Windows PC - see the MATLAB applications page for more details.

Note that if the M-file contains any syntax errors, the MATLAB compiler will not catch these and will blindly compile the code into an executable which will fail when run under HTCondor. It is extremely difficult to locate these errors later on so please always check that the M-file works correctly before compiling it.

MATLAB standalone applications can be submitted to the HTCondor pool using a simplified submit description file and the command matlab_submit e.g.

$ matlab_submit simplified_job_description_file
The job description file can make use of the same attributes as mws_submit, for example:
indexed_input_files = input.mat
indexed_output_files = output.mat
executable = product.exe
indexed_log = logfile
total_jobs = 10

The actual submit description file passed to HTCondor in this example is:

universe = vanilla
should_transfer_files = YES
when_to_transfer_output=ON_EXIT
executable = standalone.bat
arguments = product.exe $(PROCESS)  input.mat output.mat
transfer_input_files = product.exe,input$(PROCESS).mat,/opt1/condor/apps/matlab/index.exe,/opt1/condor/apps/matlab/unindex.exe
transfer_output_files = output$(PROCESS).mat
log = logfile$(PROCESS)
request_cpus = 1
requirements = ( Arch=="X86_64") && ( OpSys=="WINDOWS" )
notification = never
queue 10
which again is a good deal more complicated than the simplified job description file.
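The submit description file above suggests that a wrapper renames the indexed files to their generic names before the application runs and renames the results back afterwards. The sketch below shows that idea only; what standalone.bat, index.exe and unindex.exe actually do is an assumption here, and the function names are hypothetical.

```python
import os
import tempfile


def indexed_name(generic_name, index):
    """input.mat -> input3.mat for index 3 (assumed naming convention)."""
    stem, ext = os.path.splitext(generic_name)
    return f"{stem}{index}{ext}"


def run_with_generic_names(workdir, index, inputs, outputs, job):
    """Rename indexed inputs to generic names, run the job, re-index outputs."""
    # "index" step (assumed): input3.mat -> input.mat
    for name in inputs:
        os.replace(os.path.join(workdir, indexed_name(name, index)),
                   os.path.join(workdir, name))
    # The application only ever sees the generic names.
    job(workdir)
    # "unindex" step (assumed): output.mat -> output3.mat
    for name in outputs:
        os.replace(os.path.join(workdir, name),
                   os.path.join(workdir, indexed_name(name, index)))


if __name__ == "__main__":
    def copy_job(workdir):
        # Stand-in for product.exe: reads input.mat, writes output.mat.
        with open(os.path.join(workdir, "input.mat")) as src:
            data = src.read()
        with open(os.path.join(workdir, "output.mat"), "w") as dst:
            dst.write(data)

    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "input3.mat"), "w") as handle:
            handle.write("example data")
        run_with_generic_names(workdir, 3, ["input.mat"], ["output.mat"], copy_job)
        print(sorted(os.listdir(workdir)))
```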


Summary of Job Description Attributes

A complete list of attributes which can be used in simplified job description files for use with mws_submit and matlab_submit is given below. For attributes with multiple values, a comma-separated list is used. Inside a job description file the list may contain spaces, but on the command line it must not. For example

$ mws_submit -indexed_input_files=input1,input2
will work but
$ mws_submit -indexed_input_files=input1, input2
will not. For any of the simplified submit description tools, use the -h option to get a complete list of options. Attributes can be specified in a submit description file and/or on the command line with the latter taking precedence.
indexed_input_files
A comma-separated list of file names for input files unique to each job. If, for example, input.mat is given as an indexed input file, this would correspond to the set of files input0.mat .. input(n-1).mat with the ith job receiving inputi.mat as an input file.
indexed_log
Similar to the log attribute (below) but different log files are used for each job making it easier to track down information.
indexed_output_files
A comma-separated list of file names for output files unique to each job. If, for example, output.mat is given as an indexed output file, this would correspond to the set of files output0.mat .. output(n-1).mat with the ith job producing outputi.mat as an output file.
indexed_stdout
File to which each individual job's standard output is to be redirected. The file names will be indexed in a similar manner to the indexed input/output files so that the standard output of each individual job can be seen. This can sometimes be useful in determining where things have gone wrong.
indexed_stderr
File to which each individual job's standard error stream is to be redirected. The file names will be indexed in a similar manner to the indexed input/output files so that the standard error of each individual job can be seen. This can sometimes be useful in determining where things have gone wrong.
input_files
Comma-separated list of input files common to all jobs
log
File to which HTCondor logs information about the progress of jobs. For multiple jobs, all of the log information is merged into one file and a better choice may be to use indexed_log. This can be useful in determining where and for how long jobs ran.
runtime
The maximum time (in minutes) that a job will be allowed to run for. After this time has elapsed, the job will be held then released causing it to go back into the HTCondor queue. This is useful to prevent jobs getting "stuck".
memory
This attribute can be used to ensure that jobs run only on machines with at least a given amount of memory. The memory size is specified in GB so that memory = 1 would ensure that jobs run only on PCs with at least 1 GB of memory in total (not per core).
stdout
File to which the job's standard output is to be redirected. This is only really useful for single jobs - for multiple parallel jobs use indexed_stdout
stderr
File to which the job's standard error stream is to be redirected. This is only really useful for single jobs - for multiple parallel jobs use indexed_stderr
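
Because the indexed attributes follow a predictable naming scheme, it is straightforward to check which jobs have not yet returned their output. The following is a minimal sketch, assuming the naming convention described above (output.mat corresponds to output0.mat .. output(n-1).mat); the function missing_outputs is hypothetical and not one of the supplied tools.

```python
import os
import tempfile


def missing_outputs(directory, generic_name, total_jobs):
    """List the indexed output files that are not present in directory."""
    stem, ext = os.path.splitext(generic_name)
    expected = [f"{stem}{i}{ext}" for i in range(total_jobs)]
    return [name for name in expected
            if not os.path.exists(os.path.join(directory, name))]


if __name__ == "__main__":
    # Demonstration: pretend that only jobs 0 and 2 of 4 have finished.
    with tempfile.TemporaryDirectory() as workdir:
        for i in (0, 2):
            open(os.path.join(workdir, f"output{i}.mat"), "w").close()
        print(missing_outputs(workdir, "output.mat", 4))
```

Any names reported can then be matched against the corresponding indexed log, stdout or stderr files to see what went wrong with those jobs.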

Summary of Commands

matlab_run M-file
Will run an M-file using MATLAB on the HTCondor server without the need to start the graphical interface.
matlab_submit simplified_submit_description_file
Submits a standalone MATLAB executable to the pool. The executable does not need to manipulate the input and output filenames to give the correct indexes.
m_file_submit simplified_submit_description_file
Uses a HTCondor pool PC to run the specified M-file. A submit description file needs to be supplied which contains (at a minimum) the name of the M-file and the input file it reads.
mws_submit simplified_submit_description_file
Submits a generic HTCondor job to the pool using a simplified submit description file. Users' applications must ensure that all file indexing is taken care of.