High Throughput Computing using Condor

# Running Fortran Applications under Condor

All of the files for this example can be found on condor.liv.ac.uk in /opt1/condor/examples/fortran.

## Contents

Introduction
Creating the Fortran 90 files
Compilation and initialisation
Creating the simplified job submission file
Running the Condor jobs
Combining the results
Discussion

## Introduction

Condor is well suited to running large numbers of Fortran jobs concurrently. If the application applies the same kind of analysis to large data sets (so-called "embarrassingly parallel" applications) or carries out similar calculations based on different random initial data (e.g. applications based on Monte Carlo methods), Condor can significantly reduce the time needed to generate the results by processing the data in parallel on different hosts. In some cases, simulations and analyses that would have taken years on a single PC can be completed in a matter of days.

The application will need to perform three main steps:

1. Create the initial input data and store it into files which can be processed in parallel.
2. Process the input data in parallel using the Condor pool and write the outputs to corresponding files.
3. Combine the data in the output files to generate the overall results.

As a very simple example we are going to use Monte Carlo analysis to estimate the value of pi. This is a slightly artificial example however, Monte Carlo methods are used in many practical applications and this example has the advantage of being easy to understand and fairly straightforward to program. You can find many articles on this example on the internet (two useful ones are: https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/ and https://academo.org/demos/estimating-pi-monte-carlo/ ) but a very brief description is given below.

Imagine a circle of radius r inscribed in a square of sides 2r and further imagine we choose points inside the square at random (a bit like throwing darts extremely badly at a dartboard). If a large number of points (N) are used then we would expect the ratio of those falling inside the circle (Nc) to those inside the square (Ns=N) to be in the same proportion as the area of the circle to the area of the square. For large values of N we can therefore calculate an approximate value of pi as 4(Nc/Ns), independently of r.

In the Fortran 90 program used below we will centre the circle on the origin and set r=1.0 hence the square has sides of length 2r=2.0 and we need to generate points with coordinates (x,y) from uniform random distributions in the range -1.0 ≤ x ≤ 1.0 and -1.0 ≤ y ≤ 1.0.
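The whole estimation procedure can be sketched serially in a few lines of Python (the document later suggests Python as a convenient alternative to Fortran; the function name `estimate_pi` is ours, for illustration only):

```python
import random

def estimate_pi(n):
    """Estimate pi by sampling n points uniformly in the square
    [-1, 1] x [-1, 1] and counting those inside the unit circle."""
    incircle = 0
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            incircle += 1
    return 4.0 * incircle / n

random.seed(42)          # fixed seed so the run is reproducible
print(estimate_pi(100000))
```

The Condor version below simply splits the `n` samples across many independent jobs and sums the per-job `incircle` counts afterwards.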

Instead of calculating the points serially in turn, the points will be calculated in batches using a number of Condor jobs so that the calculations are performed concurrently (i.e. at the same time). In more realistic examples, this is what will speed up the overall computation. Clearly the points will need to be chosen in a statistically independent manner for the Monte Carlo technique to work properly so we use a different random number seed for each job, these having been written to separate input files beforehand.

## Creating the Fortran files

The first step is to create a program to generate the N input seed files with names of the form seed0, seed1, ... seed<N-1>. A suitable Fortran 90 program (initialise.f90), which generates 1000 input files, is given below. It is rather more complicated than it needs to be because of Fortran's crude string handling capabilities, and it may be easier to use a more powerful language such as Python.
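For comparison, the equivalent seed-file generation takes only a few lines of Python (this sketch writes into a temporary directory so it is self-contained; in practice the files would go into the job directory):

```python
import os
import tempfile

# Write files seed0 .. seed999, each containing its own index,
# to be used as the random-number seed for one Condor job.
outdir = tempfile.mkdtemp()
for i in range(1000):
    with open(os.path.join(outdir, f"seed{i}"), "w") as f:
        f.write(f"{i}\n")
```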

```
      program initialise

      implicit none

      integer i

      character(len=10) :: filename
      character(len=10) :: format_string

      do i = 0, 999
         if (i < 10) then
            format_string = "(A4,I1)"
         else if (i < 100) then
            format_string = "(A4,I2)"
         else
            format_string = "(A4,I3)"
         endif

         write (filename, format_string) "seed", i

         open(666, file=trim(filename), status='unknown')
         write(666, '(i4)') i
         close(666)
      enddo

      end
```

The next Fortran 90 program is the one that will actually be run on the Condor pool to calculate how many points fall within the circle. An example implementation (pi.f90) is given below:

```
      program pi

      implicit none

      double precision x, y
      integer :: incircle, samplesize
      integer, allocatable :: seed(:)
      integer n, i, seed_value

      parameter(samplesize=1000)

!     read the seed value for this job from the input file
      open(1, file='seed', status='old')
      read(1, *) seed_value
      close(1)

      call random_seed(size = n)
      allocate(seed(n))
      seed = seed_value
      call random_seed(put=seed)

      incircle = 0
      do i = 1, samplesize
         call random_number(x)
         call random_number(y)
         x = x*2.0d0 - 1.0d0   ! map to -1.0 <= x <= 1.0
         y = y*2.0d0 - 1.0d0   ! map to -1.0 <= y <= 1.0
         if (x*x + y*y .lt. 1.0d0) then
            incircle = incircle + 1   ! point is inside the circle
         endif
      end do

      open(2, file='incircle', status='unknown')
      write(2, *) incircle
      close(2)

      end
```
This reads a seed value from the input file and initialises the random number generator with it. It then generates 1000 points and determines whether each lies inside the unit circle; if so, a counter is incremented, and finally the total number of points inside the circle is written to an output file. Note that the random_number() intrinsic subroutine returns random numbers in the range 0.0 ≤ x < 1.0.

The final step is to sum all of the points falling inside the circle to calculate pi by reading the partial sums from the results files (incircle*). This can be achieved using a Fortran 90 code such as this (combine.f90):

```
      program combine

      implicit none

      double precision pi
      integer i, incircle
      integer total_incircle

      character(len=20) :: filename
      character(len=10) :: format_string

      total_incircle = 0
      do i = 0, 999
         if (i < 10) then
            format_string = "(A8,I1)"
         else if (i < 100) then
            format_string = "(A8,I2)"
         else
            format_string = "(A8,I3)"
         endif

         write (filename, format_string) "incircle", i

         open(666, file=trim(filename), status='old')
         read(666, *) incircle
         close(666)
         total_incircle = total_incircle + incircle
      enddo

      pi = 4.0d0 * DBLE(total_incircle) / 1000000.0d0
      print '(A,F8.6)', 'Monte-Carlo estimate of pi: ', pi

      end
```
This reads the partial sum stored in each file in turn and sums them to calculate the approximate value of pi. Again this is unnecessarily complicated because of Fortran's basic string manipulation capabilities; using a higher-level language such as Python could make life easier.

## Compilation and initialisation

The GNU gfortran compiler can be used to create the executables that are to be run on the Condor server, i.e.:

```
$ gfortran initialise.f90 -o initialise
$ gfortran combine.f90 -o combine
```

It is also useful to create a UNIX executable from the code to be run on the Condor pool, for testing purposes:

```
$ gfortran pi.f90 -o pi
```

The GNU Fortran compiler can also generate executables to run under Windows for use on the Condor pool, so it is not necessary to compile the code on a PC. To compile the pi.f90 code into a Windows executable (.exe) on the Condor server itself, use:

```
$ gfortran-win pi.f90 -o pi.exe
```

The 1000 seed files can be created using the previously built initialisation executable:

```
$ ./initialise
```

The Condor code can then be tested on the server with a one-off seed file, e.g.:

```
$ cp seed123 seed
$ ./pi
$ cat incircle
```
It can be extremely difficult to track down errors when jobs are run on the Condor pool and so testing the application first on the server should be considered essential.

## Creating the simplified job submission file

Each Condor job needs a submission file to describe how the job should be run. These can appear rather arcane to new users, so to help simplify the process, tools have been created which work with more user-friendly job submission files. These are automatically translated into files which Condor understands and which users need not worry about. For this example, a submission file called run_pi such as the one below can be used:

```
executable = pi.exe
indexed_input_files = seed
indexed_output_files = incircle
log = log.txt
total_jobs = 1000
```

The Windows executable is specified in the executable line. The lines starting indexed_input_files and indexed_output_files specify the input and output files which differ for each individual job. The total number of jobs is given in the total_jobs line. The underlying job submission processes will ensure that each individual job receives the correct input file (seed0 ... seed999) and that the output files are indexed in a manner corresponding to the input files (e.g. output file incircle1 will correspond to seed1).

It is also possible to provide a list of non-indexed files (i.e. files that are common to all jobs), for example:

```
input_files = common1.dat,common2.dat,common3.dat
```
This is useful if the common (non-indexed) files are relatively large.

Aside:

For testing, the indexed_output_files line can be omitted so that all of the output files are returned (the default). It is useful to also capture the standard output and error (which would normally be printed to the screen) using the indexed_stdout and indexed_stderr attributes respectively. For production runs, the output files should always be specified just in case there is a run-time problem and they are not created. In this case, Condor will place the job in the held ('H') state. To release these jobs and run them elsewhere use:

`$ condor_release -all`

To find out why jobs have been held use:

`$ condor_q -held`

The job submission file can be edited using the nedit or nano editors; however, all of the options in it can be overridden temporarily from the command line (the file itself is not changed). For example, to submit five jobs instead of 1000, use:

```
$ fortran_submit run_pi -total_jobs=5
```
and to also change the executable to be used:
```
$ fortran_submit run_pi -total_jobs=5 -executable=other.exe
```
(use the --help option to see all of the options available)

This is useful for making small changes without the need to use the UNIX system editors to change the job submission file.

## Running the Condor Jobs

The Condor jobs are submitted from the Condor server using the command:

```
$ fortran_submit run_pi
```

It should return with something like:

```
$ fortran_submit run_pi
Submitting job(s).....................................................
......................................................................
......................................................................
1000 job(s) submitted to cluster 1394.
```
You can monitor the progress of all of your jobs using:

```
$ condor_q
```
Initially the Condor jobs will remain in the idle state until machines become available, e.g.:

```
$ condor_q

-- Schedd: Q1@condor1 : <10.102.32.11:45062?... @ 08/06/19 10:25:35
OWNER   BATCH_NAME         SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
smithic CMD: run_pi.bat   8/6  10:23    524     97    379   1000 1394.4-999

476 jobs; 0 completed, 0 removed, 379 idle, 97 running, 0 held, 0 suspended
```
The overall state of the pool can be seen using the command:
`$ condor_status`

## Combining the results

Once the jobs have completed (as indicated by the lack of any jobs in the queue), the directory should contain 1000 output files named incircle0, ... incircle999 which can be processed using the combine executable to generate the final result. e.g.:

```
$ ./combine
Monte-Carlo estimate of pi: 3.156000
```

## Discussion

Despite generating one million points, our estimate of pi is only accurate to a couple of significant figures, so this is clearly not a very efficient method. If we examine the partial sum files, the reason becomes apparent:

```
$ cat incircle* | more
789
789
789
789
789
789
789
...
```
All of the partial sums are exactly the same! This means we have essentially solved the same problem 1000 times and obtained the same solution, so the maximum number of significant digits we could expect is three (since there are only 1000 points in each job). If we instead submitted one million jobs, each generating just a single point, we would expect around 79% of the jobs (a fraction pi/4 ≈ 0.785) to have a point inside the circle. Another way of looking at this is to say that the problem has only fine-grained parallelism and is not suitable for Condor in this form.
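The underlying hazard is not specific to Fortran: a pseudo-random generator is completely determined by its seed, so if every job ends up initialising its generator the same way, every job computes the identical "random" sample. This can be illustrated in any language; a minimal Python sketch (the helper `stream` is ours):

```python
import random

def stream(seed, n=5):
    """Return the first n values of the pseudo-random stream for a seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Re-using a seed reproduces the stream exactly -- jobs initialised
# identically all generate the same points.
assert stream(7) == stream(7)
# Distinct seeds give distinct streams here, but distinctness alone does
# not guarantee the streams are statistically independent.
assert stream(7) != stream(8)
```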

We can do better than this by generating the random points beforehand and storing them into files. This program (randpoint.f90) does just that:

```
      program rand_points

      implicit none

      double precision x, y
      integer i, j

      character(len=10) :: filename
      character(len=10) :: format_string

      do i = 0, 999
         if (i < 10) then
            format_string = "(A6,I1)"
         else if (i < 100) then
            format_string = "(A6,I2)"
         else
            format_string = "(A6,I3)"
         endif

         write (filename, format_string) "points", i
         open(666, file=trim(filename), status='unknown')

         do j = 1, 1000
            call random_number(x)
            call random_number(y)
            write(666, *) x, y
         enddo

         close(666)
      enddo

      end
```
The Condor jobs now read 1000 points from each file and calculate the number of points lying inside the circle, using for example this code (newpi.f90):
```
      program pi

      implicit none

      double precision x, y
      integer :: incircle, samplesize
      integer i

      parameter(samplesize=1000)

      open(1, file='points', status='old')

      incircle = 0
      do i = 1, samplesize
         read(1, *) x, y              ! read a pre-generated point
         x = x*2.0d0 - 1.0d0          ! map to -1.0 <= x <= 1.0
         y = y*2.0d0 - 1.0d0          ! map to -1.0 <= y <= 1.0
         if (x*x + y*y .lt. 1.0d0) then
            incircle = incircle + 1   ! point is inside the circle
         endif
      end do

      close(1)

      open(2, file='incircle', status='unknown')
      write(2, *) incircle
      close(2)

      end
```
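Submitting these jobs should need only minor changes to the simplified submission file: the indexed input files are now the points files rather than the seed files. A sketch, assuming the Windows executable is built as newpi.exe (the key names follow the run_pi example above):

```
executable = newpi.exe
indexed_input_files = points
indexed_output_files = incircle
log = log.txt
total_jobs = 1000
```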
The partial sums can be combined as before:

```
$ ./combine
Monte-Carlo estimate of pi: 3.140900
```
This is an improvement on the previous attempt and gives the same result as a serial version (the code for this is in serial.f90).

Although this is something of a toy example, it does illustrate an important point: when dealing with Monte Carlo methods it is very important to pay attention to the statistical properties of the random number generator(s). The first method used here is widely cited as an example on the internet (see for example this MPI version from Oak Ridge National Lab); however, it is extremely inefficient and possibly fatally flawed.


last updated on Monday, 14-Oct-2019 10:55:38 BST by I.C. Smith

ARC, University of Liverpool, Liverpool L69 3BX, United Kingdom, +44 (0)151 794 2000