High Throughput Computing using Condor

Condor Job Analysis

Although the recent Condor usage statistics provide a good overview of Condor activity, they do not provide a detailed analysis of how Condor jobs are performing. It may be the case that a significant amount of CPU time is wasted by job evictions, but this is not evident in the bulk statistics. This page and the linked pages aim to provide users with more in-depth analysis by breaking down CPU usage into two parts: goodput and badput. Goodput is the CPU time which is actually put to good use by Condor and contributes to the solution of the problem at hand. Badput is CPU time wasted by job evictions and can be significant for long-running jobs which do not use checkpointing.

By minimising badput, it is possible to turn around jobs more quickly, leading to higher throughput and ultimately to results being produced sooner. This also means that less electricity (and hence money) is wasted by the Condor pool PCs, so that "everyone's a winner". If you find that your jobs are clocking up too much badput, then it may be a good idea to split them into shorter jobs or use checkpointing. If you are unsure of how to proceed, please contact the Condor administrator Ian C. Smith (email: i.c.smith@liverpool.ac.uk) for advice.

The table below gives cumulative figures for jobs submitted in approximately the last month. You can get a breakdown of these figures by following the link corresponding to your username. From there you can drill down to more detailed analysis by following the job links. For jobs that do not use checkpointing, all evictions are assumed to contribute to badput (in other words, all of the CPU time used from the current job start time up to the eviction is wasted). For checkpointing jobs, evictions are assumed not to cause any badput (however, there may still be other causes). Because of this assumption, badput may sometimes be underestimated.
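The accounting rule described above can be sketched in a few lines of Python. This is only an illustration of the stated assumptions, not the method used by the Condor Log Analyzer: the `Run` record and the `split_cpu_time` function are hypothetical names invented for this example, and a real analysis would read the run times from Condor log files.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One execution attempt of a job (hypothetical record, not a Condor log format)."""
    cpu_hours: float  # CPU time used from this start until the attempt ended
    evicted: bool     # True if the attempt ended in an eviction


def split_cpu_time(runs, checkpointing):
    """Return (goodput, badput) in hours under the assumptions stated on this page."""
    goodput = 0.0
    badput = 0.0
    for run in runs:
        if run.evicted and not checkpointing:
            # No checkpoint: everything since the job start is assumed lost.
            badput += run.cpu_hours
        else:
            # Completed runs, and evicted runs of checkpointing jobs, count as goodput.
            goodput += run.cpu_hours
    return goodput, badput


# A job evicted twice (after 3 and 5 hours) before completing in 8 hours.
runs = [Run(3.0, True), Run(5.0, True), Run(8.0, False)]
print(split_cpu_time(runs, checkpointing=False))  # (8.0, 8.0)
print(split_cpu_time(runs, checkpointing=True))   # (16.0, 0.0)
```

In this made-up example, half of the CPU time of the non-checkpointing job is badput, whereas the same run history with checkpointing is counted entirely as goodput.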

The statistics were produced using the Condor Log Analyzer at the University of Notre Dame. This is free to use and open to all if you wish to analyse your own log files. We cannot guarantee that it will always be 100% accurate or will always work, since its internal workings are known only to its authors.

Condor use for approximately the last month

Username   CPU Time   Goodput   Badput   Submitted   Evicted   Completed
dlythgoe      76910     30172    46731       23164    154968       16389
riham        119849    106394    13434      103350     10896       32871
campagne       6781      2688     4093      218400    374527      218400
mesudell     105128    101746     3371       45000      6838       26822
thmel         40376     37760     2616         100      3006          38
dmhughes       8705      8391      314        3504      1379        3504
smithic        3015      2876      137         541       723         389
robertwi        715       715        0       20012       225       11010
TOTAL        361479    290742    70696      414071    552562      309423

All times are in hours. Click the links for detailed analysis.
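As a rough way of reading the table, the share of each user's CPU time that counted as goodput can be computed from the CPU Time and Goodput columns. The sketch below copies those two columns by hand; the `goodput_share` helper is a hypothetical name introduced for this example and is not part of any Condor tool.

```python
# (username, CPU time in hours, goodput in hours), copied from the table above
rows = [
    ("dlythgoe", 76910, 30172),
    ("riham", 119849, 106394),
    ("campagne", 6781, 2688),
    ("mesudell", 105128, 101746),
    ("thmel", 40376, 37760),
    ("dmhughes", 8705, 8391),
    ("smithic", 3015, 2876),
    ("robertwi", 715, 715),
]


def goodput_share(cpu_hours, goodput_hours):
    """Fraction of CPU time that counted as goodput (1.0 if no CPU time was used)."""
    return goodput_hours / cpu_hours if cpu_hours else 1.0


# List users from the lowest goodput share (most wasted CPU time) to the highest.
ranked = sorted(rows, key=lambda r: goodput_share(r[1], r[2]))
for user, cpu, good in ranked:
    print(f"{user:<10} {goodput_share(cpu, good):6.1%}")
```

With these figures, the lowest shares belong to dlythgoe and campagne, both just under 40%, which suggests their jobs would benefit most from being split into shorter jobs or from checkpointing.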

Last updated: Wed Aug 2 11:45:34 BST 2017