Our Condor Installation

Objective of this exercise

This exercise should help you understand the basics of how Condor is installed, what Condor processes (a.k.a. daemons) are running, and what they do.

treinamento

Before you start, make sure you are logged into one of the treinamentoXX machines (00 < XX < 20). You should have been given your username and password when you arrived at the school.

Looking at our Condor installation

How do you know what version of Condor you are using? Try condor_version:

% condor_version
$CondorVersion: 7.4.4 Oct 13 2010 BuildID: 279383 $
$CondorPlatform: X86_64-LINUX_DEBIAN50 $

You might be surprised that it reports Debian 5.0 instead of Ubuntu. It is reporting the operating system that it was compiled on, not the operating system that is in use. Don't worry, the Debian binaries work just fine on Ubuntu.
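If you ever need just the version number in a script, you can pull it out of the condor_version output. Here is a minimal sketch, assuming the output format shown above. The function name condor_version_number is made up for this example and is not part of Condor.

```shell
# Hypothetical helper (not part of Condor): read condor_version output
# on stdin and print only the version number, for example 7.4.4.
# Assumes the "$CondorVersion: X.Y.Z ..." line format shown above.
condor_version_number() {
    sed -n 's/^\$CondorVersion: \([0-9][0-9.]*\) .*/\1/p'
}

# Usage on a machine with Condor installed:
#   condor_version | condor_version_number
```

The function reads from standard input, so you can also try it on saved output.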

Extra Tip: The OS version

Do you know how to find the OS version? You can usually look in /etc/issue to find out:


% cat /etc/issue
Ubuntu 10.04.1 LTS \n \l

Or you can run:

% lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 10.04.1 LTS
Release:        10.04
Codename:       lucid

Where is Condor installed?

% which condor_q
/opt/condor/bin/condor_q

Condor has some configuration files that it needs to find. It finds them because we have put CONDOR_CONFIG into your environment:

% echo $CONDOR_CONFIG
/opt/condor/etc/condor_config

Condor keeps its records of jobs in a few directories. Remember that each submission computer keeps track of all jobs submitted to it. Those records live under the local directory:

% condor_config_val -v LOCAL_DIR
LOCAL_DIR: /opt/condor/scratch
  Defined in '/opt/condor/scratch/condor_config.local', line 17.

% ls -CF /opt/condor/scratch
condor_config.local  execute/  log/  spool/

The spool directory is where Condor keeps the jobs you submit, while the execute directory is where Condor keeps running jobs. Since this is a submission-only computer, the execute directory should be empty.
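You can check both directories in one step. The sketch below asks Condor where they are through the SPOOL and EXECUTE configuration variables and counts the entries in each. The function name show_job_dirs is made up for this example, and it assumes your account is allowed to list both directories.

```shell
# Hypothetical helper: print the spool and execute directories and the
# number of entries in each. SPOOL and EXECUTE are Condor configuration
# variables; the function name is made up for this sketch.
show_job_dirs() {
    for name in SPOOL EXECUTE; do
        # Ask Condor for the directory; stop if the lookup fails.
        dir=$(condor_config_val "$name") || return 1
        # Count every entry, including hidden ones.
        entries=$(ls -A "$dir" | wc -l | tr -d ' ')
        printf '%s: %s (%s entries)\n' "$name" "$dir" "$entries"
    done
}

# Usage:
#   show_job_dirs
```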

Check if Condor is running:

% ps auwx | grep condor_ | grep -v grep
daemon    1541  0.0  0.1  33992  3596 ?        Ss   Dec06   0:07 /opt/condor/sbin/condor_master
daemon    1542  0.0  0.2  34544  4880 ?        Ss   Dec06   0:00 condor_schedd -f
daemon    1543  0.0  0.2  34384  4936 ?        Ss   Dec06   0:15 condor_startd -f
root      1546  0.0  0.1  22612  3172 ?        S    Dec06   0:03 condor_procd -A /tmp/condor-lock.treinamento010.0161995238895578/procd_pipe.SCHEDD -S 60 -C 1

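A submission-only computer needs only three Condor processes: the condor_master, the condor_schedd, and the condor_procd. (Actually, you might see more, as in the listing above, because we have two versions of Condor running. More about that on Wednesday.) In general, you might see many different Condor processes. One of them, the condor_collector, gathers information about the whole pool and runs on a central computer. You can find out which computer that is: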

% condor_config_val COLLECTOR_HOST
treinamento02.ncc.unesp.br
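If the ps listing is too noisy, you can reduce it to just the names of the Condor daemons. This is a minimal sketch, assuming the ps auwx column layout shown earlier (the command is the 11th field). The function name condor_daemons is made up for this example.

```shell
# Hypothetical helper: read `ps auwx` output on stdin and print the
# distinct Condor daemon names, one per line, sorted.
# Assumes the command is the 11th field, as in the listing above.
condor_daemons() {
    awk '$11 ~ /condor_/ {
             # Drop any leading directory, keeping only the program name.
             n = split($11, parts, "/")
             print parts[n]
         }' | sort -u
}

# Usage:
#   ps auwx | condor_daemons
```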

condor_q

You can find out what jobs have been submitted on your computer with the condor_q command:

% condor_q

-- Submitter: treinamento01.ncc.unesp.br : <200.145.46.65:56001> : treinamento01.ncc.unesp.br
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held

Nothing is running right now. If something was running, you would see output like this:

% condor_q
-- Submitter: treinamento01.ncc.unesp.br : <200.145.46.65:56001> : treinamento01.ncc.unesp.br
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   9.0   zmiller        12/7  05:23   0+00:00:06 R  0   0.0  test.sh 60        
   9.1   zmiller        12/7  05:23   0+00:00:09 R  0   0.0  test.sh 60        
   9.2   zmiller        12/7  05:23   0+00:00:09 R  0   0.0  test.sh 60        
   9.3   zmiller        12/7  05:23   0+00:00:09 R  0   0.0  test.sh 60        
   9.4   zmiller        12/7  05:23   0+00:00:09 R  0   0.0  test.sh 60        

5 jobs; 0 idle, 5 running, 0 held

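The output that you see will be different depending on what jobs are running. Notice what we can see from this: the ID is the job's identifier, written as cluster.process (all five jobs above belong to cluster 9); OWNER is the user who submitted the job; SUBMITTED is when it was submitted; RUN_TIME is how long it has been running; ST is the job's state (R for running, I for idle, H for held); PRI is the priority the user gave the job; SIZE is the size of the job's memory image in megabytes; and CMD is the program being run, with its arguments.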
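When the queue gets long, it helps to tally the ST column instead of reading every line. Here is a minimal sketch, assuming the column layout shown above, where job lines start with a cluster.process ID and the state is the sixth whitespace-separated field. The function name jobs_by_state is made up for this example.

```shell
# Hypothetical helper: read condor_q output on stdin and print how many
# jobs are in each state (the ST column), for example "R 5".
# Assumes job lines start with a cluster.process ID and that ST is the
# sixth whitespace-separated field, as in the listing above.
jobs_by_state() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ { count[$6]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage:
#   condor_q | jobs_by_state
```

The header, the blank lines, and the summary line are skipped because their first field is not a cluster.process ID.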

Extra credit

What else can you find out with condor_q? Try condor_q -help for a summary of its options, or read its manual page.

Double bonus points

How do you use the -constraint or -format options to condor_q? When would you want them? When would you use the -l option? This might be an easier exercise to try once you submit some jobs.
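As a starting point, here is one way the two options can work together: -constraint picks the jobs whose attributes match an expression, and -format prints a single attribute with a printf-style format. This is only a sketch. The wrapper name running_jobs_of is made up, and it relies on the job attributes Owner, JobStatus (where 2 means running), ClusterId, and ProcId.

```shell
# Hypothetical wrapper: list the running jobs of one user as
# cluster.process IDs, one per line.
# -constraint selects jobs whose attributes match the expression;
# -format prints one attribute using a printf-style format.
# JobStatus == 2 means "running".
running_jobs_of() {
    condor_q -constraint "Owner == \"$1\" && JobStatus == 2" \
             -format '%d.' ClusterId -format '%d\n' ProcId
}

# Usage:
#   running_jobs_of zmiller
```

Compare the result with condor_q -l on one of the jobs to see which other attributes you could select on or print.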

condor_status

You can find out what computers are in your Condor pool. (A pool is similar to a cluster, but it doesn't have the connotation that all computers are dedicated full-time to computation: some may be desktop computers owned by users.) To look, use condor_status:

% condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@treinamento0 LINUX      X86_64 Unclaimed Idle     0.030   498  0+02:47:07
slot2@treinamento0 LINUX      X86_64 Unclaimed Idle     0.000   498  0+06:47:21
slot3@treinamento0 LINUX      X86_64 Unclaimed Idle     0.000   498  0+06:47:22
slot4@treinamento0 LINUX      X86_64 Unclaimed Idle     0.000   498  0+06:47:23
slot1@treinamento0 LINUX      X86_64 Unclaimed Idle     0.040   498  0+02:46:09
slot2@treinamento0 LINUX      X86_64 Unclaimed Idle     0.000   498  0+06:46:26
slot3@treinamento0 LINUX      X86_64 Unclaimed Idle     0.000   498  0+06:46:27

...

Each computer shows up several times, once per slot (the first computer above has four), with slotN at the beginning of the name. This is because we've configured Condor to be able to run multiple jobs per computer. Slot refers to "job slot". (These are multi-core computers.)
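With many slots in the pool, a quick tally by state tells you how busy it is. Here is a minimal sketch, assuming the column layout shown above, where slot lines start with slotN@ and the State is the fourth whitespace-separated field. The function name slots_by_state is made up for this example.

```shell
# Hypothetical helper: read condor_status output on stdin and print how
# many slots are in each State, for example "Unclaimed 7".
# Assumes slot lines start with "slotN@" and that State is the fourth
# whitespace-separated field, as in the listing above.
slots_by_state() {
    awk '$1 ~ /^slot[0-9]+@/ { count[$4]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage:
#   condor_status | slots_by_state
```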

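Let's look at exactly what you can see: Name is the slot and computer name (cut short in this listing); OpSys and Arch are the operating system and processor architecture; State says whether the slot is available (Unclaimed), running someone's job (Claimed), or in use by the computer's owner (Owner); Activity says what the slot is doing within that state (Idle or Busy, for example); LoadAv is the load average; Mem is the slot's memory in megabytes; and ActvtyTime is how long the slot has been in its current activity.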

Extra credit

What else can you find out with condor_status? Try condor_status -help for a summary of its options, or read its manual page.

Note in particular the options like -master and -schedd. When would these be useful? When would the -l option be useful?