University of Wisconsin - Madison, CS 540 Lecture Notes, C. R. Dyer
Computer Vision (Chapter 24.1, 24.5)
What is Computer Vision?
 "The central problem of computer vision is from one or
a sequence of images to understand the object or scene and its
3D properties."  Y. Aloimonos
 "Vision is the process by which descriptions of physical
scenes are inferred from images of them."  S. Zucker
 "A process that produces from images of the external 3D world
a description that is useful to the viewer and not cluttered
by irrelevant information."  D. Marr
Applications
 Medical image analysis
 Aerial photo interpretation
 Vehicle exploration and mobility
 Material handling
For example, part sorting and picking
 Inspection
For example, integrated circuit board and chip inspection
 Assembly
 Navigation
 Human-computer interfaces
For example, handwriting recognition, optical character recognition (OCR),
face recognition,
gesture recognition, gaze tracking, 3D model acquisition
 Multimedia
For example, video databases,
image compression, image browsing, content-based retrieval
 Telepresence/Teleimmersion/Telereality
For example, telemedicine, virtual classrooms, video conferencing,
interactive walkthroughs
Example: Vehicle Navigation using Neural Networks
The ALVINN system by D. Pomerleau is an example of a 2-layer,
feed-forward neural network for "vision-based lane keeping"
that was described earlier in the section on
Neural Networks.
Example: Face Recognition using an Eigenspace Representation
 Problem Statement: Given a training set of M images,
each of size N x N pixels,
where each image contains a single person's face, approximately
registered for face position, orientation, scale, and brightness,
and a test image, determine if the person in the test image is
one of the people in the training set, and, if so, indicate which
person it is.
 Need a similarity metric for measuring the "distance" between
two face images
 Need a way of representing face features to be compared within
the similarity metric
 One approach is due to M. Turk and A. Pentland (see
"Eigenfaces for Recognition," J. Cognitive Neuroscience 3,
1991, pp. 71-86). An online description of their work
and demos is available on the
Eigenfaces/Photobook web page.
For other information and research on the problem of face
recognition, see the
Face Recognition Home Page.
For information on face detection, see the
Face Detection Home Page.
Eigenspace Representation of Images
 An N x N image can be "represented" as a point in
an N^{2} dimensional
image space, where each dimension is associated with one of
the pixels in the image and the possible values in each dimension
are the possible gray levels of each pixel. For example,
a 512 x 512 image where each pixel is an integer in the range 0, ..., 255
(i.e., a pixel is stored in one byte), then image space is a
262,144dimensional space and each dimension has 256 possible values.
 If we represented our M training images as M points in image
space, then one way of recognizing the person in a new test image
would be to find its nearest neighbor training image in image space.
But this approach would be very slow since the size of image space
is so large, and would not exploit the fact that since all of our
images are of faces, they will likely be clustered relatively near
one another in image space. So, instead, let's represent each image
in a lower-dimensional feature space, called face space or
eigenspace.
 Say we have M' images, E_{1}, E_{2}, ..., E_{M'}, called
eigenfaces or eigenvectors. These images define a
basis set, so that each face image will be defined in terms
of how similar it is to each of these basis images. That is, we
can represent an arbitrary image I as a weighted (linear)
combination of these eigenvectors as follows:
 Compute the average image, A, from all of the training images
I_{1}, I_{2}, ..., I_{M}:
A = (1/M) * sum_{i=1}^{M} I_{i}
 For k = 1, ..., M' compute a real-valued weight, w_{k},
indicating
the similarity between the input image, I,
and the kth eigenvector, E_{k}:
w_{k} = E_{k}^{T} * (I - A)
where I is a given image and is represented as a column vector
of length N^{2}, E_{k} is the kth eigenface image and
is a column vector of length N^{2}, A
is a column vector of length N^{2}, * is
the dot product operation, and - is pixel-by-pixel subtraction.
Thus w_{k} is a real-valued scalar.
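As a concrete sketch, the average image and the projection weights can be computed in plain Python (images as flat lists of gray levels; the helper names average_image and project are mine, not from the paper):

```python
def average_image(images):
    """Compute the average image A, pixel by pixel, from a list
    of equal-length image vectors (flat lists of gray levels)."""
    n = len(images[0])
    M = len(images)
    return [sum(img[p] for img in images) / M for p in range(n)]

def project(image, A, eigenfaces):
    """Project an image onto the eigenfaces: w_k = E_k^T * (image - A).
    Returns the weight vector W = [w_1, ..., w_M']."""
    Y = [i - a for i, a in zip(image, A)]  # mean-subtracted image
    return [sum(e * y for e, y in zip(E, Y)) for E in eigenfaces]
```

Each call to project costs O(M' * N^{2}) operations, versus the O(M * N^{2}) cost of comparing a test image against every training image directly in image space.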
 W = [w_{1}, w_{2}, ..., w_{M'}]^{T}
is a column vector of weights
that indicates the contribution of each eigenface image in representing
image I. So, instead of representing image I in image
space, we'll represent it as a point W
in the M'-dimensional weight space that we'll call face space
or eigenspace. Hence, each image is projected from
a point in the high-dimensional image space down to a point in the
much lower-dimensional eigenspace. In terms of compression, each image
is represented by M' real numbers, which means that for a typical value
of M'=10 and 32 bits per weight, we need only 320 bits/image to encode it
in face space. (Of course, we must also store the M' eigenface images,
which are each N^{2} pixels, but this cost is amortized over all of the
training images, so it can be considered to be a small additional cost.)
 Notice that image I can be approximately reconstructed from W
as follows:
I ~ A + sum_{i=1}^{M'} w_{i} * E_{i}
This reconstruction will be exact if M' = min(M, N^{2}).
Hence, for smaller M' the eigenspace representation of an image
is not exact, in the sense that the image cannot be perfectly
reconstructed, but it is a good enough approximation for
differentiating between faces.
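The approximate reconstruction can be sketched the same way (plain Python; reconstruct is my name for this helper):

```python
def reconstruct(W, A, eigenfaces):
    """Approximately rebuild an image from its eigenspace point:
    I ~ A + sum_i w_i * E_i."""
    image = list(A)  # start from the average image
    for w, E in zip(W, eigenfaces):
        image = [p + w * e for p, e in zip(image, E)]
    return image
```

With M' < min(M, N^{2}) the result is only an approximation of the original image, as noted above.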
 Now, select a value for M' and then determine the
M' "best" eigenvector images (i.e., eigenfaces). How?
Answer: Use the statistics technique called
Principal Components Analysis (also called the
Karhunen-Loeve transform in communications theory).
Intuitively, this technique selects the M' images
that maximize the information content in the compressed (i.e.,
eigenspace) representation.
The best M' eigenface images are computed as follows:
 For each training image I_{i}, normalize it by subtracting
the mean (i.e., the "average image"): Y_{i} = I_{i} - A
 Compute the N^{2} x N^{2} Covariance Matrix:
C = (1/M) * sum_{i=1}^{M} Y_{i} Y_{i}^{T}
 Find the eigenvectors of C that are associated with
the M' largest eigenvalues. Call the eigenvectors
E_{1}, E_{2}, ..., E_{M'}.
These are the eigenface images
used by the algorithm given above.
Note: C is very large (N^{2} x N^{2}), so this method is computationally
very intensive. However, there are relatively fast methods for
finding only the eigenvectors associated with the k largest eigenvalues,
which is all we need.
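To make this step concrete, here is a toy sketch in plain Python: the covariance matrix built as above, and its dominant eigenvector found by power iteration, one simple way to obtain just the largest eigenvectors without a full eigendecomposition (my own illustration, not the method used in the original paper):

```python
def covariance(images, A):
    """C = (1/M) * sum_i Y_i Y_i^T, where Y_i = I_i - A."""
    M, n = len(images), len(images[0])
    Ys = [[img[p] - A[p] for p in range(n)] for img in images]
    return [[sum(Y[r] * Y[c] for Y in Ys) / M for c in range(n)]
            for r in range(n)]

def power_iteration(C, steps=100):
    """Approximate the unit eigenvector of C associated with the
    largest eigenvalue by repeatedly applying C and renormalizing."""
    v = [1.0] * len(C)
    for _ in range(steps):
        v = [sum(row[c] * v[c] for c in range(len(v))) for row in C]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v
```

Repeating power iteration on the residual after removing each found component (deflation) yields the next-largest eigenvectors, which is one way to collect all M' eigenfaces.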
Face Recognition Algorithm
The entire face recognition algorithm can now be given:
 Given a training set of face images, compute the M'
largest eigenvectors, E_{1}, E_{2}, ..., E_{M'}.
M' = 10 or 20 is a typical value used. Notice that this
step is done once "offline."
 For each different person in the training set, compute the
point associated with that person in eigenspace. That is, use
the formula given above to compute
W = [w_{1}, ..., w_{M'}].
Note that this step is also done once offline.
 Given a test image, I_{test}, project it to the M'-dimensional
eigenspace by computing the point W_{test}, again using the formula
given above.
 Find the closest training face to the given test face:
d = min_{k} || W_{test} - W_{k} ||
where W_{k} is the point in eigenspace associated with
the kth person
in the training set, and || X || denotes the Euclidean norm defined
as (x_{1}^{2} + x_{2}^{2} + ... + x_{n}^{2})^{1/2}
where X is the vector [x_{1}, x_{2}, ..., x_{n}].
 Find the distance of the test image from eigenspace (that is,
compute the projection distance so that we can estimate the likelihood
that the image contains a face):
dffs = || Y - Y_{f} ||
where Y = I_{test} - A, and
Y_{f} = sum_{i=1}^{M'} (w_{test,i} * E_{i}).

If dffs < Threshold1
      ; Test image is "close enough" to the eigenspace
      ; associated with all of the training faces to
      ; believe that this test image is likely to be some
      ; face (and not a house or a tree or something
      ; other than a face)
then if d < Threshold2
     then classify I_{test} as containing the face of person k,
          where k is the closest face in the eigenspace to
          W_{test}, the projection of I_{test} to eigenspace
     else classify I_{test} as an unknown person
else classify I_{test} as not containing a face
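The decision rule above can be sketched in plain Python (a minimal illustration; the threshold names, the person labels, and the recognize helper are mine):

```python
def euclidean(u, v):
    """|| u - v || = (sum_i (u_i - v_i)^2)^(1/2)"""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def recognize(W_test, dffs, train_points, threshold1, threshold2):
    """Classify a test image from its eigenspace point W_test and its
    distance from face space, dffs.  train_points maps person -> W_k."""
    if dffs >= threshold1:
        return "not a face"  # too far from the span of the eigenfaces
    # find the nearest training person in eigenspace
    person, W_k = min(train_points.items(),
                      key=lambda item: euclidean(W_test, item[1]))
    if euclidean(W_test, W_k) < threshold2:
        return person  # close enough to a known person
    return "unknown person"
```

Note that the two thresholds answer different questions: Threshold1 decides whether the image is a face at all, and Threshold2 decides whether that face is one we have seen before.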
Example
Say we have two 3 x 3 training images, so N=3 and M=2, defined as follows:
Image I_{1}
 0   0   0
10  10  10
 0   0   0

Image I_{2}
 0  10   0
 0  10   0
 0  10   0

We represent these two images as column vectors of length 3*3=9,
so we have
I_{1} = [0 0 0 10 10 10 0 0 0]^{T}
I_{2} = [0 10 0 0 10 0 0 10 0]^{T}
Now assume that we use a subspace of dimension 1, i.e., M'=1,
and the eigenvector computed from the two training images is:
E_{1} = [5 0 5 10 5 10 5 0 5]^{T}
(Note: This is not the true eigenvector but is used here to keep
the example simple.)
The average image, A, is computed from I_{1} and I_{2}
by computing for
each pixel, the average gray level from the two images' corresponding
pixels. Thus, the second pixel in A is (0+10)/2 = 5. Hence,
A = [0 5 0 5 10 5 0 5 0]^{T}
We can now compute how the first training image, I_{1},
is projected into the
one-dimensional eigenspace by computing
W_{1} = [w_{1,1}], where
w_{1,1} = E_{1}^{T} * (I_{1} - A).
So, here we have
I_{1} - A = [0 -5 0 5 0 5 0 -5 0]^{T}
w_{1,1} = 5*0 + 0*(-5) + 5*0 + 10*5 + 5*0 + 10*5 + 5*0 + 0*(-5) + 5*0 = 100
W_{1} = [100]
In other words, image I_{1} projects to the point 100 in
this one-dimensional subspace
defined by basis image E_{1}.
Similarly, for I_{2} we get
W_{2} = [w_{2,1}], where
w_{2,1} = E_{1}^{T} * (I_{2} - A)
= [5 0 5 10 5 10 5 0 5] * [0 5 0 -5 0 -5 0 5 0]^{T}
= -100
So, W_{2} = [-100]. (Note that with M = 2 we have
I_{2} - A = -(I_{1} - A), so the two projections are necessarily
negatives of each other.)
Now, say we are given the following test image:
Image I_{test}
 0   7   3
 0  10  10
 0  10   0
Projecting I_{test} into face space we get
W_{test} = [w_{test,1}], where
w_{test,1} = E_{1}^{T} * (I_{test} - A)
= [5 0 5 10 5 10 5 0 5] * [0 2 3 -5 0 5 0 5 0]^{T}
= 15
So, W_{test} = [15]. Since |15 - 100| = 85 is smaller than
|15 - (-100)| = 115, W_{test} is closer to W_{1} than to W_{2},
so we would classify I_{test} as the same person as in I_{1}.
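The projections in this example can be recomputed in a few lines of plain Python, using the vectors given above:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

I1 = [0, 0, 0, 10, 10, 10, 0, 0, 0]
I2 = [0, 10, 0, 0, 10, 0, 0, 10, 0]
E1 = [5, 0, 5, 10, 5, 10, 5, 0, 5]
A = [(a + b) / 2 for a, b in zip(I1, I2)]         # the average image

w11 = dot(E1, [i - a for i, a in zip(I1, A)])     # projection of I_1
w21 = dot(E1, [i - a for i, a in zip(I2, A)])     # projection of I_2

I_test = [0, 7, 3, 0, 10, 10, 0, 10, 0]
wt = dot(E1, [i - a for i, a in zip(I_test, A)])  # projection of I_test
```

Comparing |wt - w11| with |wt - w21| then gives the nearest-neighbor classification in eigenspace.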
Face Recognition Accuracy and Extensions to Eigenspace Approach
 Performance using a 20dimensional eigenspace resulted in
about 95% correct classification on a database of about 7,500 images
of about 3,000 people
 If training set contains multiple images of each person, then
for each person compute the average point in eigenspace from the
points computed for each image of that person
 Method requires that all images in the database contain faces
of about the same size, position, and orientation, so they can be
compared using this global distance function in eigenspace
 If there are multiple images of a 3D object (e.g., a person's
head from many different positions and orientations), then the points
in eigenspace corresponding to the different 3D views can be
combined by fitting a hypersurface to all the points, and storing
this hypersurface in eigenspace as the description of that person.
Then, classify a test image as the person corresponding to the
closest hypersurface
Applications of Eigenfaces
There are a variety of commercial products that are now available
based on the eigenface method. See, for example,
TrueFace PC,
which does computer logins by face recognition, and
Viisage and
Identix for
face recognition products for various biometric applications.
Copyright © 1996-2003 by Charles R. Dyer. All rights reserved.