University of Wisconsin - Madison | CS 540 Lecture Notes | C. R. Dyer
Computer Vision (Chapter 24.1, 24.5)
What is Computer Vision?
- "The central problem of computer vision is from one or
a sequence of images to understand the object or scene and its
3D properties." -- Y. Aloimonos
- "Vision is the process by which descriptions of physical
scenes are inferred from images of them." -- S. Zucker
- "A process that produces from images of the external 3D world
a description that is useful to the viewer and not cluttered
by irrelevant information." -- D. Marr
Applications
- Medical image analysis
- Aerial photo interpretation
- Vehicle exploration and mobility
- Material handling
For example, part sorting and picking
- Inspection
For example, integrated circuit board and chip inspection
- Assembly
- Navigation
- Human-computer interfaces
For example, handwriting recognition, optical character recognition (OCR),
face recognition,
gesture recognition, gaze tracking, 3D model acquisition
- Multimedia
For example, video databases,
image compression, image browsing, content-based retrieval
- Telepresence/Tele-immersion/Tele-reality
For example, tele-medicine, virtual classrooms, video conferencing,
interactive walkthroughs
Example: Vehicle Navigation using Neural Networks
The ALVINN system by D. Pomerleau is an example of a 2-layer,
feedforward neural network for "vision-based lane keeping"
that was described earlier in the section on
Neural Networks.
Example: Face Recognition using an Eigenspace Representation
- Problem Statement: Given a training set of M images,
each of size N x N pixels,
where each image contains a single person's face, approximately
registered for face position, orientation, scale, and brightness,
and a test image, determine if the person in the test image is
one of the people in the training set, and, if so, indicate which
person it is.
- Need a similarity metric for measuring the "distance" between
two face images
- Need a way of representing face features to be compared within
the similarity metric
- One approach is due to M. Turk and A. Pentland (see
"Eigenfaces for Recognition," J. Cognitive Neuroscience 3,
1991, pp. 71-86). An online description of their work
and demos is available on the
Eigenfaces/Photobook web page.
For other information and research on the problem of face
recognition, see the
Face Recognition Home Page.
For information on face detection, see the
Face Detection Home Page.
Eigenspace Representation of Images
- An N x N image can be "represented" as a point in
an N^2-dimensional
image space, where each dimension is associated with one of
the pixels in the image and the possible values in each dimension
are the possible gray levels of each pixel. For example,
if each pixel of a 512 x 512 image is an integer in the range 0, ..., 255
(i.e., a pixel is stored in one byte), then image space is a
262,144-dimensional space and each dimension has 256 possible values.
- If we represented our M training images as M points in image
space, then one way of recognizing the person in a new test image
would be to find its nearest neighbor training image in image space.
But this approach would be very slow since the size of image space
is so large, and would not exploit the fact that since all of our
images are of faces, they will likely be clustered relatively near
one another in image space. So, instead, let's represent each image
in a lower-dimensional feature space, called face space or
eigenspace.
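The brute-force nearest-neighbor search in image space described above can be sketched as follows. This is only a minimal illustration in plain Python, assuming an image is given as a list of rows of gray levels; the helper names (flatten, euclidean_distance, nearest_neighbor) and the 2 x 2 toy images are made up for this example.

```python
import math

def flatten(image_rows):
    """Turn an N x N image (a list of N rows) into a vector of length N^2, row by row."""
    return [pixel for row in image_rows for pixel in row]

def euclidean_distance(u, v):
    """Straight-line distance between two points in image space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbor(test_vec, training_vecs):
    """Return (index, distance) of the training vector closest to test_vec."""
    distances = [euclidean_distance(test_vec, t) for t in training_vecs]
    best = min(range(len(distances)), key=lambda i: distances[i])
    return best, distances[best]

# Hypothetical 2 x 2 training images (N = 2, so image space has N^2 = 4 dimensions).
training = [flatten([[0, 10], [0, 10]]), flatten([[10, 10], [0, 0]])]
test = flatten([[1, 9], [0, 8]])

index, distance = nearest_neighbor(test, training)
print(index, round(distance, 3))
```

Each comparison touches all N^2 pixels of every training image, which is why this search becomes slow for realistic image sizes.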
- Say we have M' images, E1, E2, ..., EM', called
eigenfaces or eigenvectors. These images define a
basis set, so that each face image will be defined in terms
of how similar it is to each of these basis images. That is, we
can represent an arbitrary image I as a weighted (linear)
combination of these eigenvectors as follows:
- Compute the average image, A, from all of the training images
I1, I2, ..., IM:
    A = \frac{1}{M} \sum_{i=1}^{M} I_i
- For k = 1, ..., M' compute a real-valued weight, w_k,
indicating the similarity between the input image, I,
and the kth eigenvector, E_k:
    w_k = E_k^T * (I - A)
where I is a given image, represented as a column vector
of length N^2, E_k is the kth eigenface image and
is a column vector of length N^2, A
is a column vector of length N^2, * is
the dot product operation, and - is pixel by pixel subtraction.
Thus w_k is a real-valued scalar.
- W = [w_1, w_2, ..., w_M']^T
is a column vector of weights
that indicates the contribution of each eigenface image in representing
image I. So, instead of representing image I in image
space, we'll represent it as a point W
in the M'-dimensional weight space that we'll call face space
or eigenspace. Hence, each image is projected from
a point in the high dimensional image space down to a point in the
much lower dimensional eigenspace. In terms of compression, each image
is represented by M' real numbers, which means that for a typical value
of M'=10 and 32 bits per weight, we need only 320 bits/image to encode it
in face space. (Of course, we must also store the M' eigenface images,
which are each N^2 pixels, but this cost is amortized over all of the
training images, so it can be considered to be a small additional cost.)
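The projection into face space described above can be sketched as follows. This is a minimal illustration in plain Python, not the original implementation; the function names (mean_image, project), the 4-pixel toy images, and the two basis vectors are all made up for the example, and a real system would use a numerical library.

```python
def mean_image(images):
    """Average image A: the pixel-by-pixel mean of the training vectors."""
    m = len(images)
    return [sum(pixels) / m for pixels in zip(*images)]

def project(image, average, eigenfaces):
    """Return W = [w_1, ..., w_M'], where w_k = E_k^T (I - A)."""
    centered = [p - a for p, a in zip(image, average)]
    return [sum(e * c for e, c in zip(eigenface, centered))
            for eigenface in eigenfaces]

# Hypothetical 4-pixel training images (M = 2) and M' = 2 made-up basis vectors.
images = [[2, 4, 6, 8], [4, 2, 8, 6]]
eigenfaces = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]

A = mean_image(images)
W = project(images[0], A, eigenfaces)
bits_per_image = 32 * len(W)  # storage cost at 32 bits per weight
print(A, W, bits_per_image)
```

With M' = 10 the same storage calculation gives the 320 bits/image quoted above.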
- Notice that image I can be approximately reconstructed from W
as follows:
    I \approx A + \sum_{i=1}^{M'} w_i E_i
This reconstruction will be exact for the training images if
M' = min(M, N^2). In practice M' is chosen to be much smaller than that.
Hence, representing an image in eigenspace won't
be exact in that the image won't be perfectly reconstructible, but it will be
a pretty good approximation that's sufficient for differentiating
between faces.
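The reconstruction formula can be tried out on a toy case. The sketch below is an assumption-laden illustration: the 4-pixel image, the average image, and the two orthonormal basis vectors are made up, and the names project and reconstruct are hypothetical helpers. It compares the reconstruction from one weight with the reconstruction from two.

```python
def project(image, average, eigenfaces):
    """Weights w_k = E_k^T (I - A), one per eigenface."""
    centered = [p - a for p, a in zip(image, average)]
    return [sum(e * c for e, c in zip(eigenface, centered))
            for eigenface in eigenfaces]

def reconstruct(weights, average, eigenfaces):
    """Approximate the image as A + sum_i w_i E_i."""
    image = list(average)
    for w, eigenface in zip(weights, eigenfaces):
        image = [p + w * e for p, e in zip(image, eigenface)]
    return image

# Made-up data: the basis vectors are unit length and mutually orthogonal.
average = [3.0, 3.0, 7.0, 7.0]
eigenfaces = [[0.5, -0.5, 0.5, -0.5], [0.5, 0.5, -0.5, -0.5]]
image = [2, 5, 5, 8]

# M' = 1: only the first basis vector is used, so the result is approximate.
coarse = reconstruct(project(image, average, eigenfaces[:1]), average, eigenfaces[:1])
# M' = 2: this toy image lies in the span of both vectors, so it is recovered.
full = reconstruct(project(image, average, eigenfaces), average, eigenfaces)
print(coarse)
print(full)
```

The second reconstruction is exact only because this particular toy image lies in the span of the two basis vectors; an arbitrary image generally would not.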
- Now, select a value for M' and then determine the
M' "best" eigenvector images (i.e., eigenfaces). How?
Answer: Use the statistics technique called
Principal Components Analysis (also called the
Karhunen-Loeve transform in communications theory).
Intuitively, this technique selects the M' images
that maximize the information content in the compressed (i.e.,
eigenspace) representation.
The best M' eigenface images are computed as follows:
- For each training image Ii, normalize it by subtracting
the mean (i.e., the "average image"): Yi = Ii - A
- Compute the N^2 x N^2 covariance matrix:
    C = \frac{1}{M} \sum_{i=1}^{M} Y_i Y_i^T
- Find the eigenvectors of C that are associated with
the M' largest eigenvalues. Call the eigenvectors
E1, E2, ..., EM'.
These are the eigenface images
used by the algorithm given above.
Note: C is very large, so this method is computationally
very intensive. However, there are relatively fast methods for
finding the eigenvectors associated with the k largest eigenvalues,
which is all we need.
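The notes do not say which fast method is meant. One standard choice, used here purely as an assumption, is power iteration with deflation, which needs only products of C with a vector and so never builds the N^2 x N^2 matrix. The sketch below is plain Python on made-up 3-pixel images; the function names and the fixed start vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def covariance_times(centered, v):
    """Return C v, with C = (1/M) sum_i Y_i Y_i^T, without ever building C."""
    m = len(centered)
    result = [0.0] * len(v)
    for y in centered:
        scale = dot(y, v) / m
        result = [r + scale * p for r, p in zip(result, y)]
    return result

def top_eigenfaces(images, count, iterations=200):
    """Return (A, eigenvectors, eigenvalues) for the `count` largest eigenvalues."""
    m = len(images)
    average = [sum(pixels) / m for pixels in zip(*images)]
    centered = [[p - a for p, a in zip(image, average)] for image in images]
    eigenvectors, eigenvalues = [], []
    for k in range(count):
        # Arbitrary fixed start vector; it must not be orthogonal to the answer.
        v = [1.0 + 0.01 * (k + 1) * i for i in range(len(average))]
        for _ in range(iterations):
            # Deflation: strip components along eigenvectors already found.
            for e in eigenvectors:
                overlap = dot(v, e)
                v = [a - overlap * b for a, b in zip(v, e)]
            v = covariance_times(centered, v)
            norm = math.sqrt(dot(v, v))
            if norm == 0.0:
                raise ValueError("fewer than `count` nonzero eigenvalues")
            v = [a / norm for a in v]
        eigenvectors.append(v)
        eigenvalues.append(dot(v, covariance_times(centered, v)))
    return average, eigenvectors, eigenvalues

# Made-up training set: M = 4 images of 3 pixels each.
images = [[1, 1, 5], [3, 3, 5], [2, 2, 6], [2, 2, 4]]
A, E, values = top_eigenfaces(images, count=2)
print([round(x, 3) for x in E[0]], round(values[0], 3))
print([round(x, 3) for x in E[1]], round(values[1], 3))
```

The sign of each returned eigenvector is arbitrary, and the fixed iteration count is a simplification; a more careful version would test for convergence.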
Face Recognition Algorithm
The entire face recognition algorithm can now be given:
- Given a training set of face images, compute the eigenvectors
E1, E2, ..., EM' associated with the M' largest eigenvalues.
M' = 10 or 20 is a typical value used. Notice that this
step is done once "offline."
- For each different person in the training set, compute the
point associated with that person in eigenspace. That is, use
the formula given above to compute
W = [w1, ..., wM'].
Note that this step is also done once offline.
- Given a test image, Itest, project it to the M'-dimensional
eigenspace by computing the point Wtest, again using the formula
given above.
- Find the closest training face to the given test face:
    d = \min_k \| W_{test} - W_k \|
where W_k is the point in eigenspace associated with
the kth person
in the training set, and \| X \| denotes the Euclidean norm, defined
as
    \| X \| = (x_1^2 + x_2^2 + ... + x_n^2)^{1/2}
where X is the vector [x_1, x_2, ..., x_n].
- Find the distance of the test image from eigenspace (that is,
compute the projection distance so that we can estimate the likelihood
that the image contains a face):
    dffs = \| Y - Y_f \|
where Y = I_{test} - A, and
    Y_f = \sum_{i=1}^{M'} w_{test,i} E_i
- if dffs < Threshold1
      ; Test image is "close enough" to the eigenspace
      ; associated with all of the training faces to
      ; believe that this test image is likely to be some
      ; face (and not a house or a tree or something
      ; other than a face)
  then if d < Threshold2
       then classify Itest as containing the face of person k,
            where k is the closest face in the eigenspace to
            Wtest, the projection of Itest to eigenspace
       else classify Itest as an unknown person
  else classify Itest as not containing a face
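The online part of the algorithm (projection, the two distances, and the two threshold tests) can be sketched in plain Python as follows. Everything concrete here is an assumption for illustration: the function name classify, the labels person_1 and person_2, the 4-pixel data, and the threshold values are made up.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def classify(test_image, average, eigenfaces, people, threshold1, threshold2):
    """Return a person's label, "unknown person", or "not a face".

    `people` maps each label to that person's point W_k in eigenspace.
    """
    y = [p - a for p, a in zip(test_image, average)]
    w_test = [sum(e * c for e, c in zip(eigenface, y)) for eigenface in eigenfaces]

    # d: distance to the closest training point in eigenspace.
    label, d = min(
        ((name, norm([a - b for a, b in zip(w_test, w_k)]))
         for name, w_k in people.items()),
        key=lambda pair: pair[1],
    )

    # dffs: distance between Y and its projection Y_f onto eigenspace.
    y_f = [sum(w * eigenface[i] for w, eigenface in zip(w_test, eigenfaces))
           for i in range(len(y))]
    dffs = norm([a - b for a, b in zip(y, y_f)])

    if dffs >= threshold1:
        return "not a face"
    if d >= threshold2:
        return "unknown person"
    return label

# Made-up offline results: average image, M' = 2 eigenfaces, one point per person.
average = [3.0, 3.0, 7.0, 7.0]
eigenfaces = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
people = {"person_1": [0.0, -2.0], "person_2": [0.0, 2.0]}

for image in ([2, 4, 6, 8], [8, 8, 12, 12], [8, 8, 2, 2]):
    print(image, "->", classify(image, average, eigenfaces, people, 1.0, 3.0))
```

The three toy inputs are chosen to exercise the three branches: a known person, a point in face space far from every stored person, and an image far from face space.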
Example
Say we have two 3 x 3 training images, so N=3 and M=2, defined as follows:
We represent these two images as column vectors of length 3*3=9,
so we have
I1 = [0 0 0 10 10 10 0 0 0]^T
I2 = [0 10 0 0 10 0 0 10 0]^T
Now assume that we use a subspace of dimension 1, i.e., M'=1,
and the eigenvector computed from the two training images is:
E1 = [5 0 5 10 5 10 5 0 5]^T
(Note: This is not the true eigenvector but is used here to keep
the example simple.)
The average image, A, is computed from I1 and I2
by computing for
each pixel, the average gray level from the two images' corresponding
pixels. Thus, the second pixel in A is (0+10)/2 = 5. Hence,
A = [0 5 0 5 10 5 0 5 0]^T
We can now compute how the first training image, I1,
is projected into the
one-dimensional eigenspace by computing
W1 = [w1,1], where
w1,1 = E1^T * (I1 - A).
So, here we have
I1' = I1 - A = [0 -5 0 5 0 5 0 -5 0]^T
w1,1 = 5*0 + 0*(-5) + 5*0 + 10*5 + 5*0 + 10*5 + 5*0 + 0*(-5) + 5*0 = 100
W1 = [100]
In other words, image I1 projects to the point 100 in
this one-dimensional subspace
defined by basis image E1.
Similarly, for I2 we get
W2 = [w2,1], where
w2,1 = E1^T * (I2 - A)
= [5 0 5 10 5 10 5 0 5] * [0 5 0 -5 0 -5 0 5 0]^T
= -100
So, W2 = [-100].
Now, say we are given the following test image, Itest:
     0    7    3
     0   10   10
     0   10    0
Projecting Itest into face space we get
Wtest = [wtest,1], where
wtest,1 = E1^T * (Itest - A)
= [5 0 5 10 5 10 5 0 5] * [0 2 3 -5 0 5 0 5 0]^T
= 15
So, Wtest = [15], which means that Wtest
is more similar to image I1 than to image I2.
Therefore, we would classify Itest as the same class
as I1.
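The arithmetic in this example can be rechecked with a few lines of plain Python. The vectors are the ones given above, read row by row from the 3 x 3 images; the helper names mean_image and weight are made up for the sketch.

```python
def mean_image(images):
    """Pixel-by-pixel average of the training vectors."""
    m = len(images)
    return [sum(pixels) / m for pixels in zip(*images)]

def weight(eigenface, image, average):
    """Single weight w = E^T (I - A)."""
    return sum(e * (p - a) for e, p, a in zip(eigenface, image, average))

I1 = [0, 0, 0, 10, 10, 10, 0, 0, 0]
I2 = [0, 10, 0, 0, 10, 0, 0, 10, 0]
I_test = [0, 7, 3, 0, 10, 10, 0, 10, 0]
E1 = [5, 0, 5, 10, 5, 10, 5, 0, 5]  # the simplified (not true) eigenvector from the example

A = mean_image([I1, I2])
w1, w2, w_test = (weight(E1, image, A) for image in (I1, I2, I_test))

# In a one-dimensional eigenspace the Euclidean distance is an absolute difference.
distances = {"I1": abs(w_test - w1), "I2": abs(w_test - w2)}
closest = min(distances, key=distances.get)
print(A)
print(w1, w2, w_test)
print(distances, closest)
```

Running it prints the average image, the three projections, and the training image whose projection lies closest to that of the test image.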
Face Recognition Accuracy and Extensions to Eigenspace Approach
- Using a 20-dimensional eigenspace, the method classified
about 95% of the images correctly on a database of about 7,500 images
of about 3,000 people
- If training set contains multiple images of each person, then
for each person compute the average point in eigenspace from the
points computed for each image of that person
- Method requires that all images in the database contain faces
of about the same size, position, and orientation, so they can be
compared using this global distance function in eigenspace
- If there are multiple images of a 3D object (e.g., a person's
head from many different positions and orientations), then the points
in eigenspace corresponding to the different 3D views can be
combined by fitting a hypersurface to all the points, and storing
this hypersurface in eigenspace as the description of that person.
Then, classify a test image as the person corresponding to the
closest hypersurface
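The simplest of these extensions, averaging the eigenspace points of several images of the same person, can be sketched as follows. The function name average_points, the labels, and the 2-dimensional points are made up for illustration; the hypersurface-fitting extension is not attempted here.

```python
def average_points(points_by_person):
    """Map each person to the mean of their points in eigenspace."""
    return {
        person: [sum(coords) / len(points) for coords in zip(*points)]
        for person, points in points_by_person.items()
    }

# Hypothetical eigenspace points (M' = 2) from several images of each person.
points_by_person = {
    "person_1": [[1.0, 4.0], [3.0, 6.0], [2.0, 5.0]],
    "person_2": [[-2.0, 0.0], [-4.0, 2.0]],
}

centers = average_points(points_by_person)
print(centers)
```

Each averaged point then plays the role of that person's stored point when the nearest neighbor of a test image is computed.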
Applications of Eigenfaces
There are a variety of commercial products that are now available
based on the eigenface method. See, for example,
TrueFace PC,
which does computer logins by face recognition, and
Viisage and
Identix for
face recognition products for various biometric applications.
Copyright © 1996-2003 by Charles R. Dyer. All rights reserved.