CS 540 Notes for 05/04/95
Lars Van Dam

************************************
Announcements:
  Exam: Saturday, 5:05 pm, 180 Science Hall (not cumulative)
************************************

I) FACE RECOGNITION

2 Methods:
  1) Pentland et al.'s eigenfaces method
  2) Burt et al.'s pattern trees method

Goal: recognize the person in an image.

Input: an n by n monochrome image (n pixels on each side, n^2 pixels in all).
Each pixel is stored in 1 byte: 0 => black, 255 => white.

Many factors affect an image (scene conditions and noise):
  A) Scene parameters
     - position, orientation, lighting
  B) Image parameters
     - noise (from the digitizing process)
     - focus, shutter speed

***********************************
Pentland et al.'s Eigenface Method
***********************************

Assumptions: we are given P training examples in which the faces are
registered (similar position, tilt, scale, and brightness).

Method: nearest-neighbor classification. That is, classify a test image
containing a single face as the same face as the training image that is most
similar to it: pick the closest of the P images in the training set. This
requires a global similarity (distance) function, so image matching is the
basic operation.

An n by n image is a point in an n^2-dimensional "image space". For example,
for an image containing only 2 pixels, image space is a square:

  [Figure: 2-D image space; axes pixel1 and pixel2, each ranging from 0 to 255.]

Given training images X_1, ..., X_P, compute m "feature images" called
eigenfaces, U_1, ..., U_m. For example, one feature image might stress the
area around the eyes. Each image can then be represented as a point in an
m-dimensional "face space" by characterizing it in terms of how similar it is
to each of the m eigenface feature images.
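For comparison, nearest-neighbor matching can also be done directly in image space. The Python sketch below is an illustration, not part of the lecture: it assumes each image is a flattened list of n*n gray levels (0..255) and uses squared Euclidean distance as the global distance function. The names squared_distance and nearest_neighbor are hypothetical.

```python
# Sketch: nearest-neighbor image matching directly in "image space".
# Assumption: each image is a flattened list of gray levels (0 = black, 255 = white),
# and the global distance function is squared Euclidean distance.

def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length pixel vectors."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_neighbor(test_image, training_images, labels):
    """Return the label of the training image closest to test_image."""
    best = min(range(len(training_images)),
               key=lambda i: squared_distance(test_image, training_images[i]))
    return labels[best]

# Two-pixel images, i.e. points in the 2-D image space pictured above.
training = [[10, 20], [200, 220]]
names = ["Joe", "Brad"]
print(nearest_neighbor([30, 15], training, names))  # Joe
```

Every comparison here touches all n^2 pixels. The eigenface method instead compares images as m-dimensional points in face space.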
Do this as follows. Let

  $A = \frac{1}{P} \sum_{i=1}^{P} X_i$

so A is the average of all P training images (the "average face"). Now
represent an image X as a point W = [w_1, ..., w_m], where

  $w_k = U_k \cdot (X - A)$

Here X - A is the difference between X and the average face, and the dot
product measures the degree of correlation (similarity) between eigenface U_k
and the input image after the average face has been subtracted. That is, w_k
is a number describing how similar the image X is to the kth eigenface
feature image.

In general, if there are enough eigenface images, they can be used to
reconstruct the original image exactly:

  $X = \sum_{i=1}^{m} w_i U_i + A$

Hence the image X is a linear combination of the eigenface images (plus the
average face), with the w_i's as the weights. X is reconstructed exactly if
m = n*n (the number of pixels), because the eigenfaces then form a complete
basis for image space. With a smaller m, X is only approximated, using the
"best" m feature images possible. (In practice, m is usually in the range
10 to 40.)

Each training image can now be represented by a point in the m-dimensional
face space defined by the w_i's. For example, say there are 2 training
images, one of Joe and one of Brad, and m = 2. These 2 images are represented
by two points in this face space. A test image, X_test, is also represented
as a point in this space:

  [Figure: 2-D face space with axes w_1 and w_2, showing the points for
  Brad, Joe, and X_test.]

To decide whose face is in X_test, compute the distance from X_test to Joe
and the distance from X_test to Brad in this 2-dimensional face space. Since
the distance to Joe is the smaller one, Joe is picked as the person in
X_test.
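The face-space operations above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the eigenfaces are taken as already computed and orthonormal (as PCA produces), and the 4-pixel "images", the two hand-picked eigenfaces, and all function names are hypothetical, not part of the method.

```python
# Sketch: average face, projection into face space, reconstruction, and
# nearest-neighbor lookup. Assumption: eigenfaces are given and orthonormal.

def mean_image(images):
    """Average face A = (1/P) * sum of the P training images."""
    p = len(images)
    return [sum(pixel_values) / p for pixel_values in zip(*images)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, eigenfaces, average):
    """Face-space point W = [w_1..w_m], with w_k = U_k . (X - A)."""
    diff = [a - b for a, b in zip(x, average)]
    return [dot(u, diff) for u in eigenfaces]

def reconstruct(weights, eigenfaces, average):
    """X (or its approximation when m < n*n) = A + sum_k w_k * U_k."""
    x = list(average)
    for w, u in zip(weights, eigenfaces):
        for i, ui in enumerate(u):
            x[i] += w * ui
    return x

def closest_person(w_test, face_space_points):
    """Name whose face-space point is nearest (squared Euclidean) to w_test."""
    return min(face_space_points,
               key=lambda name: sum((a - b) ** 2
                                    for a, b in zip(w_test, face_space_points[name])))

# Toy example: 4-pixel images, m = 2 hand-picked orthonormal "eigenfaces".
U = [[0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, 0.5, -0.5]]
joe, brad = [100, 100, 50, 50], [50, 50, 100, 100]
A = mean_image([joe, brad])                 # [75.0, 75.0, 75.0, 75.0]
points = {"Joe": project(joe, U, A),        # [50.0, 0.0]
          "Brad": project(brad, U, A)}      # [-50.0, 0.0]
w_test = project([90, 80, 60, 70], U, A)    # [20.0, 0.0]
print(closest_person(w_test, points))       # Joe
```

With only m = 2 eigenfaces for 4-pixel images, reconstruct(w_test, U, A) gives [85.0, 85.0, 65.0, 65.0] rather than the test image [90, 80, 60, 70]: fewer eigenfaces than pixels yields only an approximation, while Joe's image, which lies in the span of U, is reproduced exactly.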
Pick the "best" m feature images using Principal Component Analysis (also
known as the Karhunen-Loeve transform).

Note: the bigger m is, the more accurate the representation.

Final Algorithm:
----------------
1) Given P training images and a user-chosen value for the parameter m,
   compute m eigenface feature images U_1, ..., U_m.
2) For each person j in the training set, compute that person's coordinates
   in face space, W_j, by projecting the person's training image.
   (Note: if there are multiple images of a single person, compute the
   person's average coordinates in face space, i.e., the centroid of that
   person's points in face space.)
3) Given a test image, X_test, "project" it into face space using the
   formula given earlier: X_test -> W_test.
4) Find the training face closest to the test face:

     $d = \min_{j} \lVert W_{test} - W_j \rVert^2$   (minimum over persons j)

   Note: ||x|| denotes the Euclidean norm, so ||a - b|| is the Euclidean
   distance between a and b.
5) Find the distance from face space:

     $\text{diffs} = \lVert Y - Y_f \rVert^2$

   where $Y = X_{test} - A$ and $Y_f = \sum_{i=1}^{m} w_i U_i$, with the w_i
   being the components of W_test.
6) if diffs < T1      ; face is "close enough" to face space to likely be
                      ; SOME face, and not a tree, a house, or anything
                      ; else other than a face
      then if d < T2
              then classify as person j (the minimizing j from step 4)
              else classify as unknown person (i.e., not in training set)
      else classify as not a face

******************************
Method 2 - Burt's Pattern Tree
******************************

- Hierarchical pattern tree of user-selected features.
- Each feature is defined as a j x k window of pixels in the mth level of a
  "pyramid" of blurred images. This defines a template corresponding to a
  feature of interest at some "scale" appropriate for that feature.
  For example, one template may be defined at a "coarse" resolution
  corresponding to the overall head shape, another at a medium resolution
  corresponding to both eyes together, and a third at a fine resolution
  corresponding to a small freckle on the cheek.
- The pyramid of images is constructed as follows:
  * The bottom level, level 0, is the original image, say of size 2^n x 2^n.
    This is the finest resolution or scale.
  * Level 1 is constructed by (i) making a copy of the image at the level
    below, (ii) blurring that image, and then (iii) subsampling it by taking
    every other pixel in each row, and every other row, making the new image
    size 2^(n-1) x 2^(n-1).
  * This process is repeated up to n times to produce a set of n+1 images of
    sizes 2^n x 2^n, ..., 1 x 1. The top levels are coarser because of the
    repeated blurring, and lower in resolution because of the repeated
    subsampling.
- From the set of templates, construct a template tree:

                template
               /        \
         template      template
          /    \        /    \
        ...    ...    ...    ...

  Note: the tree is organized top-down, with low-resolution templates higher
  up in the tree and high-resolution templates lower in the tree. Arcs
  specify relative spatial position and orientation information: the offset
  position and orientation of the child template relative to the parent
  template's position and orientation.
- Algorithm:
  1. Given an input image, construct its pyramid of successively more blurred
     versions.
  2. Match the root template at all possible positions of the corresponding
     level of the input image pyramid. At every position where the measure of
     match is sufficiently high (greater than a threshold), move to each child
     node in the pattern tree: determine the level associated with each child
     template, and its predicted position and orientation given the offset
     information on the arc from the parent to the child node.
  3. Repeat the match process for each child node.
  4. If the entire pattern tree is verified for one hypothesized match
     position of the root, output the corresponding pixel coordinates in the
     level 0 image: that is the position where the face was found.
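A rough Python sketch of the pyramid construction and the top-down tree match follows. Several details are assumptions, not Burt's exact choices: a [1, 2, 1]/4 binomial blur with clamped borders, mean squared difference as the match measure (a match is "sufficiently high" when the error is at most max_error), and arcs that carry only a position offset, expressed in the child level's coordinates, with no orientation. PatternNode, build_pyramid, verify, and find_pattern are hypothetical names.

```python
from dataclasses import dataclass, field

def blur(img):
    """Separable [1, 2, 1] / 4 binomial blur; border pixels are clamped."""
    h, w = len(img), len(img[0])

    def clamp(i, size):
        return min(max(i, 0), size - 1)

    horiz = [[(img[r][clamp(c - 1, w)] + 2 * img[r][c] + img[r][clamp(c + 1, w)]) / 4
              for c in range(w)] for r in range(h)]
    return [[(horiz[clamp(r - 1, h)][c] + 2 * horiz[r][c] + horiz[clamp(r + 1, h)][c]) / 4
             for c in range(w)] for r in range(h)]

def subsample(img):
    """Keep every other pixel in every other row."""
    return [row[::2] for row in img[::2]]

def build_pyramid(img):
    """Level 0 is the original image; each higher level is blurred, then subsampled."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(subsample(blur(levels[-1])))
    return levels

@dataclass
class PatternNode:
    template: list   # j x k window of pixel values
    level: int       # pyramid level the template is defined at
    # (dx, dy, child): child offset in the child's level coordinates (hypothetical convention)
    children: list = field(default_factory=list)

def mean_sq_diff(img, template, x, y):
    """Mean squared difference between template and the window at (x, y); None if off-image."""
    th, tw = len(template), len(template[0])
    if x < 0 or y < 0 or y + th > len(img) or x + tw > len(img[0]):
        return None
    total = sum((img[y + r][x + c] - template[r][c]) ** 2
                for r in range(th) for c in range(tw))
    return total / (th * tw)

def verify(node, pyramid, x, y, max_error):
    """True if node matches at (x, y) on its level and every child matches where predicted."""
    err = mean_sq_diff(pyramid[node.level], node.template, x, y)
    if err is None or err > max_error:
        return False
    for dx, dy, child in node.children:
        scale = 2 ** (node.level - child.level)  # assumes child is at a finer (lower) level
        if not verify(child, pyramid, x * scale + dx, y * scale + dy, max_error):
            return False
    return True

def find_pattern(root, image, max_error):
    """Level-0 (x, y) coordinates of every root position where the whole tree verifies."""
    pyramid = build_pyramid(image)
    top = pyramid[root.level]
    th, tw = len(root.template), len(root.template[0])
    hits = []
    for y in range(len(top) - th + 1):
        for x in range(len(top[0]) - tw + 1):
            if verify(root, pyramid, x, y, max_error):
                scale = 2 ** root.level
                hits.append((x * scale, y * scale))
    return hits

image = [[0, 0, 0, 0],
         [0, 9, 8, 0],
         [0, 7, 6, 0],
         [0, 0, 0, 0]]
pyramid = build_pyramid(image)                    # sizes 4x4, 2x2, 1x1
eyes = PatternNode([[9, 8], [7, 6]], level=0)     # fine-scale template
head = PatternNode(pyramid[1], level=1, children=[(1, 1, eyes)])
print(find_pattern(head, image, max_error=0.0))   # [(0, 0)]
```

Because each child is checked only at its predicted position, the fine-level templates are matched at a handful of places rather than everywhere, which is what makes the coarse-to-fine search cheap.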