CS 766 | Fall 2004
By viewing the same scene from two different points of view, it is often possible to ascertain auxiliary information about the scene. Information such as depth ordering and the relative placement of objects, as well as motion detection and tracking, can often be found by comparing the two images. In the simplest case, where all camera parameters are known, this is a straightforward problem; but when we are given only two images and no knowledge of the cameras' intrinsic or extrinsic parameters, we cannot directly apply the straightforward approaches. In this paper we discuss an attempt to circumvent the lack of calibration information so that the calibrated algorithms can be applied naively to two images modified from the originals.
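As a rough illustration of the calibration-free setting (not the project's code), the epipolar geometry relating two uncalibrated views can be estimated from point correspondences alone with the normalized 8-point algorithm. The following NumPy sketch, with function names of my own choosing, shows the idea:

```python
import numpy as np

def normalize(pts):
    """Translate/scale points so the centroid is 0 and mean distance sqrt(2)."""
    pts = np.asarray(pts, float)
    c = pts.mean(axis=0)
    d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * c[0]],
                  [0, s, -s * c[1]],
                  [0, 0, 1.0]])
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ ph.T).T, T

def fundamental_8point(x1, x2):
    """Estimate F such that x2_h^T F x1_h = 0 for corresponding points."""
    n1, T1 = normalize(x1)
    n2, T2 = normalize(x2)
    A = np.column_stack([
        n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
        n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
        n1[:, 0], n1[:, 1], np.ones(len(n1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)            # enforce rank 2
    F = U @ np.diag([S[0], S[1], 0]) @ Vt
    F = T2.T @ F @ T1                      # undo the normalization
    return F / np.linalg.norm(F)
```

Given the fundamental matrix, the two images can then be rectified so that calibrated-style stereo algorithms apply, which is the spirit of the project above.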
For more information, see the web page.
Face recognition is a very interesting topic in computer vision. Variations in face orientation and illumination make face recognition difficult in real applications. Volker Blanz and Thomas Vetter proposed a face recognition method based on a 3D morphable model. In this method, a 2D face image is matched to a 3D face model that can be morphed according to a set of parameters, and recognition is then performed on those parameters. For any given face image, we must map the 2D image to the 3D face model before recognition. This project will use the PCA eigenface method to match 2D images to 3D face models. The 3D face model library used in this project, which contains 100 3D face models, is provided by the University of Freiburg.
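As a minimal sketch of the PCA eigenface idea (illustrative only; random vectors stand in for rasterized face images, and all names are mine): compute the mean face and the principal components of the training set, then compare faces by distance in the reduced eigenface space.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: (n_samples, n_pixels). Returns the mean face and top-k components."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data gives the principal components directly.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, components):
    """Coordinates of a face in eigenface space."""
    return components @ (face - mean)

def nearest_face(query, gallery, mean, components):
    """Index of the gallery face closest to the query in eigenface space."""
    q = project(query, mean, components)
    dists = [np.linalg.norm(q - project(g, mean, components)) for g in gallery]
    return int(np.argmin(dists))
```

In the project above, the gallery would hold renderings of the 3D models (or their parameter vectors) rather than random data.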
For more information, see the web page.
The level set method has been widely used in many areas due to its many advantages. It relies on a fundamental shift in how one views moving boundaries: rethinking the natural Lagrangian geometric perspective and exchanging it for an Eulerian, initial-value partial differential equation perspective. The resulting numerical techniques can track two- or three-dimensional complex fronts that develop sharp corners and change topology. In this report, we first review the level set method, illustrating its basic ideas and advantages. We then focus on its applications in image processing and computer vision, including image noise removal, image smoothing, and shape detection. Problem formulations are given, along with numerical implementation issues. Experimental results illustrate the correctness and efficiency of the method.
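A small numerical sketch of the Eulerian viewpoint (grid size, time step, and names are illustrative choices of mine, not from the report): evolve a level set function under mean-curvature flow, phi_t = kappa |grad phi|, with explicit finite differences. The zero level set of phi is the moving front; a circular front should shrink.

```python
import numpy as np

def curvature_flow_step(phi, dt=0.1):
    """One explicit Euler step of phi_t = kappa * |grad phi| (mean-curvature flow)."""
    gy, gx = np.gradient(phi)
    gyy = np.gradient(gy)[0]
    gxx = np.gradient(gx)[1]
    gxy = np.gradient(gx)[0]
    # kappa * |grad phi| in terms of first and second derivatives of phi:
    num = gxx * gy ** 2 - 2 * gx * gy * gxy + gyy * gx ** 2
    den = gx ** 2 + gy ** 2 + 1e-8
    return phi + dt * num / den

# Signed distance function of a circle of radius 25, centred in a 100x100 grid.
n = 100
yy, xx = np.mgrid[0:n, 0:n]
phi = np.sqrt((xx - 50.0) ** 2 + (yy - 50.0) ** 2) - 25.0
area_before = int((phi < 0).sum())
for _ in range(100):
    phi = curvature_flow_step(phi)
area_after = int((phi < 0).sum())
```

Because the front is carried implicitly by phi, corners and topology changes require no special handling, which is the advantage the abstract refers to.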
For more information, see the web page.
Active contours, or snakes, are used extensively for image segmentation in computer vision, particularly to locate object boundaries. A general approach was reviewed in [1]. Numerous publications have proposed improvements to deficiencies in the general method, most trying to overcome two major problems. First, the initial contour must be close to the true boundary, or it will likely converge to the wrong result. Second, active contours have difficulty progressing into concave boundary regions. To minimize these troublesome attributes, additional pressure forces were proposed in [2], and a gradient vector flow external force was defined in [3] to overcome both problems. A third solution, presented in [3], uses first-order statistics for a pressure force; this solution is implemented in this project. With the eventual purpose of object tracking, we will investigate an implementation of active contours that incorporates color information, places loose constraints on initial placement, and converges on the object boundary quickly.
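A minimal sketch of the general snake iteration (only the internal-energy part; parameter values and names are my own): the elasticity and rigidity terms give a banded matrix A, and the contour is updated semi-implicitly as x &lt;- (I + dt A)^-1 (x + dt f_ext). With no external force the membrane term alone contracts a closed contour, which is easy to check.

```python
import numpy as np

def snake_step_matrix(n, alpha=0.5, beta=0.1, dt=1.0):
    """(I + dt*A)^-1 for a closed snake with n points, where A encodes the
    elasticity (alpha) and rigidity (beta) internal-energy terms."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        A[i, (i - 1) % n] = A[i, (i + 1) % n] = -alpha - 4 * beta
        A[i, (i - 2) % n] = A[i, (i + 2) % n] = beta
    return np.linalg.inv(np.eye(n) + dt * A)

def evolve_snake(pts, M, external_force=None, dt=1.0, steps=50):
    """Semi-implicit iteration: x <- M (x + dt * f_ext(x))."""
    for _ in range(steps):
        f = np.zeros_like(pts) if external_force is None else external_force(pts)
        pts = M @ (pts + dt * f)
    return pts

# A circle of radius 20 shrinks toward its centroid under internal energy alone.
theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.column_stack([50 + 20 * np.cos(theta), 50 + 20 * np.sin(theta)])
shrunk = evolve_snake(circle, snake_step_matrix(40))
```

Pressure forces and gradient vector flow, as in [2] and [3], enter through the external-force term.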
For more information, see the web page.
There are numerous applications one can imagine if the vanishing line of a regular or irregular texture can be determined. However, even determining a single vanishing point in a texture is sometimes difficult. In most cases we can manually define vanishing points in a texture, but this is less accurate and requires human intervention, limiting the practical use of the texture's vanishing line in further applications. With an accurate estimate of the vanishing line, we can extend a regular, or even an irregular, texture along it. We can also imagine novel uses of vanishing lines, such as traveling along a vanishing line in a given 2-D picture to create a 3-D video sequence. Given the varied practical applicability of vanishing lines in images, my primary goal in this project is to implement automatic detection of the vanishing line and to use it for texture extension or for creating a video sequence from a single 2-D image.
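The geometric core of vanishing-point estimation is simple to state in homogeneous coordinates: the line through two points is their cross product, and two lines intersect at the cross product of the lines. A tiny sketch (the coordinates are made up for illustration):

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection of two homogeneous lines, as an inhomogeneous point."""
    x = np.cross(l1, l2)
    return x[:2] / x[2]

# Two image lines that both pass through (400, 120) -- e.g. projections of
# two parallel scene lines meeting at their vanishing point.
l1 = line_through((0, 20), (400, 120))
l2 = line_through((0, 220), (400, 120))
vp = intersection(l1, l2)
```

An automatic detector would extract many such line (or texture-gradient) cues and intersect them robustly, e.g. by voting, rather than from two hand-picked lines as here.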
In the past, the challenge of filling in holes after removing a large object from a picture was addressed by two classes of algorithms: 1) texture synthesis and 2) inpainting. Criminisi, Pérez, and Toyama recently proposed a new algorithm that combines both texture synthesis and inpainting. This project will implement this algorithm.
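A drastically simplified sketch of the exemplar-based idea (not the full Criminisi et al. algorithm: it uses a crude known-neighbour count in place of the confidence/isophote priority term, and copies one pixel at a time rather than whole patches):

```python
import numpy as np

def exemplar_fill(img, mask, w=3):
    """Greedily fill masked pixels: pick the fill-front pixel with the most
    known neighbours, then copy the centre of the best-matching fully-known
    source patch (SSD over the known pixels of the target patch)."""
    img = img.astype(float).copy()
    mask = mask.copy()
    h, width = img.shape
    while mask.any():
        known = ~mask
        ys, xs = np.nonzero(mask)
        counts = [known[max(0, y - 1):y + 2, max(0, x - 1):x + 2].sum()
                  for y, x in zip(ys, xs)]
        i = int(np.argmax(counts))
        y, x = int(ys[i]), int(xs[i])
        tgt = img[y - w:y + w + 1, x - w:x + w + 1]
        tmask = known[y - w:y + w + 1, x - w:x + w + 1]
        best, best_d = None, np.inf
        for sy in range(w, h - w):
            for sx in range(w, width - w):
                if not known[sy - w:sy + w + 1, sx - w:sx + w + 1].all():
                    continue
                src = img[sy - w:sy + w + 1, sx - w:sx + w + 1]
                d = (((src - tgt)[tmask]) ** 2).sum()
                if d < best_d:
                    best, best_d = (sy, sx), d
        img[y, x] = img[best]
        mask[y, x] = False
    return img
```

On a periodic texture, each fill-front pixel's known context pins down the pattern phase, so the hole is completed consistently; the real algorithm's priority term is what extends this behaviour to structured scenes with linear features.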
For more information, see the web page.
References:
A. Criminisi, P. Pérez, and K. Toyama, Object Removal by Exemplar-Based Inpainting, Proc. CVPR, 2003.
A. Criminisi, P. Pérez, and K. Toyama, Region Filling and Object Removal by Exemplar-Based Inpainting, Technical Report MSR-TR-2003-83, Microsoft Research, Oct 2003.
When matching two images, one has to deal with offsets in position and orientation. A more challenging problem is a difference in resolution, or scale. In this project, we investigate an approach to matching two images that differ in resolution as well as by a projective transformation. The one-to-one image matching problem can become one-to-many after using a scale-space representation. Feature points are extracted at all scales using an adapted Harris corner detector. Each feature point is characterized by a "local jet" description vector whose elements are differential invariants. The image matching process begins with the calculation of the Mahalanobis distance between feature points. Then, RANSAC is used to estimate the projective transformation between the two images. The scaled high-resolution image with the best matches gives the best estimate of the scale difference between the two original images.
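A basic (non-scale-adapted) Harris response is easy to sketch; the project's adapted version operates across scales, but the per-scale computation looks like this. The window size and constant k here are common illustrative choices, not the project's:

```python
import numpy as np

def harris_response(img, k=0.04, r=2):
    """Harris corner response R = det(M) - k*trace(M)^2, with the structure
    tensor M summed over a (2r+1)x(2r+1) box window."""
    img = img.astype(float)
    gy, gx = np.gradient(img)

    def box_sum(a):
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Ixx, Iyy, Ixy = box_sum(gx * gx), box_sum(gy * gy), box_sum(gx * gy)
    return Ixx * Iyy - Ixy ** 2 - k * (Ixx + Iyy) ** 2

# The response peaks at the corners of a bright square on a dark background.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
```

Edges give one large eigenvalue of M and a negative R; only true corners, where both eigenvalues are large, give strong positive responses.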
For more information, see the web page.
The shape context of a point in an image is a measure of the distribution of positions of the other points relative to it. Shape context has been used to compute transformations for image alignment, to classify handwritten digits, to solve problems of reading noisy characters meant to distinguish humans from computers, and to recognize objects. We apply Belongie and Malik's object recognition method of shape context matching combined with k-medoid clustering to the problem of facial pose classification. We choose our categories of objects to be facial poses, and members of a category to be images of different individuals in a given pose. Cross-validation on synthetic images gives 100 percent accuracy at distinguishing the seven pose categories of frontal and 30, 60, and 90 degrees to the left and right.
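A sketch of the shape context descriptor itself (bin counts and helper names are my own choices): for each point, histogram the relative positions of all other points in log-polar bins. Because only relative positions enter, and radii are normalized by the mean pairwise distance, the descriptor is invariant to translation and scale.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """One log-polar histogram of relative point positions per point."""
    pts = np.asarray(points, float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    mean_d = d[d > 0].mean()
    r = np.log(d / mean_d + 1e-12)     # scale-normalized log radial distance
    theta = np.arctan2(pts[:, None, 1] - pts[None, :, 1],
                       pts[:, None, 0] - pts[None, :, 0])
    r_edges = np.linspace(r[d > 0].min(), r[d > 0].max() + 1e-9, n_r + 1)
    t_edges = np.linspace(-np.pi, np.pi + 1e-9, n_theta + 1)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        mask = np.arange(n) != i       # exclude the point itself
        rb = np.clip(np.digitize(r[i, mask], r_edges) - 1, 0, n_r - 1)
        tb = np.clip(np.digitize(theta[i, mask], t_edges) - 1, 0, n_theta - 1)
        for a, b in zip(rb, tb):
            hists[i, a * n_theta + b] += 1
    return hists
```

Matching two shapes then amounts to assigning points so that a chi-square distance between corresponding histograms is minimized, which is the cost Belongie and Malik feed into bipartite matching.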
Attention analysis provides an effective alternative to semantic image/video understanding in many applications, such as image/video retrieval, video abstraction/summarization, adaptive content delivery, image/video retargeting, and active vision. In this project, we present a multi-cue attention model for video. After video parsing using shot boundary detection, various attention cues, including image salience, motion salience, and face salience, are extracted from the video and integrated into a unified attention map. In particular, we adopt a contrast-based image attention analysis method to obtain visual image salience, a foreground-extraction-based algorithm to extract motion salience, and an AdaBoost-based algorithm to detect faces in video. Based on our video attention model, we propose a new application, a movie retargeting algorithm, to transform wide-screen movies into full-screen versions.
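As a toy version of contrast-based salience (a much simpler stand-in for the method used in the project; window size is an arbitrary choice of mine): score each pixel by its absolute contrast with the local neighbourhood mean.

```python
import numpy as np

def contrast_salience(img, r=2):
    """Per-pixel salience: |pixel - local (2r+1)x(2r+1) neighbourhood mean|."""
    img = img.astype(float)
    padded = np.pad(img, r, mode='edge')
    h, w = img.shape
    local = np.zeros_like(img)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            local += padded[dy:dy + h, dx:dx + w]
    local /= (2 * r + 1) ** 2
    return np.abs(img - local)

# A bright blob on a uniform background draws all the salience.
img = np.zeros((30, 30))
img[10:15, 10:15] = 1.0
sal = contrast_salience(img)
```

In the full model, maps like this one are fused with motion and face cues into the unified attention map that drives retargeting.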
For more information, see the web page.
View morphing from two basis views is implemented and then extended to synthesize new views from multiple views taken by a linearly translating camera. Given two views of an unknown rigid scene, with no camera information known, new views from a virtual camera at viewpoints in between the two basis views are synthesized, provided the monotonicity constraint is satisfied. A three-step algorithm achieves the synthesis: prewarping the two basis views, computing a morph between the prewarped views, and postwarping the in-between view produced by morphing. When three basis views are taken as input, each pair of basis views determines new views along the line segment between them, and these generated views can be combined with the remaining basis view to synthesize new views in the interior of the triangle spanned by the optical centers of the three basis views. More basis views determine a convex hull of view space. New views along the lines between basis views can be produced with general view morphing; those on the triangle's surface can be produced with the extended view morphing based on three basis views; and for interior points, one more morph must be computed, combining a synthesized view on the triangle's surface with another basis view.
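Of the three steps, the middle morph is the simplest to sketch: after prewarping, corresponding features move linearly between the two views, so the in-between position is a convex combination controlled by a parameter s (prewarping and postwarping, which involve the rectifying and target homographies, are omitted here):

```python
import numpy as np

def morph_points(p0, p1, s):
    """In-between positions of corresponding (prewarped) feature points;
    s = 0 gives the first basis view, s = 1 the second."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    return (1.0 - s) * p0 + s * p1
```

The triangle-interior case above nests this step: a view is first synthesized on an edge or face of the viewpoint triangle and then morphed once more with the remaining basis view.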
For more information, see the web page.
In this project, fast algorithms are developed to efficiently extract Local Binary Pattern (LBP) histograms. Some properties of LBP and its extensions are discussed. Several possible distance metrics that can be defined over the space of LBP histograms are studied in the special case of face images. Experimental results show that face clustering, classification, or recognition using LBP histograms as features may be very sensitive to the chosen metric. This preliminary study suggests ways to improve the performance of LBP-based face analysis algorithms.
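A straightforward (unoptimized) version of the basic 8-neighbour LBP makes the feature concrete: threshold each pixel's neighbours against the centre value, pack the bits into a byte, and histogram the codes. Because only the ordering of intensities matters, the histogram is invariant to monotonic illumination shifts.

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour LBP codes over the image interior, returned as a
    normalized 256-bin histogram."""
    img = np.asarray(img).astype(int)
    c = img[1:-1, 1:-1]                       # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << bit   # one comparison bit per neighbour
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()
```

The project's question then becomes which distance over such histograms (chi-square, histogram intersection, Euclidean, ...) behaves best for face images.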
For more information see the web page.
The focus of this project is an implementation of the Efros and Freeman "Image Quilting" algorithm. The algorithm is designed to solve the texture synthesis problem in a unique way: it tiles together patches of an input image to produce the output image. The novelty of the algorithm is in allowing neighboring patches to meet at edges that are not simply straight lines. I will specifically discuss the parameters that control the behavior of the algorithm and the experimental results I obtained when implementing it according to the original paper. After this I discuss some of the common problems that appear when applying the algorithm to structured and semi-structured textures, my thoughts on why they appear, and my proposals for how to compensate for them. I then discuss the experimental results obtained from implementing some extensions, with specific emphasis on how effective the extensions were at achieving their intended goals. Finally, I offer some concluding remarks on the algorithm itself, as well as on texture synthesis in general.
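The non-straight patch boundary is computed with a minimum-error boundary cut: a dynamic-programming seam through the overlap-error surface between two candidate patches. A sketch of that DP step (function name mine):

```python
import numpy as np

def min_cost_vertical_seam(err):
    """Dynamic-programming seam through an overlap-error surface: returns,
    for each row, the column of the minimum-cost top-to-bottom path that
    moves at most one column per row."""
    h, w = err.shape
    cost = err.astype(float).copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            cost[y, x] += cost[y - 1, lo:hi].min()
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):         # backtrack from the bottom row
        x = seam[y + 1]
        lo = max(0, x - 1)
        seam[y] = lo + int(np.argmin(cost[y, lo:min(w, x + 2)]))
    return seam
```

In quilting, `err` is the per-pixel SSD between the new patch and the already-placed texture in their overlap region; pixels on either side of the seam come from either patch, so the boundary follows wherever the two patches already agree.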
For more information, see the web page.
The main purpose of this project is to recognize which emotions human faces are expressing. To achieve this, I will perform a quick survey of existing techniques and implement one of them. I would also like to try to implement a classifier based on images generated by convolving the face image with an edge-detector filter, then splitting the result up to obtain more significant regions, which will have more or fewer edges (i.e., folds in the face) depending on the emotion the face is expressing.
As digital photography becomes more accessible, it follows that the average individual will have an increasing number of images to manage. Automated methods of identifying and classifying faces would be of significant value to individuals searching through such image collections. In my project, I plan to implement one such method of image annotation. This method will employ a Bayesian similarity measure. To account for missing features, a marginal probability will be used so images will be compared to each other without a bias induced by differing feature counts.
For more information see the web page.
The central part of this project is to perform a geometric transform on document images taken with digital cameras. Since (1) the image is taken from a distance, and therefore undergoes a perspective transformation, and (2) the book is not pressed against a glass surface as in a scanner, so the pages can be curved along a cylindrical surface, we need to perform geometric restoration before optical character recognition (OCR) can take place.
The process is self-calibrating; only one image is required. The required information is extracted by analyzing the geometry of the text lines (so it cannot be performed on pages containing only graphics), estimating a model of the page's surface and the effect of the perspective transform, and warping the input image so that the page becomes upright again.
A number of other image restoration techniques are applied during the analysis of text lines, the most important being the thresholding, or binarization, of documents (converting the 256-level image to a bi-level one). Since images captured with a digital camera suffer from uneven illumination, the histogram may not be bimodal, so we will investigate the use of local and hybrid thresholding techniques.
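A minimal sketch of why local thresholding copes with uneven illumination where a global threshold fails (a simple local-mean rule in the spirit of the techniques mentioned, not the project's actual method; window size and offset are illustrative):

```python
import numpy as np

def adaptive_threshold(img, r=7, offset=5):
    """Local-mean thresholding: a pixel is foreground (ink) if it is darker
    than its (2r+1)x(2r+1) neighbourhood mean by more than `offset`."""
    img = img.astype(float)
    padded = np.pad(img, r, mode='edge')
    h, w = img.shape
    local = np.zeros_like(img)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            local += padded[dy:dy + h, dx:dx + w]
    local /= (2 * r + 1) ** 2
    return img < local - offset
```

Because the comparison is against a neighbourhood mean rather than a single global level, a slow illumination gradient across the page cancels out, and only genuinely dark marks are classified as ink.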
We will not implement the OCR step ourselves; instead we will use the Presto OCR software found on the CAE workstations. We will conduct experiments to measure the improvement in OCR recognition rate gained by applying our restoration algorithm.
References:
Z. Zhang and C. L. Tan, Restoration of images scanned from thick bound documents, Proc. 2001 International Conference on Image Processing, Vol. 1, 2001, 1074-1077.
H. Cao, X. Ding, and C. Liu, Rectifying the bound document image captured by the camera: A model based approach, Proc. Seventh International Conference on Document Analysis and Recognition, Vol. 1, 2003, 71-75.
Many objects in the real world have curved contours. In this project, we use epipolar geometry to reconstruct the shape of symmetric curves in 3-D space, and to recover the camera pose as well, from their 2-D images, even without known point correspondences. In theory, planar symmetric curves can be reconstructed from a single view, while general symmetric curves can be reconstructed from two views. In this project, we focus on the recovery of planar curves from a single 2-D image by minimizing the L2 distance between the shapes of the curves reconstructed via the epipolar geometry of symmetric curves. Results are shown and future work is discussed.