CS 766 | Fall 2005
Temporal locality in streams of images provides extra information about a scene, which can be used to improve the signal-to-noise ratio, replace missing images, and improve resolution. One such approach, called video epitomes, takes advantage of both temporal and spatial similarity and appears in recent work by Cheung et al. [1]. We implement the algorithm presented in the paper and describe the difficulties associated with doing so. The problem of choosing appropriate parameters for the algorithm is also examined. Finally, we discuss the issues involved in the implementation and its subsequent performance. For more information, see the web page.
1. V. Cheung, B. J. Frey, and N. Jojic, Video epitomes, Proc. Conf. Computer Vision and Pattern Recognition, 2005.
Automatic image classification has many practical applications, including photo collection organization and image search. In this project I approach the image classification task through the use of image epitomes. The epitome of an image is a probabilistic model that serves as a compact representation of the shape and texture information present in the original image. I first create a collage of positive and negative example images, then generate the epitome representation of this collage. Patches in the epitome that are much more likely to be found in positive images than in negative ones can then be used to classify new images. For more information, see the web page.
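The decision rule in the last step can be sketched as a log-likelihood-ratio test over patch evidence. This is a minimal sketch, not the project's implementation: the patch identifiers and the probability table mapping each patch to its likelihood under the positive and negative parts of the epitome are hypothetical stand-ins for the quantities the epitome model would provide.

```python
import math

def classify_by_patches(patch_ids, patch_probs, threshold=0.0):
    """Classify an image from its patches by summing log-likelihood ratios.

    patch_probs maps a patch identifier to (p_positive, p_negative), the
    (hypothetical) likelihoods of that patch under the positive and
    negative regions of the epitome; unseen patches count as neutral.
    """
    score = 0.0
    for pid in patch_ids:
        p_pos, p_neg = patch_probs.get(pid, (0.5, 0.5))
        score += math.log(p_pos) - math.log(p_neg)
    return ("positive" if score > threshold else "negative"), score
```

An image whose patches are mostly "positive-looking" accumulates a positive score and is labeled positive; the threshold trades off precision against recall.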
This work proposes a method for detecting possible ligand binding sites on proteins and comparing them to existing binding sites. The interaction between proteins and small molecules is critical to understanding a protein's role in a biological system. High-throughput structural genomics now provides structural data for proteins before other work has determined their function. We use ambient occlusion information computed over the surface of a protein to find candidate binding sites. A search of the rotation space, coupled with the iterative closest point (ICP) algorithm, is used to align potential binding sites with existing ones for comparison. For more information, see the web page.
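The inner alignment loop can be illustrated in two dimensions. This is a toy sketch, not the project's implementation: the real system aligns 3-D surface point sets and wraps ICP in a search over rotation space, whereas this version aligns 2-D point sets using nearest-neighbour matching and a closed-form least-squares rigid fit.

```python
import math

def best_rigid_2d(P, Q):
    """Closed-form least-squares rotation + translation mapping P onto Q."""
    n = len(P)
    pcx = sum(p[0] for p in P) / n; pcy = sum(p[1] for p in P) / n
    qcx = sum(q[0] for q in Q) / n; qcy = sum(q[1] for q in Q) / n
    sdot = scross = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        ax, ay = px - pcx, py - pcy      # centred coordinates
        bx, by = qx - qcx, qy - qcy
        sdot += ax * bx + ay * by
        scross += ax * by - ay * bx
    theta = math.atan2(scross, sdot)
    c, s = math.cos(theta), math.sin(theta)
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, tx, ty

def icp_2d(P, Q, iters=20):
    """Iteratively match each point to its nearest neighbour and re-fit."""
    cur = list(P)
    for _ in range(iters):
        matched = [min(Q, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matched)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for (x, y) in cur]
    return cur
```

ICP only converges to the nearest local minimum, which is exactly why the project couples it with an outer search over rotation space.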
By comparing the amounts of blur in an image with shallow depth of field, one can often distinguish the relative depths of objects in a scene. I will investigate current methods for applying this insight to the recovery of depth information, and discuss how depth-from-defocus can outperform depth-from-stereo, especially in the presence of a wide baseline and occlusion. To that end, I will build a system that identifies the relative blurriness between two images taken with the same extrinsic parameters (but differing intrinsic parameters), correlates it with depth, and uses that correlation to construct a depth map.
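One simple way to compare blur between two registered images, sketched here purely for illustration, is to compare local high-frequency energy: the image in which a region has less Laplacian energy is more defocused there, which, given the camera settings, correlates with depth. This is an assumed measure, not necessarily the one the final system will use.

```python
def laplacian_energy(img, x, y):
    # squared discrete Laplacian: a crude local sharpness measure
    lap = (4 * img[y][x] - img[y][x - 1] - img[y][x + 1]
           - img[y - 1][x] - img[y + 1][x])
    return lap * lap

def relative_sharpness(img_a, img_b, x, y, r=1):
    """Ratio of local Laplacian energy of img_a to img_b around (x, y).

    Values > 1 suggest img_a is sharper (less defocused) at that location.
    """
    ea = sum(laplacian_energy(img_a, i, j)
             for j in range(y - r, y + r + 1)
             for i in range(x - r, x + r + 1))
    eb = sum(laplacian_energy(img_b, i, j)
             for j in range(y - r, y + r + 1)
             for i in range(x - r, x + r + 1))
    return ea / (eb + 1e-12)
```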
Texture synthesis is the process of creating new textures by extracting information from existing textures. The primary objective of texture synthesis is to take an input texture as a sample and, based on this sample image, generate a possibly unlimited amount of data. The data generated should result in new textures perceived as similar to (but not exactly the same as) the original texture. In this project the image quilting algorithm developed by Efros and Freeman (2001) was implemented. Finally, the ideas behind image quilting were applied to face merging. For more information, see the web page.
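The core of image quilting is the minimum-error boundary cut: within the overlap region between a placed block and a candidate block, dynamic programming finds the seam with the least accumulated error. A sketch of that step alone, operating on a precomputed overlap-error surface:

```python
def min_error_boundary_cut(E):
    """Minimum-cost top-to-bottom seam through an error surface E (rows x cols).

    Returns one column index per row; the seam may move at most one
    column per row, as in Efros and Freeman's quilting paper.
    """
    h, w = len(E), len(E[0])
    cost = [row[:] for row in E]
    for y in range(1, h):
        for x in range(w):
            best = cost[y - 1][x]
            if x > 0:
                best = min(best, cost[y - 1][x - 1])
            if x < w - 1:
                best = min(best, cost[y - 1][x + 1])
            cost[y][x] += best
    # backtrack from the cheapest entry in the last row
    x = min(range(w), key=lambda i: cost[h - 1][i])
    path = [x]
    for y in range(h - 1, 0, -1):
        candidates = [i for i in (x - 1, x, x + 1) if 0 <= i < w]
        x = min(candidates, key=lambda i: cost[y - 1][i])
        path.append(x)
    path.reverse()
    return path
```

Pixels left of the seam are kept from the old block and pixels right of it come from the new block, hiding the transition along low-error pixels.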
Clustering, in simple words, is grouping similar data items together. In the text domain, clustering is very popular and fairly successful. In this project we apply clustering methods that are used in the text domain to the image domain. Two major challenges in this approach are image representation and vocabulary definition. We apply the bag-of-words model to images using image segments as words.
We use Latent Dirichlet Allocation (LDA) to model the relationships between the "words" of an image, and between images. This provides us with a highly compressed yet expressive representation of an image, which can be further used for various applications such as image clustering, image retrieval, and image relevance ranking. In this project we have used the relationships obtained from LDA to cluster the images with 78% accuracy.
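The bag-of-words step can be sketched as quantizing each segment's feature vector to its nearest vocabulary word and accumulating a histogram. The two-dimensional descriptors and tiny vocabulary below are hypothetical; in practice each descriptor would encode a segment's color, texture, and shape.

```python
def nearest_word(descriptor, vocabulary):
    # index of the vocabulary word closest in squared Euclidean distance
    return min(range(len(vocabulary)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(descriptor, vocabulary[i])))

def bag_of_words(segment_descriptors, vocabulary):
    """Histogram of vocabulary-word counts for one image's segments."""
    hist = [0] * len(vocabulary)
    for d in segment_descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist
```

These per-image histograms are exactly the word-count vectors that a topic model such as LDA consumes.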
Under poor weather conditions (rain, fog, etc.), both the contrast and the color of an image are degraded. To reverse these effects, spatially adaptive algorithms have previously been developed to improve images captured under such conditions. Because of the scattering of light, the degradation at each pixel grows with the distance from the camera to the corresponding scene point, so de-weathering algorithms require depth information extracted from the image. Previously implemented methods [1] are limited in that they require an interactive step in which the user selects both the horizon color and the vanishing point of the image. Using new techniques in geometric data representation (curvelets), an automatic de-weathering algorithm has been developed. For more information, see the web page.
1. S. Narasimhan and S. Nayar, Interactive deweathering of an image using physical models, Proc. IEEE Workshop on Color and Photometric Methods in Computer Vision, 2003.
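The physical model underlying this line of work expresses an observed intensity as scene radiance attenuated exponentially with depth plus scattered airlight: I = R·e^(−βd) + A·(1 − e^(−βd)). Given depth, the scattering coefficient β, and the airlight A, the model inverts directly. A per-pixel sketch (the parameter names are mine):

```python
import math

def deweather_pixel(observed, depth, beta, airlight):
    """Invert the attenuation-plus-airlight model for one pixel/channel.

    observed = radiance * exp(-beta * depth) + airlight * (1 - exp(-beta * depth))
    """
    t = math.exp(-beta * depth)          # transmission along the line of sight
    return (observed - airlight * (1.0 - t)) / t
```

The division by the transmission t is why accurate depth matters: for distant points t is small, and errors in depth or airlight are strongly amplified.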
Over the years a variety of methods have been introduced to remove noise from digital images, such as Gaussian filtering, anisotropic filtering, and total variation minimization. However, many of these algorithms remove the fine details and structure of the image in addition to the noise because of assumptions made about the frequency content of the image. The non-local means algorithm does not make these assumptions, but instead assumes the image contains an extensive amount of redundancy. These redundancies can then be exploited to remove the noise in the image. This project will implement the non-local means algorithm and compare it to other denoising methods using method noise measurement. For more information, see the web page.
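The principle is easiest to see in one dimension: each sample is replaced by a weighted average of all samples whose surrounding patches look similar, with weights decaying exponentially in patch distance. A 1-D sketch (the full algorithm works on 2-D patches, and practical implementations restrict the search window):

```python
import math

def nl_means_1d(signal, patch_r=1, h=10.0):
    """Non-local means on a 1-D signal; border samples are copied unchanged."""
    n = len(signal)
    out = list(signal)
    for i in range(patch_r, n - patch_r):
        acc, wsum = 0.0, 0.0
        for j in range(patch_r, n - patch_r):
            # squared distance between the patches centred at i and j
            d2 = sum((signal[i + t] - signal[j + t]) ** 2
                     for t in range(-patch_r, patch_r + 1))
            w = math.exp(-d2 / (h * h))
            acc += w * signal[j]
            wsum += w
        out[i] = acc / wsum
    return out
```

Because samples in similar contexts are averaged together regardless of where they sit in the signal, repeated structure is preserved while independent noise cancels.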
Object recognition is one of the most interesting areas in computer vision. There are three main issues in object recognition: representation, learning, and recognition. For this project, a graph-based abstraction of Blum's skeleton is used for qualitative shape representation. The Blum skeleton curve is obtained from the shock graph, following the robust and efficient skeletal graphs of Dimitrov, Phillips, and Siddiqi [1]. We also introduce a method to construct a graph from a given skeleton image. For learning, we apply k-medoid clustering to find possible clusters in the training data for each category [2]. A new object is recognized by finding the cluster closest to it and using that cluster's label as the new object's label. Cross-validation on the training images gives 84.44 percent accuracy at distinguishing three image categories of shapes: cell phones, clothing, and stars.
1. P. Dimitrov, C. Phillips, and K. Siddiqi, Robust and efficient skeletal graphs, Proc. Conf. Computer Vision and Pattern Recognition, 417-423, 2000.
2. R. T. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, Proc. 20th VLDB Conf., 1994.
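The clustering step can be sketched as a Lloyd-style k-medoids loop: assign each item to its nearest medoid, then re-elect each cluster's medoid as the member minimizing total within-cluster distance. This simplified sketch uses a deterministic initialization and scalar items with absolute-difference distance; the shape-clustering project would plug in a distance between skeletal graphs instead.

```python
def k_medoids(items, k, dist, iters=10):
    """Simple alternating k-medoids (deterministic init: first k items)."""
    medoids = list(items[:k])
    for _ in range(iters):
        # assignment step: each item joins its nearest medoid's cluster
        labels = [min(range(k), key=lambda c: dist(x, medoids[c]))
                  for x in items]
        # update step: each cluster's medoid minimises total distance
        for c in range(k):
            members = [x for x, l in zip(items, labels) if l == c]
            if members:
                medoids[c] = min(members,
                                 key=lambda m: sum(dist(m, x) for x in members))
    labels = [min(range(k), key=lambda c: dist(x, medoids[c])) for x in items]
    return medoids, labels
```

Unlike k-means, the cluster representative is always an actual member, so only pairwise distances are needed — which is what makes the method usable with graph distances.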
To achieve robust control of industrial crystallization processes, it is necessary to measure the sizes, shapes, and polymorphic forms (i.e. internal structures) of the developing crystal population. This paper describes an algorithm that uses model-based object recognition to automatically extract crystal size and shape information from noisy, in situ crystallization images. The effectiveness of the algorithm is demonstrated by comparing its results with those obtained by manually sizing the crystals. For more information, see the web page.
I address the problem of recovering a camera's radial distortion coefficients from a single image. The method I implemented is called radial distortion snakes [1]. I encountered several problems along the way and describe how I resolved them. I also added edge detection to the method; experiments show that adding edge information improves performance.
1. S. Thirthala and M. Pollefeys, The radial trifocal tensor: A tool for calibrating the radial distortion of wide-angle cameras, Proc. Conf. Computer Vision and Pattern Recognition, Vol. 1, 321-328, 2005.
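Whatever fitting procedure is used, the underlying camera model is typically the standard polynomial radial distortion model. A sketch of that model, with a fixed-point undistortion; the parameter names are mine, and this is the model being estimated, not the snake algorithm itself:

```python
def distort(x, y, k1, k2=0.0, cx=0.0, cy=0.0):
    # polynomial radial distortion about the distortion centre (cx, cy)
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy
    f = 1.0 + k1 * r2 + k2 * r2 * r2
    return cx + dx * f, cy + dy * f

def undistort(xd, yd, k1, k2=0.0, cx=0.0, cy=0.0, iters=25):
    # fixed-point iteration; converges for moderate distortion
    x, y = xd, yd
    for _ in range(iters):
        dx, dy = x - cx, y - cy
        r2 = dx * dx + dy * dy
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        x = cx + (xd - cx) / f
        y = cy + (yd - cy) / f
    return x, y
```

Estimation methods exploit the fact that straight scene lines must be straight after undistortion: the coefficients are chosen so that detected (curved) line images straighten out.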
Automating plant contour tracking could be very useful to plant genetics labs studying the effect of plants' genes on their response to light. Currently, very few labs automate the detection and tracking of these contours; most process them by hand. We created an algorithm that automatically generates ordered root contours from an image edge map. Unfortunately, this simple algorithm cannot handle the more general problem of all plant contours. We have also designed a theoretical algorithm that uses prior contour probabilities to find Arabidopsis root contours in new data; it draws its inspiration from the general ideas of factored sampling and Condensation. For more information, see the web page.
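The resampling step at the heart of factored sampling and Condensation can be sketched as drawing a new particle set with probability proportional to the measurement weights. The scalar particles here are placeholders; in the contour tracker each particle would be a contour hypothesis and its weight the likelihood of that contour under the edge map.

```python
import random

def resample(particles, weights, rng=None):
    """Draw len(particles) samples with probability proportional to weight."""
    rng = rng or random.Random(0)
    total = sum(weights)
    cumulative = []
    acc = 0.0
    for w in weights:
        acc += w
        cumulative.append(acc)
    out = []
    for _ in particles:
        r = rng.random() * total
        for p, c in zip(particles, cumulative):
            if r <= c:           # first cumulative bin containing r
                out.append(p)
                break
    return out
```

Hypotheses that explain the data well are duplicated and poor ones die out, concentrating the sample set on likely contours from frame to frame.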
Bronchiectasis is an abnormal destruction and dilation of the medium-sized bronchi. The condition is divided into three classes: cylindrical, which involves a slight widening; varicose, which involves both widening and collapse; and cystic, in which the bronchi balloon out. CT scans are horizontal slices through the body and are maps of density changes in the body tissues. The scans are far enough apart vertically that using successive scans to extract 3D data is impractical. Nevertheless, the structures, while variable in size, orientation, and shape, exhibit a high degree of symmetry. I use Otsu's method and morphological operations to extract the lung tissue from the CT scan. I then apply central moment transforms to windows defined about the dense (bright) areas of the lung to find and characterize bronchi exhibiting the features of bronchiectasis.
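The thresholding step works directly from the gray-level histogram: Otsu's method picks the threshold maximizing between-class variance. A sketch over a small 16-bin histogram; the real pipeline would run this on the full CT intensity histogram before any morphological clean-up.

```python
def otsu_threshold(histogram):
    """Return the bin index maximising between-class variance."""
    total = sum(histogram)
    grand_sum = sum(i * h for i, h in enumerate(histogram))
    w0 = 0
    sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(histogram):
        w0 += count                  # pixels at or below t
        if w0 == 0:
            continue
        w1 = total - w0              # pixels above t
        if w1 == 0:
            break
        sum0 += t * count
        mean0 = sum0 / w0
        mean1 = (grand_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

On a bimodal histogram (air-filled lung versus denser tissue) the maximizing threshold lands in the valley between the two modes, giving a parameter-free segmentation.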
Often it is not possible, or practical, to obtain an image from the desired view. This is a common problem in sports, where the "ideal" view may move much too fast for a camera to follow, or where the camera would interfere with the action. In this paper a method based on stereo vision is used to obtain new views of a scene. Stereo vision is used to construct a crude 3D model of the scene, and the input images are then mapped onto this model. New views of the 3D model are then computed and presented. Additionally, the 3D properties of the model are exploited to adjust the transparency of some objects.
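The geometry behind the crude model can be sketched with the standard rectified-stereo relations: disparity gives depth as Z = fB/d, each pixel back-projects to a 3-D point, and that point re-projects into any new camera pose. The pinhole parameters below are illustrative values, not the paper's calibration.

```python
def depth_from_disparity(disparity, focal_px, baseline):
    # rectified stereo: Z = f * B / d
    return focal_px * baseline / disparity

def backproject(u, v, z, focal_px, cx, cy):
    # pixel (u, v) at depth z -> 3-D point in the camera frame
    return ((u - cx) * z / focal_px, (v - cy) * z / focal_px, z)

def project(point, focal_px, cx, cy, tx=0.0, ty=0.0, tz=0.0):
    # pinhole projection into a camera translated by (tx, ty, tz)
    x, y, z = point[0] - tx, point[1] - ty, point[2] - tz
    return (focal_px * x / z + cx, focal_px * y / z + cy)
```

Re-projecting every reconstructed point into a virtual camera pose is exactly the "new view" computation; texture comes from the input images mapped onto the model.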
In many vision algorithms there is some form of detecting key points in one image and finding the corresponding points in another image. Often we are given something like the KLT tracker and told to use it as a "black box" for detecting and tracking features. For this project, I work out how to perform feature detection and feature tracking with my own algorithms. Once the features are detected and tracked from one image to the next, I use those tracked features to mosaic the images. A nice property of the algorithm I developed is that it does not require a large overlap between images: in one example, two images overlapping by about 50% still produced a good mosaic. The algorithm is scale, rotation, and intensity invariant. More information is available here.
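For the detection step, a classic choice closely related to the KLT detector's criterion is the Harris structure-tensor response: corners score positive, edges negative, and flat regions near zero. A small pure-Python sketch on a grayscale array (not this project's own detector):

```python
def harris_response(img, k=0.05, r=1):
    """Harris corner response; zero on the border where it is undefined."""
    h, w = len(img), len(img[0])
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    R = [[0.0] * w for _ in range(h)]
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            sxx = sxy = syy = 0.0
            # structure tensor accumulated over the window around (x, y)
            for j in range(y - r, y + r + 1):
                for i in range(x - r, x + r + 1):
                    sxx += Ix[j][i] * Ix[j][i]
                    sxy += Ix[j][i] * Iy[j][i]
                    syy += Iy[j][i] * Iy[j][i]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            R[y][x] = det - k * trace * trace
    return R
```

Keeping local maxima of this response gives repeatable key points that can then be matched or tracked between frames.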
Human faces are broadly similar, with only minor differences from person to person. Changes in lighting conditions, facial expressions, and pose further complicate face recognition. The Line Edge Map (LEM) method has been reported to achieve higher recognition performance and lower memory requirements than previous approaches. In this project, the LEM algorithm will be implemented and used to extract compact face features for face coding, and the Line Segment Hausdorff Distance will be used to perform face recognition based on the extracted features. The implemented method will be tested under different conditions, for example varying lighting, facial expressions, and poses. For more information, see the web page.
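The matching step can be illustrated with the ordinary point-set Hausdorff distance, computed here over points sampled along each line segment. This is a deliberate simplification: the Line Segment Hausdorff Distance used in the project also incorporates orientation and segment-structure terms.

```python
import math

def sample_segment(seg, n=5):
    (x0, y0), (x1, y1) = seg
    return [(x0 + (x1 - x0) * t / (n - 1), y0 + (y1 - y0) * t / (n - 1))
            for t in range(n)]

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets."""
    def directed(P, Q):
        return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in Q)
                   for p in P)
    return max(directed(A, B), directed(B, A))

def edge_map_distance(segments_a, segments_b):
    # compare two line edge maps via their sampled points
    pa = [p for s in segments_a for p in sample_segment(s)]
    pb = [p for s in segments_b for p in sample_segment(s)]
    return hausdorff(pa, pb)
```

Recognition then amounts to assigning a probe face the identity of the gallery edge map at smallest distance.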
Landform elements play a crucial role in controlling the redistribution of material and energy over terrain surfaces and thus are highly related to geographic processes. Traditionally, landform elements are delineated manually from qualitative descriptions of surface shape and from aerial photographs by means of stereoscopic landscape analysis, which is slow, costly, and error-prone. With the increasing availability of Digital Elevation Model (DEM) data, automated procedures to segment DEMs into meaningful landform elements are often desired. In this project, an automated procedure based on image clustering was implemented to perform the segmentation. This procedure is composed of three steps: (1) selecting and calculating feature vectors, which include the components of elevation (image intensity) and other terrain indices (image texture); (2) performing K-Means clustering for unsupervised image segmentation; and (3) labeling landform segments with corresponding landform element names. Experiments on images of the Pleasant Valley area show that, with proper preparation of feature vectors, image clustering algorithms like K-Means can generate surprisingly good results. For more information, see the web page.
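Step (2) can be sketched as standard Lloyd iterations over the per-cell feature vectors. The two-dimensional toy features and deterministic initialization below are simplifications; a real run would standardize the elevation and texture components and try several initializations.

```python
def k_means(points, k, iters=20):
    """Lloyd's algorithm with deterministic init (first k points)."""
    centers = [list(p) for p in points[:k]]

    def nearest(p):
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(p, centers[c])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        for c, members in enumerate(clusters):
            if members:                      # keep an empty cluster's old center
                centers[c] = [sum(vals) / len(members)
                              for vals in zip(*members)]
    return centers, [nearest(p) for p in points]
```

Each resulting cluster corresponds to one candidate landform element class, which step (3) then names.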