CS 766 | Fall 2000
We implement and use an active contour model, or snake, to compute the skeletonization of objects by integrating both region and boundary features. The resulting skeleton is compared to the results of more traditional skeletonization methods such as the medial axis transform. We show that skeletonization by grassfire snakes results in clearer, richer shape representations that are more invariant to certain image transformations.
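As a point of comparison, here is a minimal sketch of the traditional medial-axis baseline the grassfire snakes are measured against (not the snake method itself); it assumes a binary object mask and uses scikit-image's distance-transform-based medial axis.

```python
# Traditional MAT baseline: skeleton as the ridge of the distance transform.
import numpy as np
from skimage.morphology import medial_axis

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 24:40] = True                  # a simple rectangular blob

skeleton, dist = medial_axis(mask, return_distance=True)
# `dist * skeleton` keeps the inscribed-circle radius at each skeleton
# pixel, the usual medial axis transform representation.
print(np.count_nonzero(skeleton), dist.max())
```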
Photorealistic 3D model reconstruction is a major area of research in computer vision. The voxel coloring scheme proposed by Seitz and Dyer provides a means to reconstruct a photorealistic 3D scene. The reconstruction algorithm relies on discretizing space into volumetric elements called voxels. The voxels are projected onto the image planes of the cameras to assign their colors. A voxel is considered part of the scene if its color is consistent across all the camera images under consideration. This voxel coloring algorithm has some intrinsic limitations: it assumes that the scene is Lambertian and that color variation in the scene is significant. In this report a reconstruction technique is considered that does not depend on such underlying assumptions, relying instead on an iterative energy minimization scheme whose energy function depends on color and geometric constraints. As part of the project I also implemented a version of the voxel coloring algorithm.
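For concreteness, a minimal sketch of the photo-consistency test at the core of voxel coloring, under simplifying assumptions: known 3x4 projection matrices `P_list`, images `img_list`, and a hypothetical color-variance threshold `tau` (these names do not come from the report).

```python
import numpy as np

def consistent(voxel_xyz, P_list, img_list, tau=15.0):
    """Project one voxel into every image; call it consistent if the
    per-channel standard deviation of the sampled colors is small."""
    X = np.append(voxel_xyz, 1.0)            # homogeneous 3D point
    colors = []
    for P, img in zip(P_list, img_list):
        u, v, w = P @ X
        x, y = int(round(u / w)), int(round(v / w))
        if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
            colors.append(img[y, x].astype(float))
    if len(colors) < 2:                      # voxel seen in too few views
        return False
    return np.std(np.stack(colors), axis=0).max() < tau
```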
To perform visual servoing of a robot manipulator successfully, the control algorithm must account for the robot's physical limitations. These constraints define the robot workspace boundary, which must be avoided if stability of motion is to be ensured. The main focus of this research is to extend a previously developed method by which a desired viable object path can be interpolated between two arbitrary poses of a planar object based on extracted image features. This path is defined as a continuous change in object pose between the initial and final desired poses that avoids the workspace boundary. The extended method developed here combines a 3D object model with the parameterization of the 2D displacement transformation, or homography, of each 3D feature point projected onto the image plane. By decomposing these "point" homographies, a set of object path poses can be interpolated in the 3D workspace or projected back into image space. The interpolated poses describe the 3D kinematic displacement screw of the object between the initial and final desired poses. This generated path can be used to visually servo a 3D object from one pose to another while avoiding the workspace boundary and keeping the object in the field of view.
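As a rough illustration of pose interpolation (a simplification, not the point-homography decomposition itself), the following sketch interpolates between two poses with a slerped rotation and a linearly interpolated translation; all poses and values here are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Hypothetical initial and final poses (rotation + translation).
key_rots = Rotation.from_euler("z", [0, 90], degrees=True)
t0, t1 = np.array([0.0, 0.0, 1.0]), np.array([0.2, 0.1, 1.5])

slerp = Slerp([0.0, 1.0], key_rots)
for s in np.linspace(0.0, 1.0, 5):
    R_s = slerp(s).as_matrix()       # rotation interpolated on SO(3)
    t_s = (1 - s) * t0 + s * t1      # translation interpolated linearly
    # (R_s, t_s) is one intermediate pose along the generated path
```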
Computer vision and its related disciplines (image processing, pattern recognition) mainly deal with 2D images. Medical images, however, are 3D by nature. To apply well-developed computer vision techniques to medical images, the generalization of the 2D algorithms to 3D is necessary. In this project, we generalize the 2D Kanade-Lucas-Tomasi (KLT) tracker to 3D and implement a 3D KLT tracker called KLT3D using C and Matlab.
The KLT tracker has proved very reliable for detecting and tracking man-made objects. In this project, we apply both the 2D tracker and KLT3D to CT images and demonstrate that such a corner-feature tracker is also quite reliable on images of the human body. Both simulated and experimental testing of the algorithm and the code were performed, and several applications in medical imaging are presented.
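For reference, a minimal sketch of the KLT-style cornerness measure (the smaller eigenvalue of the windowed gradient structure tensor) in 2D; in KLT3D the same idea extends to a 3x3 tensor built from the three volume gradients. The window scale here is a hypothetical parameter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def klt_cornerness(img, sigma=2.0):
    ix, iy = sobel(img.astype(float), 1), sobel(img.astype(float), 0)
    # Windowed second-moment (structure tensor) entries.
    jxx = gaussian_filter(ix * ix, sigma)
    jxy = gaussian_filter(ix * iy, sigma)
    jyy = gaussian_filter(iy * iy, sigma)
    # Smaller eigenvalue of [[jxx, jxy], [jxy, jyy]] at each pixel;
    # good features to track are local maxima of this map.
    tr, det = jxx + jyy, jxx * jyy - jxy * jxy
    return tr / 2 - np.sqrt(np.maximum(tr ** 2 / 4 - det, 0.0))
```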
This project implements the methods described by Shmuel Peleg and Moshe Ben-Ezra in their paper "Stereo Panorama with a Single Camera" [1]. In their paper, Peleg and Ben-Ezra describe a novel way of splitting images from a single camera into a left-eye view and a right-eye view.
For a previous assignment, we learned how to combine many images into a single large mosaic, by computing a homography from each image to the reference frame of the previous one. Peleg and Ben-Ezra expand on this concept in two main ways:
1) By using only a small vertical slice from each image and keeping each slice in its own reference frame, they are able to increase the field of view to a full 360-degree panorama. Our mosaicing assignment, by contrast, warped all of the images onto a single frame of reference, so the final field of view could not be larger than the image plane of this frame of reference (180 degrees).
2) By moving the camera in a circle, rather than rotating it about its optical center, they are able to get different viewpoints for different choices of slices. That is, a panorama made by taking slices from the left edge of each image gives a right-eye view, while a panorama made by taking slices from the right edge of each image gives a left-eye view. One view can then be shown in red and the other overlaid in green, so that a 3D image is seen through red-green 3D glasses (a sketch of this slicing scheme follows).
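A minimal sketch of the slicing scheme, assuming `frames` is a list of equally spaced RGB frames (H x W x 3 arrays) captured as the camera moves along its circular path; the slice width and column offset are hypothetical parameters.

```python
import numpy as np

def slice_panorama(frames, column, width=4):
    """Paste the same vertical strip from every frame side by side."""
    return np.hstack([f[:, column:column + width] for f in frames])

def anaglyph(frames, offset=100, width=4):
    w = frames[0].shape[1]                   # assumes frames wide enough
    right_eye = slice_panorama(frames, w // 2 - offset, width)  # left strips
    left_eye = slice_panorama(frames, w // 2 + offset, width)   # right strips
    out = np.zeros_like(left_eye)
    out[..., 0] = left_eye[..., 0]           # red channel from one view
    out[..., 1] = right_eye[..., 1]          # green channel from the other
    return out
```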
The snake algorithm using dynamic programming is implemented in this project to extract and track area-based features in intensity images. Dynamic programming casts the problem as a discrete multistage decision process in which all possible positions within the search neighborhood of each snake point are considered at every iteration, so a global optimum of the snake energy is achieved at each stage. After the contours have been detected in one image, they can be tracked in consecutive frames by using the snake points on the contours in the current frame as the initial snake points for the next frame and applying the same dynamic programming process again. Some synthetic and real images are used to test the algorithm. The advantages and disadvantages of the algorithm, and possible improvements, are also discussed.
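A minimal sketch of one dynamic-programming iteration for an open snake, under simplifying assumptions: each point may move to any of its 9 neighboring positions, `E_ext` is a precomputed external-energy image (e.g., negative gradient magnitude), the internal energy is a single first-order smoothness term, and all candidate positions stay inside the image.

```python
import numpy as np

MOVES = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def dp_snake_step(points, E_ext, alpha=0.5):
    """points: (n, 2) integer array; one Viterbi-style DP stage."""
    n, m = len(points), len(MOVES)
    cost = np.zeros((n, m))            # best cost with point i at move j
    back = np.zeros((n, m), dtype=int)
    cand = lambda i, j: points[i] + MOVES[j]
    cost[0] = [E_ext[tuple(cand(0, j))] for j in range(m)]
    for i in range(1, n):
        prev_pos = np.array([cand(i - 1, k) for k in range(m)])
        for j in range(m):
            trans = cost[i - 1] + alpha * np.sum((prev_pos - cand(i, j)) ** 2, axis=1)
            back[i, j] = np.argmin(trans)
            cost[i, j] = trans[back[i, j]] + E_ext[tuple(cand(i, j))]
    j = int(np.argmin(cost[-1]))       # globally optimal final move
    new_pts = []
    for i in range(n - 1, -1, -1):     # backtrack the optimal moves
        new_pts.append(cand(i, j))
        j = back[i, j]
    return np.array(new_pts[::-1])
```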
Our project implements computer vision techniques to retrieve 3D coordinates from a single real-time video input device. Three main steps are involved: first, finding a spherical object in a live video image; second, accommodating occlusions of the spherical object; and finally, reconstructing 3D coordinates from the data retrieved in the first two steps. Finding the sphere is handled via color tracking; RGB (red, green, blue) and HSV (hue, saturation, value) color spaces were used alongside both seed-fill and scanline tracking techniques. The second step was handled with a modification of the Hough line-fitting algorithm applied to circle detection (parameters: centerX, centerY, radius). The 3D coordinates were recovered using an orthogonal projection camera model.
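A minimal sketch of the circle-parameter Hough vote used in the second step, assuming `edges` is a binary edge map and the radius search range is known in advance.

```python
import numpy as np

def hough_circles(edges, r_min=8, r_max=20):
    H, W = edges.shape
    acc = np.zeros((H, W, r_max - r_min), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0, 2 * np.pi, 60, endpoint=False)
    for y, x in zip(ys, xs):
        for ri, r in enumerate(range(r_min, r_max)):
            cy = np.round(y - r * np.sin(thetas)).astype(int)
            cx = np.round(x - r * np.cos(thetas)).astype(int)
            ok = (cy >= 0) & (cy < H) & (cx >= 0) & (cx < W)
            acc[cy[ok], cx[ok], ri] += 1     # vote for (centerY, centerX, r)
    cy, cx, ri = np.unravel_index(acc.argmax(), acc.shape)
    return cy, cx, r_min + ri                # strongest circle hypothesis
```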
The importance of maximum intensity projections (MIP) in medical imaging has been solidly established in the area of angiography. Today, nearly all commercial medical imaging systems can generate a MIP from a volumetric data set. Because depth cannot be determined from a single MIP image, real-time rendering of MIP is especially important for depth cueing. Moreover, the need to distinguish minute features requires high-quality MIP rendering. To these ends, several MIP volume rendering methods are explored and further improvements are suggested.
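A minimal sketch of the basic, unaccelerated MIP that the rendering methods above improve upon: each output pixel is simply the maximum voxel intensity along its viewing ray (here, an axis-aligned one).

```python
import numpy as np

def mip(volume, axis=2):
    """Maximum intensity projection of a 3D array along `axis`."""
    return volume.max(axis=axis)

vol = np.random.rand(64, 64, 64)   # stand-in for a volumetric data set
image = mip(vol)                   # one 2D MIP view
```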
The SIRDS method of displaying 3D images on 2D displays is implemented, with a simple GUI for changing the colors of the images. I optimize the depth resolution, creating better images. Finally, the system is used to make two simple animations, showing that it is possible to keep your eyes locked onto them. For examples and more information, see http://www.cs.wisc.edu/~riverson/cs766/
In this project, I have implemented a program to generate Single Image Random Dot Stereograms (SIRDS), pictures of seemingly random dots that, when viewed in the correct way, produce the impression of 3D objects. The algorithm used in this project is mainly based on the paper "Displaying 3D Images: Algorithms for Single Image Random Dot Stereograms" by H. W. Thimbleby, S. Inglis, and I. H. Witten, which presents a new, simple, and symmetric algorithm for generating single image stereograms from any solid model. The algorithm corrects a slight distortion in the rendering of depth, removes hidden parts of surfaces, and eliminates a type of artifact called an echo. I have also added a coloring option, so that the user can view the stereogram in any color he or she wants. Generated color stereograms can be viewed on the project web page. See also: http://www.cs.wisc.edu/~kosart/cs766/
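A simplified sketch of the constraint-linking idea behind the Thimbleby/Inglis/Witten algorithm, assuming `depth` is a 2D array in [0, 1]; hidden-surface removal and the paper's geometric refinements are omitted, and the eye-separation value is hypothetical.

```python
import numpy as np

E = 80   # eye separation in pixels (hypothetical)

def sirds_row(depth_row, rng):
    w = len(depth_row)
    same = np.arange(w)                          # each pixel initially free
    for x in range(w):
        sep = int(E * (1 - 0.5 * depth_row[x]))  # separation from depth
        left = x - sep // 2
        right = left + sep
        if left >= 0 and right < w:
            same[right] = left                   # constrain the pair to match
    row = np.zeros(w, dtype=np.uint8)
    for x in range(w):                           # resolve links left to right
        row[x] = row[same[x]] if same[x] != x else rng.integers(2)
    return row

rng = np.random.default_rng(0)
depth = np.zeros((128, 256))
depth[32:96, 96:160] = 1.0                       # a raised square
img = np.stack([sirds_row(r, rng) for r in depth]) * 255
```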
Hand gesture recognition can be used in many situations. One possible application is controlling household appliances without touching any device. For this project, I will implement the hand gesture recognition method proposed in the paper "Orientation Histograms for Hand Gesture Recognition" by W. Freeman and M. Roth. An orientation histogram will be used as a feature vector to represent hand gestures. My implementation will be tested on a set of test images after training.
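A minimal sketch of the orientation-histogram feature vector, assuming a grayscale hand image and hypothetical bin-count and magnitude-threshold parameters; Freeman and Roth additionally smooth the histogram before comparing gestures.

```python
import numpy as np
from scipy.ndimage import sobel

def orientation_histogram(img, bins=36, mag_thresh=10.0):
    gx, gy = sobel(img.astype(float), 1), sobel(img.astype(float), 0)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)[mag > mag_thresh]   # keep strong edges only
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)             # normalized feature vector
```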
This paper presents an approach to extracting straight lines from an intensity image. We implemented the straight line detection algorithm described in the BHR paper. For comparison purposes, we also implemented variants of Hough transform straight line detection. We show through experiments that the BHR algorithm works better than the Hough transform in straight line detection.
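For reference, a minimal sketch of the standard (rho, theta) Hough transform used as the comparison baseline, assuming a binary edge map as input.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    H, W = edges.shape
    diag = int(np.ceil(np.hypot(H, W)))
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for t, theta in enumerate(thetas):
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        np.add.at(acc, (rho + diag, t), 1)   # vote in (rho, theta) space
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    return r - diag, thetas[t]               # strongest line hypothesis
```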
Our project implements a methodology for tracking image curves using parametric B-spline snakes. We start by reading in an image in one of several formats. Upon reading in the image, we extract edge information using the Canny edge detection algorithm. Once this edge image has been extracted, we fit a B-spline snake to the contour by minimizing the energy functional. The necessary energy terms differ between the B-spline snake implementation and the piecewise linear snake implementation because the B-spline snake does not have to account for internal energy; internal energy is handled implicitly by the B-spline. Our project creates an interface using the FLTK windowing toolkit, which gives the user great flexibility in viewing the edges of the image as well as each step of the minimization process. Finally, our program allows the user to use either a parametric B-spline snake or a piecewise linear snake to compare the efficiency and accuracy of the two. For more information, see http://www.cs.wisc.edu/~tamaram/Snakes.html
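A minimal sketch of the curve representation the snake deforms: evaluating a closed uniform cubic B-spline from its control points (the energy minimization itself is omitted).

```python
import numpy as np

def bspline_contour(ctrl, samples_per_span=10):
    """Closed uniform cubic B-spline; ctrl is an (N, 2) array."""
    M = np.array([[-1,  3, -3, 1],
                  [ 3, -6,  3, 0],
                  [-3,  0,  3, 0],
                  [ 1,  4,  1, 0]]) / 6.0
    n = len(ctrl)
    pts = []
    for i in range(n):                        # one span per control point
        P = ctrl[[(i - 1) % n, i, (i + 1) % n, (i + 2) % n]]
        for t in np.linspace(0, 1, samples_per_span, endpoint=False):
            T = np.array([t ** 3, t ** 2, t, 1])
            pts.append(T @ M @ P)
    return np.array(pts)
```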
This paper studies the standard calibration of an active binocular vision system. The standard method consists of taking images of a calibration grid, matching the points in the calibration grid with the corners found in the image, and saving the data to a file. This data is used by Tsai's algorithm to generate the camera parameters and the world-to-camera transformation matrix for each camera. The final step is the use of Li's algorithm to find the gaze-to-camera transformation for each camera. This paper discusses the implementation of Li's algorithm and methods to improve both algorithms' ability to handle and/or reject errors.
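As background, a hedged sketch of the linear (DLT) step that recovers a 3x4 projection matrix from grid-to-image correspondences; Tsai's algorithm goes further, separating intrinsics and extrinsics and modeling radial distortion, so this is only the simplest baseline.

```python
import numpy as np

def dlt_projection(world_pts, image_pts):
    """world_pts: N x 3, image_pts: N x 2, N >= 6 non-coplanar points."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 4)    # smallest right singular vector as P
```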
Mosaicing a picture from its adjacent parts is encountered in different fields of research and technology, e.g., photogrammetry, remote sensing, and microscopy. Approaches to image registration fall into three major categories: algorithms that use image pixel values directly, algorithms that use features, and algorithms that use the frequency domain. In this paper, I present a way to automatically register images with large displacement and inconsistent illumination. My goal is to mosaic images taken by a confocal fluorescent microscope (Zeiss Axiovert 135-TV). The image registration method presented here uses the Fourier frequency domain approach to match images that are translated and rotated with respect to one another. The image frames are then warped using backward mapping and composed using bilinear interpolation.
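A minimal sketch of the frequency-domain translation estimate (phase correlation); in the full method the rotation is recovered similarly, by correlating resampled magnitude spectra, which is omitted here.

```python
import numpy as np

def phase_correlation(a, b):
    """Return the (dy, dx) integer shift that best aligns image b to a."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    if dy > a.shape[0] // 2: dy -= a.shape[0]   # wrap to signed shifts
    if dx > a.shape[1] // 2: dx -= a.shape[1]
    return dy, dx
```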
I present an implementation of a tool for image segmentation called "intelligent scissors," as presented by Eric N. Mortensen and William A. Barrett in "Intelligent Scissors for Image Composition". The purpose of this tool is to extract objects from digital images with a minimum of user interaction by creating a live-wire boundary which "snaps" to, and wraps around, the object of interest, based on the image's edge properties. In addition to the live wire, I have added a small extra feature that uses the existing image information to suggest a completion of a partially defined image segmentation. More information and a Java applet demo are available at http://www.cs.wisc.edu/~noto/scissors/IntelligentScissors.html
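A minimal sketch of the live-wire idea: Dijkstra's algorithm over the pixel graph with a local cost that is low on strong edges, so the minimum-cost path from the seed "snaps" to object boundaries. The paper's actual cost combines more cues (Laplacian zero-crossings, gradient direction); the gradient-only cost here is a simplification.

```python
import heapq
import numpy as np
from scipy.ndimage import sobel

def live_wire(img, seed):
    g = np.hypot(sobel(img.astype(float), 0), sobel(img.astype(float), 1))
    cost = 1.0 - g / max(g.max(), 1e-9)      # cheap to walk along edges
    H, W = img.shape
    dist = np.full((H, W), np.inf)
    dist[seed] = 0.0
    parent, heap = {}, [(0.0, seed)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[y, x]:
            continue                          # stale queue entry
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    nd = d + cost[ny, nx] * np.hypot(dy, dx)
                    if nd < dist[ny, nx]:
                        dist[ny, nx] = nd
                        parent[(ny, nx)] = (y, x)
                        heapq.heappush(heap, (nd, (ny, nx)))
    return dist, parent   # follow `parent` from any cursor pixel to the seed
```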
Beier and Neely describe a technique for feature-based image metamorphosis in their SIGGRAPH '92 paper; it creates a synthesized "morph" from one image to another in a more realistic fashion than the previously used cross-fade technique. Our project includes an implementation of this technique in Java. The Beier-Neely algorithm accomplishes the morph with the use of user-defined "feature lines". The transformation of the feature-line skeleton between the two input images is used to determine (a) the contributing pixels from the input images for each pixel in the synthesized intermediate images and (b) the effective weights these pixel values have on the result images. We developed an alternative, simplified equation to calculate the weight each line contributes to a pixel in a synthesized intermediate image; this formula combines a feature line's length and a pixel's position relative to the line into a single value. Finally, we show how different feature-line choices affect the resulting synthesized morph, and therefore the constraints this method imposes. To see results, go to http://www.cs.wisc.edu/~pitter/cs766/
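A minimal sketch of the per-line coordinate mapping at the heart of the Beier-Neely warp (one feature line, one pixel); the full morph averages this displacement over all feature lines using the per-line weight discussed above.

```python
import numpy as np

def map_point(X, P, Q, Pp, Qp):
    """Map pixel X using line P->Q in the destination image and the
    corresponding line P'->Q' in the source; returns the source pixel."""
    PQ, X_P = Q - P, X - P
    u = np.dot(X_P, PQ) / np.dot(PQ, PQ)          # position along the line
    perp = np.array([-PQ[1], PQ[0]])
    v = np.dot(X_P, perp) / np.linalg.norm(PQ)    # signed distance from it
    PQp = Qp - Pp
    perp_p = np.array([-PQp[1], PQp[0]])
    return Pp + u * PQp + v * perp_p / np.linalg.norm(PQp)
```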
In computer vision and image processing, the snake or active contour approach is widely used for shape recovery from images. The level set method proposed by Sethian is a more general approach than snakes. The idea of level set methods is simple: given a boundary with a speed function F in the normal direction of the boundary, the goal is to track the evolution of the boundary. Any problem that can be formulated as a moving-boundary process can be solved by the level set method. In this project the level set approach is used to extract contours from images. For the experiments, see http://sal-cnc.me.wisc.edu/~jian/courses.htm
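A minimal sketch of one explicit level-set update for phi_t + F|grad phi| = 0 with a constant speed F; image-dependent and curvature-dependent speeds slot into the same update, and upwind differencing and reinitialization are omitted for brevity.

```python
import numpy as np

def level_set_step(phi, F=1.0, dt=0.1):
    gy, gx = np.gradient(phi)
    return phi - dt * F * np.hypot(gx, gy)   # move the zero level set

# Signed distance to a circle of radius 10; with F > 0 the front expands.
phi = np.fromfunction(lambda y, x: np.hypot(y - 32, x - 32) - 10, (64, 64))
for _ in range(20):
    phi = level_set_step(phi)
```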
One of the primary goals of "low level" vision is to extract geometric information, such as lines, from images. An idealized goal of line detection is to locate the straight lines in the line drawing of a scene. Despite a large amount of research, effective extraction of straight lines has remained a difficult problem in many image domains. This project implements the approach proposed by Brian Burns, Allen Hanson, and Edward M. Riseman (referred to as the BHR algorithm) for detecting straight lines in an intensity image, described in their paper "Extracting Straight Lines" (IEEE Trans. Pattern Analysis and Machine Intelligence 8, no. 4, 1986, 425-455). For comparison, we will also implement the Hough transform for straight line detection: first an edge detection algorithm is applied, followed by the Hough transform. Outputs will be compared based on scene recognition, line detection, line localization, and complexity.
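A minimal sketch of the grouping step that opens the BHR approach: quantize gradient orientation into buckets and group connected pixels sharing a bucket into line-support regions; fitting a line to each region is the step that follows. The bucket count and magnitude threshold are hypothetical.

```python
import numpy as np
from scipy.ndimage import label, sobel

def line_support_regions(img, n_buckets=8, mag_thresh=10.0):
    gx, gy = sobel(img.astype(float), 1), sobel(img.astype(float), 0)
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
    bucket = ((ang + np.pi) / (2 * np.pi) * n_buckets).astype(int) % n_buckets
    regions = []
    for b in range(n_buckets):
        lbl, n = label((bucket == b) & (mag > mag_thresh))
        regions += [np.argwhere(lbl == i) for i in range(1, n + 1)]
    return regions   # each region supports one straight-line hypothesis
```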
Epipolar geometry is an essential piece of information for uncalibrated stereo and is required before any further inferences can be made from a stereo pair. The fundamental matrix encodes all the geometric information that can be gleaned from the images, so getting a good estimate of it is of paramount importance. This project focuses on a method to estimate the fundamental matrix robustly, based on the property of virtual parallax. The main reference is a paper by Roger Mohr et al.
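For context, a hedged sketch of the standard normalized 8-point estimate of the fundamental matrix, the non-robust baseline that methods such as the virtual-parallax approach improve upon.

```python
import numpy as np

def normalize(pts):
    c = pts.mean(0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (T @ np.c_[pts, np.ones(len(pts))].T).T, T

def eight_point(x1, x2):
    """x1, x2: N x 2 matched points (N >= 8); returns a rank-2 F."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    A = np.stack([np.kron(b, a) for a, b in zip(p1, p2)])  # x2' F x1 = 0 rows
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0                                               # enforce rank 2
    return T2.T @ (U @ np.diag(S) @ Vt) @ T1
```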
When trying to measure the color of an object with a video camera, the problem of color constancy arises, since the color of an object varies greatly under different illumination. To solve this problem, the following approach is considered: several images of a reference pattern together with the objects of interest (in this case polymer specimens) are taken. One of these images is defined as the reference image; for this reference image and its associated illumination, we can define the color value of the object. All other images are normalized to that reference image. For each non-reference image, a transformation between it and the reference image is found by considering only the reference patterns of both images. It is then assumed that this transformation is valid throughout the entire image (reference pattern and object). More information is available at http://www.cae.wisc.edu/~stuetzle/
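A minimal sketch of estimating such a transformation from the reference-pattern patches alone, assuming `src` and `ref` are N x 3 arrays of mean RGB values of corresponding patches (the affine form is an assumption; other transformation models fit the same scheme).

```python
import numpy as np

def color_transform(src, ref):
    """Least-squares affine color map (3x3 matrix plus offset, stored
    as a 4x3 matrix) taking src patch colors to ref patch colors."""
    A = np.c_[src, np.ones(len(src))]
    M, *_ = np.linalg.lstsq(A, ref, rcond=None)
    return M                                 # apply as np.c_[rgb, 1] @ M

# The same M is then applied to every pixel of the non-reference image,
# under the stated assumption that it is valid across the whole scene.
```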
My project implements the intelligent scissors algorithm presented in the paper "Intelligent Scissors for Image Composition" by Eric N. Mortensen and William A. Barrett. A live-wire 2D dynamic programming graph search algorithm is used for boundary searching. I also provide an interactive tool in a Windows environment to test this interactive digital image segmentation method. The method does not make image segmentation completely automatic; rather, using simple gesture motions with a mouse, the user can interactively select an optimal boundary segment, as the minimum-cost path from the current cursor position to a previously specified "seed" point is displayed immediately, so that the boundary adheres to the specific type of edge currently being followed and wraps around the object of interest. This makes the segmentation process more efficient and accurate. The project is a Windows application written in C++ and developed in Microsoft Visual C++ 6.0.
In this project, we present an approach to gesture recognition for a relatively small moving character of interest in a highly noisy but constrained environment. We take as input a video from a static camera in a constant lighting environment and a black-and-white upper-body sketch of the desired gesture. The output is a collection of identified occurrences of the template gesture. There are three main steps in the process: finding the background, isolating the main character from the foreground in each frame, and matching the gesture against the template. We have achieved some success, but there is still ample room for exploration and generalization. Details and demo videos are available at http://www.cs.wisc.edu/~zhong/cs766/proj.htm
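A minimal sketch of two of the steps, under simplifying assumptions: a median-image background model (static camera) and normalized cross-correlation of an extracted silhouette patch against the gesture template; the names and thresholds here are hypothetical.

```python
import numpy as np

def background(frames):
    """frames: T x H x W grayscale stack from the static camera."""
    return np.median(frames, axis=0)

def ncc(patch, template):
    """Normalized cross-correlation score in [-1, 1]."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Foreground mask for one frame, with a hypothetical threshold:
# mask = np.abs(frame - background(frames)) > 25
```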
A picture is worth a thousand words. With the growing prevalence of digital images and videos, there is a need for effective filtering of visual content. Several techniques have been studied for using textual and visual information for classification and retrieval in image databases, and several partial solutions have evolved over the past couple of years. In the quest for a more complete solution, this paper surveys some of the techniques employed by these different systems and implements and evaluates a subset of these strategies.
An autostereogram is a way to represent a 3D scene in a single 2D image. Starting with the basic idea of the autostereogram, we implement Thimbleby's algorithm for generating single image random dot stereograms (SIRDS). After that, we explore the possibilities and features of orthogonal autostereograms and present our general algorithm for generating them. The results and problems of the algorithm are analyzed, and a heuristic constraint-relaxation approach is given; the modified algorithm turns out to work much better at reducing disruptions. The fundamental problem with orthogonal autostereograms and possible solutions are further discussed.
This project does feature tracking and matching. First, corner features are found and tracked through consecutive frames. The result is then fed to RANSAC, which robustly estimates the best transformation matrix between frames. Using this matrix, a coarse position in the second image is located for every feature of the first image by transforming the feature's coordinates from the first image to the second. Starting from the coarse position, cross-correlation matching is used to find a finer position for each feature.
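A minimal sketch of the RANSAC stage, assuming an affine transformation model: repeatedly fit to three random correspondences and keep the model with the most inliers. `src` and `dst` are N x 2 matched corner positions; the iteration count and inlier threshold are hypothetical.

```python
import numpy as np

def ransac_affine(src, dst, iters=500, thresh=2.0, seed=0):
    rng = np.random.default_rng(seed)
    src_h = np.c_[src, np.ones(len(src))]    # homogeneous source points
    best_inliers = np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        M, *_ = np.linalg.lstsq(src_h[idx], dst[idx], rcond=None)  # 3x2
        err = np.linalg.norm(src_h @ M - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best model for the final transform.
    M, *_ = np.linalg.lstsq(src_h[best_inliers], dst[best_inliers], rcond=None)
    return M, best_inliers
```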