CS 766 | Fall 2006
For segmentation to be useful in most real-world applications, it needs to be fast and robust. That means it should not require parameters tuned for each image, a predetermined number of segments, or the assumption that no two objects share the same color. I will describe and implement a robust segmenter by improving on methods developed by D. Scharstein and R. Szeliski. The image will be iteratively blurred, increasing the differences between segments while decreasing the differences within a single segment; the resulting image will then be segmented. This segmenter will aim to improve the quality and speed of the blurring through larger windows and smarter blurring rates.
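One possible realization of this pipeline, sketched in Python: iteratively apply an edge-preserving blur so that pixels within a segment converge toward a common color, then label connected regions of similar color. The window size, blur weighting, and merge threshold below are illustrative assumptions, not the project's actual parameters.

```python
import numpy as np
from scipy import ndimage

def iterative_blur_segment(img, n_iters=10, sigma_color=20.0, merge_thresh=10.0):
    """Sketch: blur iteratively so colors converge inside segments, then
    label connected components of similar color. Parameter values are
    illustrative, not the project's tuned settings."""
    smoothed = img.astype(np.float64)
    for _ in range(n_iters):
        # Edge-preserving step: pull each pixel toward its local mean only
        # where the mean is close in color, so segment boundaries sharpen.
        local_mean = ndimage.uniform_filter(smoothed, size=(5, 5, 1))
        weight = np.exp(-np.sum((local_mean - smoothed) ** 2, axis=2,
                                keepdims=True) / (2 * sigma_color ** 2))
        smoothed = weight * local_mean + (1 - weight) * smoothed

    # Segment by connected components over coarsely quantized colors.
    quantized = np.round(smoothed / merge_thresh).astype(np.int64)
    keys = (quantized[..., 0] * 1000003 + quantized[..., 1]) * 1000003 \
        + quantized[..., 2]
    labels = np.zeros(keys.shape, dtype=np.int32)
    next_label = 0
    for k in np.unique(keys):
        comp, n = ndimage.label(keys == k)
        labels[comp > 0] = comp[comp > 0] + next_label
        next_label += n
    return labels
```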
Active binding sites in protein molecules can be characterized structurally by the pockets and cavities present on the molecule's surface. By fitting quadratic surfaces over a protein's representative mesh and varying the locality used to compute the fit coefficients, a fast approximation of the surface curvature can be constructed. This provides a computationally quick and efficient way to search the surface for depressions of variable size. By grouping neighboring depressions, it is possible to estimate potential ligand binding sites. For more information, see the webpage.
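The curvature-from-quadric step can be sketched as a least-squares fit. Assuming neighbor points have already been expressed in a local tangent frame (the center vertex at the origin), the fitted coefficients give the derivatives needed for mean and Gaussian curvature; neighborhood construction and frame estimation are assumed done elsewhere.

```python
import numpy as np

def quadric_curvature(neighbors):
    """Sketch: fit z = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f by least
    squares to an N x 3 array of neighbor points in a local tangent frame,
    then evaluate mean and Gaussian curvature at the origin."""
    x, y, z = neighbors[:, 0], neighbors[:, 1], neighbors[:, 2]
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    (a, b, c, d, e, f), *_ = np.linalg.lstsq(A, z, rcond=None)

    # First and second derivatives of the fitted height field at the origin.
    fx, fy, fxx, fxy, fyy = d, e, 2 * a, b, 2 * c
    denom = 1 + fx ** 2 + fy ** 2
    K = (fxx * fyy - fxy ** 2) / denom ** 2                   # Gaussian
    H = ((1 + fy ** 2) * fxx - 2 * fx * fy * fxy
         + (1 + fx ** 2) * fyy) / (2 * denom ** 1.5)          # mean
    # The sign of H flags concave regions (candidate depressions); the
    # convention depends on the orientation of the local frame's normal.
    return H, K
```

Varying the locality, as the abstract describes, amounts to enlarging or shrinking the neighborhood passed to this fit, which tunes the scale of depressions the curvature estimate responds to.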
Good information design depends on clarifying the meaningful structure in an image. My aim is to devise a computational approach to stylizing and abstracting photographs that explicitly responds to this design goal. Creating a good abstraction of an image requires accentuating its main aspects and diminishing the irrelevant parts. Done correctly, this reduces the perceptual and cognitive effort required to understand the image. The difficulty is that the main aspects of an image are hard to determine from pixels alone, in part because different people focus on different parts of an image, depending on what they are looking for or expecting.
For my project, I will create and implement an algorithm for stylizing an image that takes as input not only the image but also game rounds from Peekaboom [von Ahn, 2006]. This has the major advantage of identifying the important areas in the image, and the algorithm incorporates this information into the stylization process. Most previous stylization approaches (for example, [Haeberli, 1990]) did not take perceptual information into account. Some approaches (for example, [DeCarlo, 2002]) used eye-tracking information as an aid when constructing their illustrations, and Hertzmann (see [Hertzmann, 2001]) let users sketch over the important areas in a given photograph.
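One simple way to use a Peekaboom-derived importance map, sketched below: abstract the image more aggressively (heavier blur, coarser color quantization) where importance is low. The blending scheme and quantization are illustrative choices for this sketch, not the project's actual algorithm.

```python
import numpy as np
from scipy import ndimage

def importance_stylize(img, importance, max_sigma=6.0, levels=8):
    """Sketch: given an RGB image and an importance map with values in
    [0, 1], keep detail where importance is high and flatten elsewhere.
    Parameter values are placeholders."""
    img = img.astype(np.float64)
    # Blur heavily, then blend back toward the original in important areas.
    blurred = np.dstack([ndimage.gaussian_filter(img[..., ch], max_sigma)
                         for ch in range(img.shape[2])])
    w = importance[..., None]
    abstracted = w * img + (1 - w) * blurred
    # Coarse color quantization flattens unimportant regions further.
    step = 256.0 / levels
    flat = np.floor(abstracted / step) * step + step / 2
    return (w * abstracted + (1 - w) * flat).astype(np.uint8)
```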
There are several applications for this approach. The most prominent is image search engines: instead of displaying thumbnails for the results, the results can be displayed as illustrations. Those illustrations contain less information overall (while keeping the important information) and thus would be easier to comprehend and more pleasing to look at. Another possible application is merging objects from different pictures while minimizing the loss of detail.
References:
1. D. DeCarlo and A. Santella, Stylization and abstraction of photographs, Proc. SIGGRAPH '02, 2002.
2. L. von Ahn et al., Peekaboom: A game for locating objects in images, Proc. SIGCHI Conference on Human Factors in Computing Systems, 2006.
3. P. Haeberli, Paint by numbers: Abstract image representations, Proc. SIGGRAPH '90, 1990.
One essential criterion for any autonomous vehicle is that it not collide with objects in its environment. If the system uses image-based control, the most important information to extract from the input images is depth, so that the algorithm can prioritize which objects to avoid by steering away from them. Stereo vision is therefore a desirable feature of an autonomous vehicle, but the real-time nature of the navigational challenge limits the stereo techniques available for implementation. This project sought to implement the real-time stereo algorithm with explicit occlusion labeling proposed by a Microsoft Research group in 2003 [3]. Then, as a first, simplistic step toward using this data for navigation, the algorithm was to compute, for each frame, the optimal direction in which to steer a vehicle to avoid the obstacles detected in that frame. This was accomplished by finding the x-coordinate that maximizes distance from significant areas of foreground, as defined by a threshold on the depth map produced by the stereo reconstruction of the scene. The implementation described here failed to reproduce the real-time performance of the Microsoft algorithm. In fact, there was only enough time for one run of the algorithm on one frame, so a fatal bug that destroyed all information exchange between the components of the program was not discovered until it was too late to run the program again. While no output data is presented, several comments are offered on the running time of the algorithm and on possible improvements to a working version of the code. For more information, see the webpage.
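The obstacle-avoidance step described above lends itself to a compact sketch: threshold the disparity map to mark near foreground, then pick the column farthest from any obstacle column. The pixel-count threshold and the disparity convention (larger means closer) are assumptions of this sketch.

```python
import numpy as np

def steering_column(disparity, fg_thresh, min_pixels=20):
    """Sketch: choose the image column farthest from significant foreground.
    A column counts as an obstacle if at least min_pixels of its pixels
    exceed the disparity threshold (i.e., are close to the camera)."""
    fg_cols = np.where((disparity > fg_thresh).sum(axis=0) >= min_pixels)[0]
    if fg_cols.size == 0:
        return disparity.shape[1] // 2        # no obstacles: steer straight
    xs = np.arange(disparity.shape[1])
    # Distance from each column to the nearest obstacle column.
    dist = np.abs(xs[:, None] - fg_cols[None, :]).min(axis=1)
    return int(np.argmax(dist))
```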
Medium-resolution satellite imagery is freely available for most of the land masses of our planet. Digital elevation maps are also readily available. In this project, we propose a method for synthesizing a realistic satellite image for a land form based on that land form's elevation map. The synthesis uses the elevation map and satellite image of a separate land form as an example from which to build the new image. This work builds on existing techniques in texture transfer and so-called "image analogies." The technique may be applied anywhere realistic aerial photos are called for, such as games or flight simulators. For more information, see the webpage.
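The core matching step of such example-based synthesis can be sketched as a nearest-neighbor patch search: for each patch of the target elevation map, copy satellite pixels from the example patch whose elevations match best. A real implementation, as in image analogies, would add multi-scale search and overlap blending; the patch size and grid stride here are arbitrary choices.

```python
import numpy as np

def transfer_by_elevation(ex_elev, ex_img, tgt_elev, patch=8):
    """Sketch: synthesize a satellite image for tgt_elev by copying patches
    of ex_img wherever the corresponding ex_elev patch best matches the
    target elevations. Non-overlapping tiles, single scale."""
    H, W = tgt_elev.shape
    out = np.zeros((H, W, ex_img.shape[2]), dtype=ex_img.dtype)
    # Candidate example patches sampled on a half-overlapping grid.
    coords = [(y, x)
              for y in range(0, ex_elev.shape[0] - patch, patch // 2)
              for x in range(0, ex_elev.shape[1] - patch, patch // 2)]
    cand = np.stack([ex_elev[y:y + patch, x:x + patch].ravel()
                     for y, x in coords])
    for ty in range(0, H - patch + 1, patch):
        for tx in range(0, W - patch + 1, patch):
            q = tgt_elev[ty:ty + patch, tx:tx + patch].ravel()
            best = np.argmin(((cand - q) ** 2).sum(axis=1))
            sy, sx = coords[best]
            out[ty:ty + patch, tx:tx + patch] = \
                ex_img[sy:sy + patch, sx:sx + patch]
    return out
```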
The problem of detecting objects with optional or repeated parts usually requires an exhaustive, independent search over each possible layout. In this project, I will implement a method that instead models variable structures with a set of possible states and their relations to each other, allowing a single structure to represent an entire class of objects to be detected. Detected feature correspondences are compared with the state relations in the model, and a globally optimal correspondence is sought. The method is studied in depth using the example of leaves on a stem, whose number and locations are not known a priori.
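For a chain-like structure such as leaves along a stem, the flavor of this globally optimal search can be illustrated with a simple dynamic program over candidate leaf detections sorted along the stem. This is a stand-in for the method's actual state-relation model, and the unary and pairwise scoring functions are assumed.

```python
def best_leaf_chain(unary, pair):
    """Sketch: select a subset of leaf candidates (indexed in order along
    the stem) maximizing unary match scores plus pairwise relation scores
    between consecutive selected leaves. unary[i] scores candidate i;
    pair[i][j] scores candidates i < j being adjacent leaves."""
    n = len(unary)
    best = list(unary)          # best[j]: best chain ending at candidate j
    back = [None] * n
    for j in range(n):
        for i in range(j):
            s = best[i] + pair[i][j] + unary[j]
            if s > best[j]:
                best[j], back[j] = s, i
    # Recover the optimal chain by backtracking.
    end = max(range(n), key=lambda i: best[i])
    chain = []
    while end is not None:
        chain.append(end)
        end = back[end]
    return list(reversed(chain)), max(best)
```

Because the chain score decomposes over consecutive pairs, the quadratic-time dynamic program finds the global optimum without enumerating every possible number and placement of leaves.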
This project is focused on implementing a paper from this year's SIGGRAPH conference, "Fast Median and Bilateral Filtering" by Weiss. The paper's primary contribution to computer vision actually has little to do with the inner workings of median and bilateral filtering; rather, it concerns the method by which the window of operation is moved across the image. More specifically, it extends previous work from Huang's 1981 paper, "Two-Dimensional Signal Processing II: Transforms and Median Filters." Because of how median and bilateral filters work, one cannot simply apply a two-dimensional kernel to the image data and move that kernel around to produce results; instead, each pixel in the window of size (2·radius+1)² must be considered, so some thought must be put into how best to move this window with as little computation as possible. Weiss observes that by computing multiple columns at once and sharing a central window among nearby columns, fewer redundant calculations are performed than in Huang's algorithm. Huang's algorithm can be extended to multiple columns, but the overlapping windows then contain much of the same data. By solving for the best number of columns to compute at once, a speed increase over both Huang's original and the naive multi-column approach is realized. For more information, see the webpage.
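For reference, here is a minimal sketch of Huang's baseline sliding-histogram median, the algorithm Weiss's multi-column scheme accelerates, assuming an 8-bit grayscale image. As the window slides down a column, one row of pixels leaves the histogram and one enters, so the median is recomputed from counts rather than by sorting the whole window; Weiss's actual column-sharing bookkeeping is more involved.

```python
import numpy as np

def median_filter_huang(img, radius):
    """Sketch of Huang's sliding-histogram median for uint8 images."""
    H, W = img.shape
    r = radius
    out = np.zeros_like(img)
    half = ((2 * r + 1) ** 2) // 2 + 1        # rank of the median
    for x in range(r, W - r):
        hist = np.zeros(256, dtype=np.int32)
        # Build the full window histogram at the top of this column.
        for yy in range(0, 2 * r + 1):
            for xx in range(x - r, x + r + 1):
                hist[img[yy, xx]] += 1
        for y in range(r, H - r):
            if y > r:                          # slide: drop old row, add new
                for xx in range(x - r, x + r + 1):
                    hist[img[y - r - 1, xx]] -= 1
                    hist[img[y + r, xx]] += 1
            # Scan counts up to the median rank.
            cum = 0
            for v in range(256):
                cum += hist[v]
                if cum >= half:
                    out[y, x] = v
                    break
    return out
```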
Video game designers are currently pioneers in using exciting technologies to create new ways for a user to communicate with a computer. The Nintendo Wii features an innovative remote that the gamer can swing, slash, point, and jab. It is feasible to imagine a video game designer promoting a new sort of input device that lets a user communicate with the computer effortlessly through body movement and orientation (in other words, body language). This input device is a dual-camera rig with stereo-vision and interpretation software designed to determine body orientation. I have started the work of proving that such a device can gather useful information from the rig at a frame rate acceptable for situations requiring seemingly instantaneous response. I wrote a program that pulls in images from two webcams and quickly computes the strong edges for use as features to match between the two images. Further work will involve writing software to compute a depth map and to identify the positions of at least the hands, torso, and head to provide as input to the operating system and other processes, as well as researching various hardware speedups for the whole pipeline.
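A rough sketch of that capture-and-edge step, here using OpenCV for brevity; the actual program may differ in library and tuning, and the camera indices and Canny thresholds below are placeholders.

```python
import cv2

def capture_edge_pair(cam_left=0, cam_right=1, t_lo=80, t_hi=160):
    """Sketch: grab one frame from each webcam and compute strong edges
    as candidate features for stereo matching."""
    edge_maps = []
    for cam in (cam_left, cam_right):
        cap = cv2.VideoCapture(cam)
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise RuntimeError("camera read failed")
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edge_maps.append(cv2.Canny(gray, t_lo, t_hi))  # strong edge map
    return edge_maps  # [left_edges, right_edges]
```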
Efficient, interactive foreground/background segmentation in still images is of great interest in computer vision. An approach based on graph-cut optimization that combines both texture and edge information has been developed recently. In this project the method proposed by C. Rother et al. [1] is implemented: it extends the original graph-cut with a more powerful, iterative version of the optimization, and the power of the iterative algorithm is used to substantially simplify the user interaction. The method is expected to be successful on a wide variety of moderately difficult images. For more information, see the webpage.
[1] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut": Interactive foreground extraction using iterated graph cuts, Proc. SIGGRAPH '04, 2004.
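The method of [1] is available today as OpenCV's grabCut; the sketch below shows the simplest interaction mode, where the user supplies only a rectangle around the object and the iterated optimization refines the foreground/background color models.

```python
import cv2
import numpy as np

def grabcut_rect(img, rect, n_iters=5):
    """Sketch: run GrabCut from a user rectangle (x, y, w, h) and return
    the image with the background zeroed out."""
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    bgd = np.zeros((1, 65), np.float64)   # GMM parameters, managed by OpenCV
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bgd, fgd, n_iters, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return img * fg[:, :, None].astype(img.dtype)
```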
Summation Invariants are a new class of image feature developed recently by Lin. They were shown to be invariant to group transformations; e.g., a Summation Invariant derived under the Euclidean group is invariant to Euclidean transformations. These invariant features have been applied to the problem of face recognition. This project derives a new family of Summation Invariants using the technique developed by Lin, and the derivation is documented in detail. The new invariants are tested on a 2D curve under several Euclidean transformations to validate their invariance. Since invariance does not necessarily imply good discriminating ability, experiments are also run to test the discrimination performance of the new Summation Invariants: the new features are compared with each other and with some of those previously derived to see how well they discriminate between different faces. The performance of the new family of Summation Invariants is very similar to that of the previously derived set.
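The invariance-validation step can be sketched numerically: apply random Euclidean transformations to a curve and check that a feature's value is unchanged. Lin's actual Summation Invariant formulas are not reproduced here; a known Euclidean invariant (total arc length) stands in for them.

```python
import numpy as np

def euclidean_transform(curve, theta, t):
    """Apply a rotation by theta and translation t to an N x 2 curve."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return curve @ R.T + t

def invariance_test(curve, feature, trials=100, tol=1e-8):
    """Sketch: a feature is numerically Euclidean-invariant if its value
    is unchanged under random rotations and translations of the curve."""
    base = feature(curve)
    rng = np.random.default_rng(0)
    for _ in range(trials):
        q = euclidean_transform(curve, rng.uniform(0, 2 * np.pi),
                                rng.uniform(-10, 10, size=2))
        if not np.allclose(feature(q), base, atol=tol):
            return False
    return True

# Stand-in feature: total arc length, a known Euclidean invariant.
arc_length = lambda c: np.sum(np.linalg.norm(np.diff(c, axis=0), axis=1))
```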