CS 766 | Computer Vision | Fall 2006
HW #4: Your Own Project
Tentative Title and Abstract Due: Tuesday, November 21 (submit electronically)
Class Presentations: Tuesday, December 12 and Thursday, December 14
Final Paper and Web Page Due: Tuesday, December 19 at 5pm
The last assignment is to do a project based on
an area of computer vision of your choice. Projects can
be either individual or in teams of two.
The intended effort per person is approximately 2 times that
of a previous homework assignment.
You can choose to do any combination of programming,
reading relevant technical papers, and developing new
theory or techniques.
Except in unusual circumstances, a project will involve reading
some papers from the literature, implementing some method(s),
experimentally testing the method(s) on appropriate images, and
writing a report
that describes the problem, the approach implemented,
a summary of experiments, and evaluation of results. The
length will depend a lot on the type of project. Most reports will be
about 15 pages long. The style should be that of a conference paper:
title, abstract, introduction, motivation, problem statement, related work,
theory, method, experimental results, and concluding remarks.
If your project is based primarily on a published paper, include a brief summary
of the method's major steps and ideas, emphasizing key points that you want to
stress about it. Also, include a clear description of assumptions of the method
and a list of all the parameters that must be specified by the user.
Be sure to clearly state what parts of your code you wrote yourself and which parts
you got from elsewhere, citing sources of the code you did not write
and the form of the code (e.g.,
from a MATLAB toolbox or from an individual's implementation).
Citing sources for code you've used is as important as citing publications!
If you're doing a project that's primarily programming, first develop
a specific set of operations to be performed and tested.
If you're doing little or no programming, you can first
focus on selecting a set of readings in an area.
An extended survey would be acceptable as a project
if it compares a large enough set of related papers in detail.
Just abstracting a set of readings is not sufficient.
Ideally, I would prefer some combination of reading and
original thinking and original work that could consist of a
simple extension of, combination of, or theoretical
analysis of previous work.
What to Hand In
Due Tuesday, November 21: A tentative title and abstract of your project,
e-mailed to the instructor (dyer@cs.wisc.edu).
These will be posted on the class web page for your interest.
Due December 12 and 14: Class presentation summarizing your project.
Include (1) problem statement, (2) motivation, (3) summary of method,
and (4) results. Each presentation will be about 10 minutes.
Due Tuesday, December 19 at 5 p.m.: Hardcopy of your final project report. In addition, please
e-mail the instructor (dyer@cs.wisc.edu) a final title and
abstract of your project. Submit your source code electronically to the handin
directory for hw4.
Also, create a web page for your project and email the link to it to the instructor.
Some Project Topic Suggestions
The following are some sample ideas for topic areas.
These could be used to stimulate your own thinking about
areas of interest and lead to a narrower, more specific
project. I much prefer in-depth treatment of a narrow topic
over shallow treatment of a broader problem.
I encourage you to talk to me about ideas that
you are thinking about early on in order to help you
focus your efforts. Finally, there are digital still and video
cameras available
if you need to digitize some images for your project.
Note: I will add more ideas to the top of the following list as additional
ideas come to me.
- Multiperspective Image Representations for Object Recognition
Encode appearance (e.g., image patches extracted using an interest operator)
and geometry (e.g., the silhouette contours) all around an object (e.g., a person's head or a vase)
and then use this representation to recognize an object in an input image or input video.
- Image Synthesis of Noun Phrases
Given a simple noun phrase such as "Five big red balloons" synthesize an image that
depicts that phrase.
- Visualizing Proper Nouns
Given a proper noun, use an existing search engine to collect a set of images and then select/compose/modify
one or more of them to create an output image.
- Depth from Airlight
Distant mountains look bluish, and the farther away they are, the bluer
and brighter they appear. This effect, called "airlight," is a type of "aerial perspective"
that painters such as Leonardo da Vinci have known about since the Renaissance.
Implement a method for recovering relative depth of outdoor scenes using this property.
This and other interesting effects are described in the wonderful books
(1) M.G.J. Minnaert, Light and Color in the Outdoors, and
(2) D.K. Lynch and W. Livingston, Color and Light in Nature.
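As a starting point, the brightness-and-blueness cue can be turned into a crude relative-depth map. The sketch below is a simple heuristic of my own, not a published method: it assumes an RGB image with values in [0, 1] and scores each pixel by how bright and blue-dominant it is.

```python
import numpy as np

def relative_depth_from_airlight(rgb):
    """Crude relative-depth proxy from airlight: the bluer and brighter
    a region, the farther away it is heuristically assumed to be.

    rgb: (H, W, 3) float image with values in [0, 1]
    Returns an (H, W) map in [0, 1]; larger values = (heuristically) farther.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    brightness = rgb.mean(axis=2)
    blueness = b - 0.5 * (r + g)             # how much blue dominates
    score = 0.5 * brightness + 0.5 * blueness
    lo, hi = score.min(), score.max()
    return (score - lo) / (hi - lo + 1e-12)  # normalize to [0, 1]
```

A real project would need to separate airlight from intrinsically blue or bright surfaces (sky, water), which is where the reading comes in.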
- Photo Tourism
Implement a version of the
Photo Tourism project at the University of Washington and Microsoft.
- Content Based Image Synthesis
Implement a version of the paper "Content Based Image Synthesis" by N. Diakopoulos et al.,
described on the project web page.
- Automatically find good prototypical images for words
Given a word, image search engines return a ranked list of images that have that word in a caption or
as part of the file name, but the top images are frequently not very typical of the word.
Devise a way to rerank these image results so that the top 1-3 images are good prototypes that
might be used as part of a picture dictionary. For example, compute a set of image patch features, e.g.,
using the SIFT operator, and then cluster the top 100 images returned by the search engine to
find those that are most similar to one another and, perhaps, are also relatively close to other,
related words or concepts. Word similarity could be determined using
WordNet.
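One way to sketch the reranking step: assuming each returned image has already been summarized as a fixed-length feature vector (e.g., a histogram of quantized SIFT descriptors, which you would compute yourself), pick as prototypes the images that are most central in feature space, a medoid-style criterion.

```python
import numpy as np

def pick_prototypes(features, k=3):
    """Rerank images by how central they are in feature space.

    features: (n_images, d) array, one feature vector per image
    Returns indices of the k most "typical" images (highest average
    cosine similarity to all the others).
    """
    # Normalize rows so dot products become cosine similarities
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                 # pairwise cosine similarity
    centrality = sim.sum(axis=1)  # closeness of each image to all others
    return np.argsort(-centrality)[:k]
```

Clustering first (rather than using one global medoid) would let the method handle polysemous words whose images form several distinct groups.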
- Automatically select local features for object recognition
SIFT and Harris features are good, distinctive features, but they are not selected
based on how well they describe an object, which is what accurate object recognition needs.
Adapt the ideas used in defining keypoint and corner detectors so as to produce a good
set of features for recognition. Or use data
produced by people who've played the
Peekaboom game.
- Use manually-labeled datasets of images for a novel task
Recently a number of online games have been created that ask users to manually label images or segment
images in order to collect training data for use by vision algorithms. Devise an
interesting task that makes use of one of these sources of data (e.g., the
Peekaboom data mentioned above).
- Family face recognition
Given a set of training images of the faces of about 10 known people (e.g., your family),
recognize those individuals in testing images and also distinguish them from other,
unknown people whose identities you don't care about.
- Video epitomes
Implement a version of this new idea and use it for a task such as those suggested at
the author's web site.
- Object category recognition by unsupervised learning
Implement, extend, and test a probabilistic method for describing and recognizing object
categories, e.g., one of the methods by Rob Fergus, as described at his
web site.
- Where am I?
See the problem description and get data at
http://research.microsoft.com/iccv2005/Contest/
- Dynamic object recognition
Develop an object recognition algorithm that takes an input video of a rigid object
that is moving in front of the camera and uses the dynamic information to recognize
it as one of a library of known objects given in a set of training videos.
- Detection of dilated bronchi in high-resolution CT images
- Evaluation of skeletonization methods for describing and matching brain white matter images
- An online "game" for obtaining manual data from users on some aspect of image analysis
- 3D Scene Reconstruction of Historic Structures from Paintings
Given a set of images of paintings of structures that no longer exist, implement
and use Tomasi and Kanade's factorization algorithm to do 3D scene reconstruction
of historic buildings.
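The core of Tomasi and Kanade's method is a rank-3 factorization of the centered measurement matrix via SVD. A minimal sketch of that step is below; it omits the metric upgrade that resolves the remaining affine ambiguity, and it assumes you already have reliable point tracks, which is the hard part when the "views" are paintings.

```python
import numpy as np

def factorize(W):
    """Affine structure-from-motion by factorization (Tomasi-Kanade sketch).

    W: (2F, P) measurement matrix of P points tracked over F frames,
       with each frame's centroid already subtracted.
    Returns (M, S): (2F, 3) motion matrix and (3, P) shape matrix with
    W ~= M @ S. The metric upgrade (solving for the 3x3 ambiguity) is omitted.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U3, s3, Vt3 = U[:, :3], s[:3], Vt[:3, :]
    M = U3 * np.sqrt(s3)              # motion, up to an affine ambiguity
    S = np.sqrt(s3)[:, None] * Vt3    # shape, up to the same ambiguity
    return M, S
```

Truncating to rank 3 also denoises: with noisy tracks, M @ S is the best rank-3 approximation of W in the least-squares sense.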
- Robust Estimation for Correspondence in Wide-Baseline Stereo
In wide-baseline stereo it is very difficult to determine enough good
correspondences to use standard robust estimation methods to estimate
a good fundamental matrix. Investigate ways of enabling a robust method
to be used in this problem domain.
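For reference, here is the generic RANSAC loop that this problem stresses: with few true correspondences the inlier ratio is low, so the number of iterations (or the sampling strategy itself) must be adapted. This is a model-agnostic sketch; for the fundamental-matrix case, `fit` would be, e.g., the 8-point algorithm and `residual` a point-to-epipolar-line distance, both of which you would supply.

```python
import numpy as np

def ransac(data, fit, residual, n_min, n_iters=500, thresh=1.0, rng=None):
    """Generic RANSAC: repeatedly fit a model to a minimal random sample
    and keep the model with the largest inlier set.

    data:     (N, d) array of candidate samples (e.g., correspondences)
    fit:      model = fit(subset), fits a model to a minimal subset
    residual: errs = residual(model, data), one error per row of data
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(n_iters):
        subset = data[rng.choice(len(data), n_min, replace=False)]
        model = fit(subset)
        inliers = residual(model, data) < thresh
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

A final refit of the model to all inliers, and an adaptive choice of `n_iters` from the current inlier ratio, are standard refinements.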
- Segmentation of Brain Cortical Surface
Given a triangular mesh representation of the inner and outer cortical
surfaces of a brain, segment the surface into clinically-meaningful "lumps."
See work by
M. Chung, for example.
- Change Detection in Brain Surface
Given two triangular mesh representations of a brain's cortical surface
at two different times for a single person, compute a vector field that
describes how points on the first surface have plausibly moved to the
second surface.
See work by
M. Chung, for example.
- Shock Graphs and Divergence-Based Skeletons
Implement a more complex skeleton detection algorithm, such as
Kimia's shock graph
or
Siddiqi's
divergence-based skeletons, and use
it for 2D or 3D skeletonization and possibly for object recognition.
- Reconstruction of 3D Textures
Image-based reconstruction of 3D textures such as cloth, hair,
or trees from a set of images. Perhaps using a Facade-like approach
(see work by P. Debevec).
- Texture Synthesis
Implement one of a number of recent methods for synthesizing new images
of a texture from an existing texture image. See
Chapter 11.3 of the textbook and
A. Efros's web page
for links to a number of recent methods.
- Video Textures
Implement this
recent method
or a new variant of it.
- Motion-Based Segmentation
Segmentation of video. See, for example, the method by
Kuhne et al.
- Level Set / Fast Marching Algorithm
Implement one of these algorithms and apply to a computer vision problem.
See Sethian's web page
for more information.
- Correspondence from Widely Separated Views
Write an algorithm for determining corresponding points in a pair
of images taken from cameras that were far apart, with an unknown
relation between them (i.e., uncalibrated cameras). Implement
or use as a starting point the algorithms given in the paper
"Wide Baseline Stereo Matching,"
Proc. 6th Int. Conf. Computer Vision, 1998, 754-760.
(See also
"Matching and Reconstruction from Widely Separated Views" for a successor paper.)
Also see "Wide baseline point matching using affine invariants computed
from intensity profiles," Proc. 6th European Conf. Computer Vision, Part I,
2000, 814-828.
- Reflectance Modeling from a Sparse Set of Views
Given a set of calibrated input views of a scene with unknown reflectance for each
of the visible scene surfaces, estimate BRDF reflectance models for each
distinct surface. See, for example, the work on
inverse global illumination
by Y. Yu.
- Reconstruction from Silhouettes of Circular Motions
Implement the method described in the paper
"Reconstruction
and Motion Estimation from Apparent Contours under Circular Motion". See also a
demo
of this method.
- Recognition of Activities using Temporal Templates
Implement the technique for recognizing activities described in
the paper
by J. Davis and A. Bobick. See also
online demos.
- Video Analysis
Devise some problems where evaluating long image sequences is required to
detect certain events. For example, if a given subsequence is known to be
of interest (e.g., it shows a prototypical activity of some kind), develop
a method for matching this subsequence with segments of the long input
sequence to find positions where the subsequence has a good match.
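The matching step described above can be sketched as a sliding-window comparison, under the assumption that each frame has already been reduced to a fixed-length feature vector; choosing those features (motion, appearance, etc.) is the interesting part of the project.

```python
import numpy as np

def match_subsequence(query, video):
    """Slide a short query clip over a long sequence and score each offset.

    query: (k, d) per-frame feature vectors of the clip of interest
    video: (n, d) per-frame feature vectors of the long sequence, n >= k
    Returns (best_offset, scores) where scores[i] is the mean squared
    frame-feature distance at offset i (lower = better match).
    """
    k = len(query)
    scores = np.array([np.mean((video[i:i + k] - query) ** 2)
                       for i in range(len(video) - k + 1)])
    return int(np.argmin(scores)), scores
```

Allowing the query to stretch or compress in time (e.g., by dynamic time warping instead of a rigid window) would make the matcher far more useful for real activities.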
- View Morphing Dynamic Scenes
Develop an extension of view morphing to handle scenes containing
moving objects. For example, a change in viewpoint of the entire
scene plus changes in pose of a single moving object within the scene.
Assume the moving object can be segmented and optical flow information
is also extracted.
- Real-Time View Morphing
Develop an automated version of view morphing and implement on
the SGI machine to achieve real-time (i.e., 10-30 frames per second)
performance.
- View Morphing Heads
Develop methods for adapting view morphing to apply to
two images of a person's head (say taken by two or
three cameras mounted around your workstation) so as to render
new views of that person.
Could be used in applications such as videoconferencing.
You can build on our existing view morphing code.
Main extension would be to develop an automatic line correspondence
method for the two views of a face, based on a simple model of
the head and face.
- View Morphing from Multiple Views
Extend view morphing to take either three views as input, or else
a sequence of views from a linearly translating camera, and combine
them for synthesizing new views. You can build on our existing
view morphing code.
- Develop View Morphing or Other View Synthesis Method for Interactive
3D Viewing over the Internet
That is, define an application of view synthesis and explore
implementation and other specific issues in order to make
real-time interactive viewing/manipulation feasible for that
application.
- Voxel Coloring using Variable Resolution Voxels
Develop a version of the Seitz and Dyer Voxel Coloring algorithm
(as presented in class; a paper is also available) that
uses a variable resolution representation of the scene voxels so
as to more efficiently reconstruct and color the voxels in the scene.
- Voxel Coloring from Uncalibrated Cameras
Rather than coloring voxels in Euclidean space, color voxels
in another representation that avoids the need for fully calibrated
cameras.
- Devise a Novel Camera Setup
that includes one or more cameras, perhaps movable on a simple
rig, that would enable the rapid capture and processing of images
for view synthesis type applications. In other words, propose a
hardware configuration for use in view synthesis or other domains.
An analog to this kind of thing is the
OmniCamera
at Columbia University.
- Mosaicking Extensions
Add extensions for mosaicking images. For example, handling non-static
scenes by including a residual term as shown in the video in class.
See the paper by Irani and Peleg
(cited in the Supplementary Reading web page) for ideas.
There are many references to papers on this
subject in the Supplementary Reading web page.
- Registration for Image Mosaics
Use an optimization solver to completely solve or to refine
an initial manual registration for an image mosaic.
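As one concrete instance, refining a registration under an affine motion model reduces to linear least squares on matched point pairs; the pairs would come from your manual initialization or a feature matcher. This sketch assumes at least three non-collinear correspondences.

```python
import numpy as np

def refine_affine(src, dst):
    """Least-squares estimate of an affine registration from matched points.

    src, dst: (N, 2) corresponding points in the two images, N >= 3
    Returns a (2, 3) affine matrix A minimizing ||[src 1] @ A.T - dst||^2.
    """
    N = len(src)
    X = np.hstack([src, np.ones((N, 1))])    # homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T                               # (2, 3) affine transform
```

A full mosaic would use a projective (homography) model instead, which needs a nonlinear solver or the DLT formulation rather than this direct least-squares form.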
- Solve the "Hole" Problem
in the Chen and Williams view
interpolation paper.
- Snakes for Image Databases
Investigate ways of (efficiently) searching a database of images
for objects that are similar to a user-specified contour, which acts
as the initial contour for a snake-based measure of similarity.
Precompute information for each image in
the database that will enable an initial snake to be compared with that
image.
- Snakes for Skeletonization
If you enjoyed the homeworks on skeletons and snakes, try combining
them! Implement the skeletons-from-snakes algorithm as described in
the paper by F. Leymarie and M. Levine, "Simulating the
Grassfire Transform using an Active Contour Model,"
IEEE Trans. Pattern Analysis and Machine Intelligence 14,
1992, 56-75.
- Intelligent Scissors
Implement the interactive, snakes-based method for image segmentation
described in "Intelligent Scissors for Image Composition,"
E. Mortensen and W. Barrett, Proc. SIGGRAPH 95, 1995, 191-198.
- Use the Reflectance and Texture Image Database
of Real-World Surfaces
This database contains
many images of real textured surfaces under controlled illumination
and viewing conditions. Devise experiments that utilize this data
for view synthesis, texture analysis, shape from shading, etc.
- Layers
Implement the layers representation of video described originally
in "Representing Moving Images with Layers," J. Wang and
E. Adelson, IEEE Trans. Image Processing 3, 1994, 625-638.
(See also: "Layered Representations for Vision and Video," E. Adelson,
Proc. IEEE Workshop on Representation of Visual Scenes, 1995, 3-9.)
- Straight Line Detection
Implement the algorithm for straight line detection by Burns,
Hanson, and Riseman, as described in the paper "Extracting
straight lines,"
IEEE Trans. Pattern Analysis and Machine Intelligence
8, No. 4, 1986, 425-455.
- Web Image Search Engine
Define an analog to the text-based web search engines that applies
to images. It may be best to restrict this to specific types of
images and then develop the search engine around a restricted
set of operations that the user can ask about these images.
Examples of such search engines include
WebSeer (University of
Chicago) and
WebSEEk (Columbia University). Also, see the
Scientific American article on this subject.
- Vision for Human-Computer Interfaces
See the paper by Quek cited in the Supplementary Reading web page
for ideas related to HCI and vision, and then develop your own
ideas or implement a simple prototype of some kind.
- Looking at People
See papers on human gesture and face recognition cited in the
Supplementary Reading web page and develop your own application,
or implement a technique related to one of those papers.
See also the list of
online papers on face recognition.
- Non-Model-Based Feature Tracking
Implement a technique capable of tracking some type of image feature over
several frames of an image sequence. Possible types of image features
include point-features like corners (e.g., Tomasi, Moravec), contours
(e.g., Kass, Blake), a region or group of pixels (e.g., Rehg & Witkin),
or optical-flow data (e.g., Allmen).
- Tracking Contours across "Event" Boundaries, such as when contours split or merge.
- Line Segment Tracking
Implement Deriche and Faugeras's extended Kalman filter method.
Alternatively, some other model-based tracking algorithm could
be implemented.
- Vision Databases
Efficiently processing queries involving geometric/spatial data, perhaps
represented at multiple scales. Query-by-shape. Query-by-x.
Consider especially
issues related to queries and data that represent 3D objects,
not simply 2D images or 2D features.
- Vision and Learning
Learning object models of non-rigid objects, for example.
Learning image-based representations.
Active learning of shape or x.
- Vision and Videoconferencing
Recovering purposive information about participants; maintaining
models of faces, hands, etc. associated with the flow of discourse.
- Recognizing Melanoma
Develop methods for representing critical features for diagnosing
melanoma. Images in /p/vision/images/melanoma show the
types of images that must be analyzed.
- Active Vision
New ideas for recovering some kind of scene information
from an active viewer (you specify what the "active"
viewer parameters are and what kind of task-specific information
needs to be recovered).
- Visual Exploration
Methods for using vision to guide a mobile robot
in "exploring" an unknown 3D surface. That is,
vision-guided navigation problems.
- Depth from Focus
Reading and analysis of methods for estimating
depth from measurements of the relative blur.
- Focus-of-Attention Mechanisms
That is, how to control a sequence of processing steps,
each of which operates on a selected subset of the image data.
This could be associated with a specific task such as homing,
tracking, or alerting, made specific enough to characterize
the criteria for adaptive, data-dependent sequencing over the
data (possibly in space, resolution, and/or time).