CS 766 | Computer Vision | Fall 2006
HW #4: Your Own Project
Tentative Title and Abstract Due: Tuesday, November 21 (submit electronically)
Class Presentations: Tuesday, December 12 and Thursday, December 14
Final Paper and Web Page Due: Tuesday, December 19 at 5pm
The last assignment is to do a project based on
an area of computer vision of your choice. Projects can
be either individual or in teams of two.
The intended effort per person is approximately 2 times that
of a previous homework assignment.
You can choose to do any combination of programming,
reading relevant technical papers, and developing new
theory or techniques.
Except in unusual circumstances, a project will involve reading
some papers from the literature, implementing some method(s),
experimentally testing the method(s) on appropriate images, and
writing a report
that describes the problem, the approach implemented,
a summary of experiments, and evaluation of results. The
length will depend a lot on the type of project. Most reports will be
about 15 pages long. The style should be that of a conference paper:
title, abstract, introduction, motivation, problem statement, related work,
theory, method, experimental results, and concluding remarks.
If your project is based primarily on a published paper, include a brief summary
of the method's major steps and ideas, emphasizing key points that you want to
stress about it. Also, include a clear description of assumptions of the method
and a list of all the parameters that must be specified by the user.
Be sure to clearly state what parts of your code you wrote yourself and which parts
you got from elsewhere, citing sources of the code you did not write
and the form of the code (e.g.,
from a MATLAB toolbox or from an individual's implementation).
Citing sources for code you've used is as important as citing publications!
If you're doing a project that's primarily programming, first develop
a specific set of operations to be performed and tested.
If you're doing little or no programming, you can first
focus on selecting a set of readings in an area.
An extended survey would be acceptable as a project
if it compares a large enough set of related papers in detail.
Just abstracting a set of readings is not sufficient.
Ideally, I would prefer some combination of reading and
original thinking and original work that could consist of a
simple extension of, combination of, or theoretical
analysis of previous work.
What to Hand In
Due Tuesday, November 21: A tentative title and abstract of your project,
e-mailed to the instructor (dyer@cs.wisc.edu).
These will be posted on the class web page for your interest.
Due December 12 and 14: Class presentation summarizing your project.
Include (1) problem statement, (2) motivation, (3) summary of method,
and (4) results. Each presentation will be about 10 minutes.
Due Tuesday, December 19 at 5 p.m.: Hardcopy of your final project report. In addition, please
e-mail the instructor (dyer@cs.wisc.edu) a final title and
abstract of your project. Submit your source code electronically to the handin
directory for hw4.
Also, create a web page for your project and email the link to it to the instructor.
Some Project Topic Suggestions
The following are some sample ideas for topic areas.
These could be used to stimulate your own thinking about
areas of interest and lead to a narrower, more specific
project. I much prefer in-depth treatment of a narrow topic
over shallow treatment of a broader problem.
I encourage you to talk to me about ideas that
you are thinking about early on in order to help you
focus your efforts. Finally, there are digital still and video
cameras available
if you need to digitize some images for your project.
Note: I will add more ideas to the top of the following list as additional
ideas come to me.
- Multiperspective Image Representations for Object Recognition
Encode appearance (e.g., image patches extracted using an interest operator)
and geometry (e.g., the silhouette contours) all around an object (e.g., a person's head or a vase)
and then use this representation to recognize an object in an input image or input video.
- Image Synthesis of Noun Phrases
Given a simple noun phrase such as "Five big red balloons" synthesize an image that
depicts that phrase.
- Visualizing Proper Nouns
Given a proper noun, use an existing search engine to collect a set of images and then select/compose/modify
one or more of them to create an output image.
- Depth from Airlight
Distant mountains look bluish, and the farther away they are, the bluer
and brighter they appear. This effect, called "airlight," is a type of "aerial perspective"
that painters such as Leonardo da Vinci have known about since the Renaissance.
Implement a method for recovering relative depth of outdoor scenes using this property.
This and other interesting effects are described in the wonderful books
(1) M.G.J. Minnaert, Light and Color in the Outdoors, and
(2) D.K. Lynch and W. Livingston, Color and Light in Nature.
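As a starting point, the brightness-and-blueness cue can be turned into a crude relative-depth map. The sketch below is a simple heuristic of my own, not a published method: it assumes an RGB image with values in [0, 1] and scores each pixel by how bright and blue-dominant it is.

```python
import numpy as np

def relative_depth_from_airlight(rgb):
    """Crude relative-depth proxy from airlight: the bluer and brighter
    a region, the farther away it is heuristically assumed to be.

    rgb: (H, W, 3) float image with values in [0, 1]
    Returns an (H, W) map in [0, 1]; larger values = (heuristically) farther.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    brightness = rgb.mean(axis=2)
    blueness = b - 0.5 * (r + g)             # how much blue dominates
    score = 0.5 * brightness + 0.5 * blueness
    lo, hi = score.min(), score.max()
    return (score - lo) / (hi - lo + 1e-12)  # normalize to [0, 1]
```

A real project would need to separate airlight from intrinsically blue or bright surfaces (sky, water), which is where the reading comes in.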
- Photo Tourism
Implement a version of the
Photo Tourism project at the University of Washington and Microsoft.
- Content Based Image Synthesis
Implement a version of the paper "Content Based Image Synthesis" by N. Diakopoulos et al.,
described on the project web page.
- Automatically find good prototypical images for words
Given a word, image search engines return a ranked list of images that have that word in a caption or
as part of the file name, but the top images are frequently not very typical of the word.
Devise a way to rerank these image results so that the top 1-3 images are good prototypes that
might be used as part of a picture dictionary. For example, compute a set of image patch features, e.g.,
using the SIFT operator, and then cluster the top 100 images returned by the search engine to
find those that are most similar to one another and, perhaps, are also relatively close to other,
related words or concepts. Word similarity could be determined using
WordNet.
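One way to sketch the reranking step: assuming each returned image has already been summarized as a fixed-length feature vector (e.g., a histogram of quantized SIFT descriptors, which you would compute yourself), pick as prototypes the images that are most central in feature space, a medoid-style criterion.

```python
import numpy as np

def pick_prototypes(features, k=3):
    """Rerank images by how central they are in feature space.

    features: (n_images, d) array, one feature vector per image
    Returns indices of the k most "typical" images (highest average
    cosine similarity to all the others).
    """
    # Normalize rows so dot products become cosine similarities
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                 # pairwise cosine similarity
    centrality = sim.sum(axis=1)  # closeness of each image to all others
    return np.argsort(-centrality)[:k]
```

Clustering first (rather than using one global medoid) would let the method handle polysemous words whose images form several distinct groups.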
- Automatically select local features for object recognition
SIFT and Harris features are good, distinctive features, but they are not selected
based on how well they describe an object, which is what accurate object recognition needs.
Adapt the ideas used in defining keypoint and corner detectors so as to produce a good
set of features for recognition. Or use data
produced by people who've played the
Peekaboom game.
- Use manually-labeled datasets of images for a novel task
Recently a number of online games have been created that ask users to manually label images or segment
images in order to collect training data for use by vision algorithms. Devise an
interesting task that makes use of one of these sources of data (e.g., the
Peekaboom data mentioned above).
- Family face recognition
Given a set of training images of the faces of about 10 known people (e.g., your family),
recognize those individuals in testing images and also distinguish them from other,
unknown people whose identities you don't care about.
- Video epitomes
Implement a version of this new idea and use it for a task such as those suggested at
the author's web site.
- Object category recognition by unsupervised learning
Implement, extend, and test a probabilistic method for describing and recognizing object
categories, e.g., one of the methods by Rob Fergus, as described at his
web site.
- Where am I?
See the problem description and get data at
http://research.microsoft.com/iccv2005/Contest/
- Dynamic object recognition
Develop an object recognition algorithm that takes an input video of a rigid object
that is moving in front of the camera and uses the dynamic information to recognize
it as one of a library of known objects given in a set of training videos.
- Detection of dilated bronchi in high-resolution CT images
- Evaluation of skeletonization methods for describing and matching brain white matter images
- An online "game" for obtaining manual data from users on some aspect of image analysis
- 3D Scene Reconstruction of Historic Structures from Paintings
Given a set of images of paintings of structures that no longer exist, implement
and use Tomasi and Kanade's factorization algorithm to do 3D scene reconstruction
of historic buildings.
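The core of Tomasi and Kanade's method is a rank-3 factorization of the centered measurement matrix via SVD. A minimal sketch of that step is below; it omits the metric upgrade that resolves the remaining affine ambiguity, and it assumes you already have reliable point tracks, which is the hard part when the "views" are paintings.

```python
import numpy as np

def factorize(W):
    """Affine structure-from-motion by factorization (Tomasi-Kanade sketch).

    W: (2F, P) measurement matrix of P points tracked over F frames,
       with each frame's centroid already subtracted.
    Returns (M, S): (2F, 3) motion matrix and (3, P) shape matrix with
    W ~= M @ S. The metric upgrade (solving for the 3x3 ambiguity) is omitted.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U3, s3, Vt3 = U[:, :3], s[:3], Vt[:3, :]
    M = U3 * np.sqrt(s3)              # motion, up to an affine ambiguity
    S = np.sqrt(s3)[:, None] * Vt3    # shape, up to the same ambiguity
    return M, S
```

Truncating to rank 3 also denoises: with noisy tracks, M @ S is the best rank-3 approximation of W in the least-squares sense.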
- Robust Estimation for Correspondence in Wide-Baseline Stereo
In wide-baseline stereo it is very difficult to determine enough good
correspondences to use standard robust estimation methods to estimate
a good fundamental matrix. Investigate ways of enabling a robust method
to be used in this problem domain.
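For reference, here is the generic RANSAC loop that this problem stresses: with few true correspondences the inlier ratio is low, so the number of iterations (or the sampling strategy itself) must be adapted. This is a model-agnostic sketch; for the fundamental-matrix case, `fit` would be, e.g., the 8-point algorithm and `residual` a point-to-epipolar-line distance, both of which you would supply.

```python
import numpy as np

def ransac(data, fit, residual, n_min, n_iters=500, thresh=1.0, rng=None):
    """Generic RANSAC: repeatedly fit a model to a minimal random sample
    and keep the model with the largest inlier set.

    data:     (N, d) array of candidate samples (e.g., correspondences)
    fit:      model = fit(subset), fits a model to a minimal subset
    residual: errs = residual(model, data), one error per row of data
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(n_iters):
        subset = data[rng.choice(len(data), n_min, replace=False)]
        model = fit(subset)
        inliers = residual(model, data) < thresh
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

A final refit of the model to all inliers, and an adaptive choice of `n_iters` from the current inlier ratio, are standard refinements.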
- Segmentation of Brain Cortical Surface
Given a triangular mesh representation of the inner and outer cortical
surfaces of a brain, segment the surface into clinically-meaningful "lumps."
See work by
M. Chung, for example.
- Change Detection in Brain Surface
Given two triangular mesh representations of a brain's cortical surface
at two different times for a single person, compute a vector field that
describes how points on the first surface have plausibly moved to the
second surface.
See work by
M. Chung, for example.
- Shock Graphs and Divergence-Based Skeletons
Implement a more complex skeleton detection algorithm, such as
Kimia's shock graph
or
Siddiqi's
divergence-based skeletons, and use
it for 2D or 3D skeletonization and possibly for object recognition.
- Reconstruction of 3D Textures
Image-based reconstruction of 3D textures such as cloth, hair,
or trees from a set of images. Perhaps using a Facade-like approach
(see work by P. Debevec).
- Texture Synthesis
Implement one of a number of recent methods for synthesizing new images
of a texture from an existing texture image. See
Chapter 11.3 of the textbook and
A. Efros's web page
for links to a number of recent methods.
- Video Textures
Implement this
recent method
or a new variant of it.
- Motion-Based Segmentation
Segmentation of video. See, for example, the method by
Kuhne et al.
- Level Set / Fast Marching Algorithm
Implement one of these algorithms and apply to a computer vision problem.
See Sethian's web page
for more information.
- Correspondence from Widely Separated Views
Write an algorithm for determining corresponding points in a pair
of images taken from cameras that were far apart, with an unknown
relation between them (i.e., uncalibrated cameras). Implement
or use as a starting point the algorithms given in the paper
"Wide Baseline Stereo Matching,"
Proc. 6th Int. Conf. Computer Vision, 1998, 754-760.
(See also
"Matching and Reconstruction from Widely Separated Views" for a successor paper.)
Also see "Wide baseline point matching using affine invariants computed
from intensity profiles," Proc. 6th European Conf. Computer Vision, Part I,
2000, 814-828.
- Reflectance Modeling from a Sparse Set of Views
Given a set of calibrated input views of a scene with unknown reflectance for each
of the visible scene surfaces, estimate BRDF reflectance models for each
distinct surface. See, for example, the work on
inverse global illumination
by Y. Yu.
- Reconstruction from Silhouettes of Circular Motions
Implement the method described in the paper
"Reconstruction
and Motion Estimation from Apparent Contours under Circular Motion". See also a
demo
of this method.
- Recognition of Activities using Temporal Templates
Implement the technique for recognizing activities described in
the paper
by J. Davis and A. Bobick. See also
online demos.
- Video Analysis
Devise some problems where evaluating long image sequences is required to
detect certain events. For example, if a given subsequence is known to be
of interest (e.g., it shows a prototypical activity of some kind), develop
a method for matching this subsequence with segments of the long input
sequence to find positions where the subsequence has a good match.
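The matching step described above can be sketched as a sliding-window comparison, under the assumption that each frame has already been reduced to a fixed-length feature vector; choosing those features (motion, appearance, etc.) is the interesting part of the project.

```python
import numpy as np

def match_subsequence(query, video):
    """Slide a short query clip over a long sequence and score each offset.

    query: (k, d) per-frame feature vectors of the clip of interest
    video: (n, d) per-frame feature vectors of the long sequence, n >= k
    Returns (best_offset, scores) where scores[i] is the mean squared
    frame-feature distance at offset i (lower = better match).
    """
    k = len(query)
    scores = np.array([np.mean((video[i:i + k] - query) ** 2)
                       for i in range(len(video) - k + 1)])
    return int(np.argmin(scores)), scores
```

Allowing the query to stretch or compress in time (e.g., by dynamic time warping instead of a rigid window) would make the matcher far more useful for real activities.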
- View Morphing Dynamic Scenes
Develop an extension of view morphing to handle scenes containing
moving objects. For example, a change in viewpoint of the entire
scene plus changes in pose of a single moving object within the scene.
Assume the moving object can be segmented and optical flow information
is also extracted.
- Real-Time View Morphing
Develop an automated version of view morphing and implement on
the SGI machine to achieve real-time (i.e., 10-30 frames per second)
performance.
- View Morphing Heads
Develop methods for adapting view morphing to apply to
two images of a person's head (say taken by two or
three cameras mounted around your workstation) so as to render
new views of that person.
Could be used in applications such as videoconferencing.
You can build on our existing view morphing code.
Main extension would be to develop an automatic line correspondence
method for the two views of a face, based on a simple model of
the head and face.
- View Morphing from Multiple Views
Extend view morphing to take either three views as input, or else
a sequence of views from a linearly translating camera, and combine
them for synthesizing new views. You can build on our existing
view morphing code.
- Develop View Morphing or Other View Synthesis Method for Interactive
3D Viewing over the Internet
That is, define an application of view synthesis and explore
implementation and other specific issues in order to make
real-time interactive viewing/manipulation feasible for that
application.
- Voxel Coloring using Variable Resolution Voxels
Develop a version of the Seitz and Dyer Voxel Coloring algorithm
(as presented in class; a paper is also available) that
uses a variable resolution representation of the scene voxels so
as to more efficiently reconstruct and color the voxels in the scene.
- Voxel Coloring from Uncalibrated Cameras
Rather than coloring voxels in Euclidean space, color voxels
in another representation that avoids the need for fully calibrated
cameras.
- Devise a Novel Camera Setup
that includes one or more cameras, perhaps movable on a simple
rig, that would enable the rapid capture and processing of images
for view synthesis type applications. In other words, propose a
hardware configuration for use in view synthesis or other domains.
An analog to this kind of thing is the
OmniCamera
at Columbia University.
- Mosaicking Extensions
Add extensions for mosaicking images. For example, handling non-static
scenes by including a residual term as shown in the video in class.
See the paper by Irani and Peleg
(cited in the Supplementary Reading web page) for ideas.
There are many references to papers on this
subject in the Supplementary Reading web page.
- Registration for Image Mosaics
Use an optimization solver to completely solve or to refine
an initial manual registration for an image mosaic.
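As one concrete instance, refining a registration under an affine motion model reduces to linear least squares on matched point pairs; the pairs would come from your manual initialization or a feature matcher. This sketch assumes at least three non-collinear correspondences.

```python
import numpy as np

def refine_affine(src, dst):
    """Least-squares estimate of an affine registration from matched points.

    src, dst: (N, 2) corresponding points in the two images, N >= 3
    Returns a (2, 3) affine matrix A minimizing ||[src 1] @ A.T - dst||^2.
    """
    N = len(src)
    X = np.hstack([src, np.ones((N, 1))])    # homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T                               # (2, 3) affine transform
```

A full mosaic would use a projective (homography) model instead, which needs a nonlinear solver or the DLT formulation rather than this direct least-squares form.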
- Solve the "Hole" Problem
in the Chen and Williams view
interpolation paper.
- Snakes for Image Databases
Investigate ways of (efficiently) searching a database of images
for objects that are similar to a user-specified contour, which acts
as the initial contour for a snake-based measure of similarity.
Precompute information for each image in
the database that will enable an initial snake to be compared with that
image.
- Snakes for Skeletonization
If you enjoyed the homeworks on skeletons and snakes, try combining
them! Implement the skeletons-from-snakes algorithm as described in
the paper by F. Leymarie and M. Levine, "Simulating the
Grassfire Transform using an Active Contour Model,"
IEEE Trans. Pattern Analysis and Machine Intelligence 14,
1992, 56-75.
- Intelligent Scissors
Implement the interactive, snakes-based method for image segmentation
described in "Intelligent Scissors for Image Composition,"
E. Mortensen and W. Barrett, Proc. SIGGRAPH 95, 1995, 191-198.
- Use the Reflectance and Texture Image Database
of Real-World Surfaces
This database contains
many images of real textured surfaces under controlled illumination
and viewing conditions. Devise experiments that utilize this data
for view synthesis, texture analysis, shape from shading, etc.
- Layers
Implement the layers representation of video described originally
in "Representing Moving Images with Layers," J. Wang and
E. Adelson, IEEE Trans. Image Processing 3, 1994, 625-638.
(See also: "Layered Representations for Vision and Video," E. Adelson,
Proc. IEEE Workshop on Representation of Visual Scenes, 1995, 3-9.)
- Straight Line Detection
Implement the algorithm for straight line detection by Burns,
Hanson, and Riseman, as described in the paper "Extracting
straight lines,"
IEEE Trans. Pattern Analysis and Machine Intelligence
8, No. 4, 1986, 425-455.
- Web Image Search Engine
Define an analog to the text-based web search engines that applies
to images. It may be best to restrict this to specific types of
images and then develop the search engine around a restricted
set of operations that the user can ask about these images.
Examples of such search engines include
WebSeer (University of
Chicago) and
WebSEEk (Columbia University). Also, see the
Scientific American article on this subject.
- Vision for Human-Computer Interfaces
See the paper by Quek cited in the Supplementary Reading web page
for ideas related to HCI and vision, and then develop your own
ideas or implement a simple prototype of some kind.
- Looking at People
See papers on human gesture and face recognition cited in the
Supplementary Reading web page and develop your own application,
or implement a technique related to one of those papers.
See also the list of
online papers on face recognition.
- Non-Model-Based Feature Tracking
Implement a technique capable of tracking some type of image feature over
several frames of an image sequence. Possible types of image features
include point-features like corners (e.g., Tomasi, Moravec), contours
(e.g., Kass, Blake), a region or group of pixels (e.g., Rehg & Witkin),
or optical-flow data (e.g., Allmen).
- Tracking Contours across "Event" Boundaries, such as when contours split or merge.
- Line Segment Tracking
Implement Deriche and Faugeras's extended Kalman filter method.
Alternatively, some other model-based tracking algorithm could
be implemented.
- Vision Databases
Efficiently processing queries involving geometric/spatial data, perhaps
represented at multiple scales. Query-by-shape. Query-by-x.
Consider especially
issues related to queries and data that represent 3D objects,
not simply 2D images or 2D features.
- Vision and Learning
Learning object models of non-rigid objects, for example.
Learning image-based representations.
Active learning of shape or x.
- Vision and Videoconferencing
Recovering purposive information about participants; maintaining
models of faces, hands, etc. associated with the flow of discourse.
- Recognizing Melanoma
Develop methods for representing critical features for diagnosing
melanoma. Images in /p/vision/images/melanoma show the
types of images that must be analyzed.
- Active Vision
New ideas for recovering some kind of scene information
from an active viewer (you specify what the "active"
viewer parameters are and what kind of task-specific information
needs to be recovered).
- Visual Exploration
Methods for using vision to guide a mobile robot
in "exploring" an unknown 3D surface. That is,
vision-guided navigation problems.
- Depth from Focus
Reading and analysis of methods for estimating
depth from measurements of the relative blur.
- Focus-of-Attention Mechanisms
That is, how to control a sequence of processing steps,
each of which operates on a selected subset of the image data.
This could be associated with a specific task such as homing,
tracking, or alerting, made specific enough to characterize
the criteria for adaptive, data-dependent sequencing over the
data (possibly in space, resolution, and/or time).