Stereo Matching with Nonparametric Smoothness Priors in Feature Space


Brandon M. Smith
Li Zhang
Hailin Jin

CVPR 2009

Abstract

We propose a novel formulation of stereo matching that considers each pixel as a feature vector. Under this view, matching two or more images can be cast as matching point clouds in feature space. We build a nonparametric depth smoothness model in this space that correlates the image features and depth values. This model induces a sparse graph that links pixels with similar features, thereby converting each point cloud into a connected network. This network defines a neighborhood system that captures pixel grouping hierarchies without resorting to image segmentation. We formulate global stereo matching over this neighborhood system and use graph cuts to match pixels between two or more such networks. We show that our stereo formulation is able to recover surfaces with different orders of smoothness, such as those with high-curvature details and sharp discontinuities. Furthermore, compared to other single-frame stereo methods, our method produces more temporally stable results from videos of dynamic scenes, even when applied to each frame independently.
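To make the feature-space neighborhood system concrete, below is a minimal C++ sketch (not taken from the released source code) that links each pixel to its k nearest neighbors in a joint position-color feature space and weights each edge by feature similarity. The five-dimensional feature, the Gaussian weighting, and the brute-force search are illustrative assumptions rather than the paper's exact construction.

// A minimal sketch, not the released implementation: link each pixel to its k
// nearest neighbors in a joint position-color feature space and weight each
// edge by feature similarity. The 5-D feature, the Gaussian weighting, and the
// brute-force O(N^2) search are illustrative assumptions only.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Feature { float v[5]; };          // e.g. (x, y, r, g, b), suitably scaled

struct Edge { int p, q; float weight; }; // one link in the feature-space graph

std::vector<Edge> buildFeatureGraph(const std::vector<Feature>& f, int k, float sigma)
{
    std::vector<Edge> edges;
    const std::size_t n = f.size();
    for (std::size_t p = 0; p < n; ++p) {
        // squared feature distance from pixel p to every other pixel
        std::vector<std::pair<float, int>> dist;
        dist.reserve(n > 0 ? n - 1 : 0);
        for (std::size_t q = 0; q < n; ++q) {
            if (q == p) continue;
            float d2 = 0.f;
            for (int i = 0; i < 5; ++i) {
                const float d = f[p].v[i] - f[q].v[i];
                d2 += d * d;
            }
            dist.push_back(std::make_pair(d2, static_cast<int>(q)));
        }
        // keep only the k nearest neighbors in feature space
        const std::size_t kept = std::min<std::size_t>(k, dist.size());
        std::partial_sort(dist.begin(), dist.begin() + kept, dist.end());
        for (std::size_t i = 0; i < kept; ++i) {
            const float w = std::exp(-dist[i].first / (2.f * sigma * sigma));
            Edge e = { static_cast<int>(p), dist[i].second, w };
            edges.push_back(e);
        }
    }
    return edges;
}

A real implementation would replace the brute-force scan with an approximate nearest-neighbor structure such as a k-d tree, since the quadratic search is impractical at image resolution.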

 

Paper

Brandon M. Smith, Li Zhang, Hailin Jin. Stereo Matching with Nonparametric Smoothness Priors in Feature Space. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009. [PDF 3.1MB]

 

Acknowledgement

This work is supported in part by Adobe Systems Incorporated and National Science Foundation grant IIS-0845916.

 
Video
Download [MP4 60.4 MB]
Smith, Zhang, Jin - CVPR 2009

 

 

 

Source Code

The following C++ source code is available under the GNU General Public License. A makefile is included for Linux, and Visual Studio 2008 files (.sln and .vcproj) are included for Windows. Please see the included readme.txt file for usage instructions.


Version 1.0 [ZIP 2.1 MB], released April 5, 2010

 

Datasets

The following two datasets contain five-view PNG image sequences. The frame rate is 25 fps, and the image size is 480 x 360. The images have been corrected to remove radial distortion. Intrinsic and extrinsic camera parameters for each of the five cameras are available here.


Plant dataset, 100 frames [ZIP 124.6 MB]
Sundeep dataset, 100 frames [ZIP 119.6 MB]
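
For convenience, here is a hypothetical C++/OpenCV loop for loading one of the sequences. The per-camera directory names and the zero-padded frame-file naming are assumptions, not the actual layout of the ZIP archives; adjust the path pattern after inspecting the extracted files.

// Hypothetical loading loop: the directory and file names below are assumptions;
// adjust the path pattern to match the actual contents of the ZIP archives.
// Requires OpenCV (imgcodecs) for PNG decoding.
#include <opencv2/imgcodecs.hpp>
#include <cstdio>
#include <string>
#include <vector>

std::vector<std::vector<cv::Mat>> loadSequence(const std::string& root,
                                               int numCams = 5, int numFrames = 100)
{
    std::vector<std::vector<cv::Mat>> frames(numCams);
    for (int c = 0; c < numCams; ++c) {
        for (int t = 0; t < numFrames; ++t) {
            char path[256];
            std::snprintf(path, sizeof(path), "%s/cam%d/frame_%03d.png",
                          root.c_str(), c, t);
            cv::Mat img = cv::imread(path, cv::IMREAD_COLOR); // 480 x 360, distortion already removed
            if (!img.empty())
                frames[c].push_back(img);
        }
    }
    return frames;
}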

 

Presentation (PowerPoint 2007)
Slides with embedded videos [ZIP (PPTX + AVI files) 43.7 MB]
Slides only [ZIP 2.3 MB, PDF 3.1 MB]
 
Selected results
Handling Different Types of Smoothness
[Figure: left Cloth3 image [17] annotated with smoothness types (left); our depth map, 2.01% bad pixels (middle); Woodford et al. [31] depth map, 6.33% bad pixels (right)]
 
Different image regions correspond to 3D surfaces with different types of smoothness, as shown on the left. Such smoothness properties are often highly correlated with local image features such as intensity gradients and shading. Our nonparametric smoothness prior models the correlation between image features and depth values. Depth maps estimated using this model preserve both high-curvature surfaces and sharp discontinuities at object boundaries, as shown in the middle. Our method compares favorably to an existing state-of-the-art method that uses a fixed second-order smoothness prior, shown on the right.
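
As a rough illustration of how such a feature-dependent prior can be expressed, the sketch below computes a pairwise smoothness penalty that is strong between pixels with similar features and weak across feature discontinuities. The Gaussian weighting and the truncated depth difference are assumptions for illustration, not the exact energy used in the paper.

// Illustrative sketch only, not the exact energy from the paper: a pairwise
// smoothness penalty that is strong between pixels with similar features and
// weak across feature discontinuities, which tends to preserve depth edges.
#include <algorithm>
#include <cmath>

float smoothnessCost(const float* fp, const float* fq, int dim,
                     float dp, float dq, float sigma, float truncation)
{
    float d2 = 0.f;
    for (int i = 0; i < dim; ++i) {
        const float d = fp[i] - fq[i];
        d2 += d * d;
    }
    const float weight = std::exp(-d2 / (2.f * sigma * sigma));       // feature similarity
    const float depthDiff = std::min(std::fabs(dp - dq), truncation); // robust depth difference
    return weight * depthDiff;
}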

 

Generating Stable Depth Maps for Videos of Dynamic Scenes

[Figure: (a)-(d) reference views at frames 35 and 36; (e) standard graph cuts results; (f) our results]
Depth results for dynamic scenes using a five-camera array. (a)-(d) Reference view input images. (e) Depth results obtained using Kolmogorov and Zabih's graph cuts method. (f) Our results. Note the improved temporal stability of our results, especially in the highlighted regions. Please see the accompanying video for a clearer demonstration.