Light Field Video Stabilization

Brandon M. Smith
Li Zhang
Hailin Jin
Aseem Agarwala

ICCV 2009


We describe a method for producing a smooth, stabilized video from the shaky input of a hand-held light field video camera (specifically, a small camera array). Traditional stabilization techniques dampen shake with 2D warps, and thus have limited ability to stabilize a significantly shaky camera motion through a 3D scene. Other recent stabilization techniques synthesize novel views as they would have been seen along a virtual, smooth 3D camera path, but are limited to static scenes. We show that video camera arrays enable much more powerful video stabilization, since they allow changes in viewpoint for a single time instant. Furthermore, we point out that the straightforward approach to light field video stabilization requires computing structure-from-motion, which can be brittle for typical consumer-level video of general dynamic scenes. We present a more robust approach that avoids input camera path reconstruction. Instead, we employ a spacetime optimization that directly computes a sequence of relative poses between the virtual camera and the camera array, while minimizing acceleration of salient visual features in the virtual image plane. We validate our novel method by comparing it to state-of-the-art stabilization software, such as Apple iMovie and 2d3 SteadyMove Pro, on a number of challenging scenes.
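To give a rough sense of the acceleration-minimization idea, the toy sketch below smooths a single 2D feature trajectory by penalizing its second differences. Note this is only an illustrative analogue: the method in the paper optimizes virtual camera poses over many features jointly, not individual trajectories, and the regularization weight here is an arbitrary choice.

```python
import numpy as np

def smooth_trajectory(points, lam=100.0):
    """Toy analogue of acceleration minimization: smooth a 2D feature
    trajectory by penalizing second differences (discrete acceleration).

    points: (N, 2) array of feature positions over time.
    lam:    smoothness weight (illustrative value, not from the paper).
    Solves min_x ||x - points||^2 + lam * ||D2 x||^2 in closed form.
    """
    n = len(points)
    # Second-difference (acceleration) operator D2, shape (n-2, n)
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Normal equations: (I + lam * D2^T D2) x = points, per coordinate
    A = np.eye(n) + lam * D2.T @ D2
    return np.linalg.solve(A, points)
```

Applied to a jittery trajectory, the result follows the original path but with much smaller frame-to-frame acceleration, which is the qualitative effect visible in the green trajectory overlays below.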



Brandon M. Smith, Li Zhang, Hailin Jin, Aseem Agarwala. Light Field Video Stabilization. IEEE International Conference on Computer Vision (ICCV), Sept 29-Oct 2, 2009. [PDF 5.3MB]

3D stereoscopic video stabilization, Demo at ICCV 2011.



This work is supported in part by Adobe Systems Incorporated and National Science Foundation grants IIS-0845916 and IIS-0916441.

Download [MP4 60.4 MB]
Smith, Zhang, Jin, Agarwala - ICCV 2009





The following four datasets contain five-view PNG image sequences. The frame rate is 25 fps, and the image size is 480 x 360. The images have been corrected to remove radial distortion. Intrinsic and extrinsic camera parameters for each of the five cameras are available here.

Clarification (added March 13, 2011): due to image quality issues in one of the corner cameras, the images provided (and those used to generate our demos) were captured using the following five cameras: topmost center, leftmost center, center, rightmost center, and bottommost center, not the corner cameras as indicated in Figure 1 of the paper.

Water cooler dataset, 350 frames [ZIP 421.3 MB]
Solo juggler dataset, 530 frames [ZIP 689.3 MB]
Video game dataset, 550 frames [ZIP 714.8 MB]
Crowd dataset, 260 frames [ZIP 294.4 MB]
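For convenience, here is a sketch of loading one five-view frame from an extracted dataset archive into a single array. The directory and file naming (`cam_top/0001.png`, etc.) is hypothetical; check the actual layout after extracting a ZIP and adjust accordingly.

```python
import os
import numpy as np
from PIL import Image

# Hypothetical per-camera subdirectory names; the real archive layout
# may differ -- inspect the extracted ZIP before use.
CAMERAS = ("cam_top", "cam_left", "cam_center", "cam_right", "cam_bottom")

def load_light_field_frame(root, frame_idx, cameras=CAMERAS):
    """Load the five views of one time instant as a (5, 360, 480, 3) array.

    root:      directory containing one subdirectory per camera.
    frame_idx: 1-based frame index; assumes zero-padded filenames
               like 0001.png (an assumption, not documented here).
    """
    views = []
    for cam in cameras:
        path = os.path.join(root, cam, f"{frame_idx:04d}.png")
        views.append(np.asarray(Image.open(path)))
    return np.stack(views)
```

The images are already undistorted, so the intrinsic and extrinsic parameters provided with the datasets can be applied directly, without a separate radial-distortion step.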


Presentation (PowerPoint 2007)
Slides with embedded videos [ZIP (PPTX + AVI files) 206.6 MB]
Slides only [ZIP 6.7 MB, PDF 0.66 MB]
Poster [PPTX 7.7 MB, PDF 1.0 MB]
Selected results

Light field video stabilization on dynamic scenes


[Image grid: Juggling, Video game, and Crowd scenes (columns); original, shaky with tracked trajectories, and stabilized frames (rows).]
Each column shows one dynamic scene from our experiments. The top row shows a frame from the original video. The middle row shows the same frame overlaid with green lines representing point trajectories traced over time. The bottom row shows the corresponding frame from the stabilized video, in which the point trajectories are significantly smoother. These examples demonstrate that our method can handle severe camera shake in complex dynamic scenes with large depth variation and nearby moving subjects. Please see the accompanying video for a clearer demonstration.