Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization

        Jia Xu*, Lopamudra Mukherjee+, Yin Li#, Jamieson Warner*, James M. Rehg#, Vikas Singh*

*University of Wisconsin-Madison     +University of Wisconsin-Whitewater     #Georgia Institute of Technology



With the proliferation of wearable cameras, the number of videos in which users document their personal lives is increasing rapidly. Since such videos may span hours, there is an important need for mechanisms that represent their information content in a compact form (i.e., shorter videos that are easier to browse and share). Motivated by these applications, this paper focuses on the problem of egocentric video summarization. Such videos are usually continuous, with significant camera shake and other quality issues. For these reasons, there is growing consensus that applying standard video summarization tools directly to such data yields unsatisfactory performance. In this paper, we demonstrate that gaze tracking information (such as fixations and saccades) significantly helps the summarization task: it allows meaningful comparison of different image frames and enables personalized summaries, since gaze provides a sense of the camera wearer's intent. We formulate a summarization model that captures common-sense properties of a good summary, and show that it can be solved as submodular function maximization under partition matroid constraints, opening the door to a rich body of work from combinatorial optimization. We evaluate our approach on a new gaze-enabled egocentric video dataset (over 15 hours), which will be a valuable standalone resource.
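
To give a feel for what "submodular maximization with partition matroid constraints" means in practice, here is a minimal Python sketch. It is not the model or solver from the paper. It assumes a generic facility-location coverage objective as a stand-in for the summarization objective, a toy one-dimensional "feature" per subshot, and a hypothetical split of the video into two segments with a budget of one subshot each. The names `greedy_partition_matroid`, `facility_location`, `part_of`, and `capacity` are illustrative only. The solver shown is the classical greedy rule, which for monotone submodular objectives under a matroid constraint is known to reach at least half of the optimal value.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def facility_location(similarity: Sequence[Sequence[float]]) -> Callable[[FrozenSet[int]], float]:
    """Stand-in objective (an assumption, not the paper's): every item is
    'covered' by its most similar selected item. Monotone and submodular
    when all similarities are non-negative."""
    def f(subset: FrozenSet[int]) -> float:
        if not subset:
            return 0.0
        return sum(max(row[j] for j in subset) for row in similarity)
    return f


def greedy_partition_matroid(
    n_items: int,
    part_of: Sequence[int],          # part_of[i] = partition (e.g. segment) of item i
    capacity: Dict[int, int],        # max number of items allowed per partition
    objective: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Classical greedy: repeatedly add the feasible item with the largest
    positive marginal gain until no feasible item improves the objective."""
    selected: List[int] = []
    used = {p: 0 for p in capacity}
    current = objective(frozenset())
    remaining = set(range(n_items))
    while remaining:
        best_item, best_gain = None, 0.0
        for i in sorted(remaining):
            if used[part_of[i]] >= capacity[part_of[i]]:
                continue  # this partition's budget is exhausted
            gain = objective(frozenset(selected + [i])) - current
            if gain > best_gain:
                best_item, best_gain = i, gain
        if best_item is None:
            break
        selected.append(best_item)
        used[part_of[best_item]] += 1
        current += best_gain
        remaining.discard(best_item)
    return selected


# Toy data (hypothetical): 7 subshots with scalar features, two segments.
features = [0, 1, 2, 7, 8, 9, 10]
similarity = [[10 - abs(a - b) for b in features] for a in features]
part_of = [0, 0, 0, 1, 1, 1, 1]
capacity = {0: 1, 1: 1}

summary = greedy_partition_matroid(
    len(features), part_of, capacity, facility_location(similarity)
)
print(summary)  # [3, 1]: one representative subshot from each segment
```

The partition constraint is what keeps the summary spread across the video: once a segment's budget is used up, its remaining subshots are skipped even if they would still add to the objective.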


  • Jia Xu, Lopamudra Mukherjee, Yin Li, Jamieson Warner, James M. Rehg, Vikas Singh. Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2015. PDF, Supplement, Poster, Bibtex.

  • Video Results

    Click to see how our method works in practice. Download MP4 (29MB) here.


    We thank Jerry Zhu for introducing us to the DPP literature. This research is funded via grants NSF RI 1116584, NSF CGV 1219016, NSF Award 0916687, NSF EA 1029679, NIH BD2K award 1U54AI117924 and NIH BD2K award 1U54EB020404. We gratefully acknowledge NVIDIA Corporation for the donation of Tesla K40 GPUs used in this research.