Image epitomes

Image epitomes were originally proposed in [3]. The epitome of an image consists of a smaller bitmap and a set of mappings from patches in the bitmap to patches in the original image. In this way, the original image can be reconstructed from the epitome, although not perfectly. The epitome model is useful for multiple vision applications, including segmentation and denoising.

Each pixel in the epitome is modeled by a Gaussian, with a mean and a variance, for each color channel. For each patch in the original image, a posterior distribution over all patches in the epitome is calculated to represent the possible mappings. To learn the epitome, the mapping probabilities are initialized along with the epitome pixel means and variances. The EM algorithm is then applied to update these values, either for a set number of iterations or until convergence.
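As a rough sketch of what these EM updates look like, one common way to write them for a single color channel is given below. The notation is assumed here for illustration and is not taken from [3]: $z_{k,i}$ is pixel $i$ of image patch $k$, $\mathcal{T}$ is a mapping from patch coordinates to epitome coordinates with prior $p(\mathcal{T})$, and $\mu_j$ and $\phi_j$ are the mean and variance of epitome pixel $j$.

The E-step computes the posterior over mappings for each patch:

$$q_k(\mathcal{T}) \propto p(\mathcal{T}) \prod_{i} \mathcal{N}\left(z_{k,i};\ \mu_{\mathcal{T}(i)},\ \phi_{\mathcal{T}(i)}\right)$$

The M-step then re-estimates each epitome pixel as a posterior-weighted average of the image pixels that map to it:

$$\mu_j = \frac{\sum_k \sum_i \sum_{\mathcal{T}:\,\mathcal{T}(i)=j} q_k(\mathcal{T})\, z_{k,i}}{\sum_k \sum_i \sum_{\mathcal{T}:\,\mathcal{T}(i)=j} q_k(\mathcal{T})}, \qquad \phi_j = \frac{\sum_k \sum_i \sum_{\mathcal{T}:\,\mathcal{T}(i)=j} q_k(\mathcal{T})\, (z_{k,i} - \mu_j)^2}{\sum_k \sum_i \sum_{\mathcal{T}:\,\mathcal{T}(i)=j} q_k(\mathcal{T})}$$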

An essential aspect of the epitome is that, given the image epitome and a patch in the original image, one can calculate the posterior probability of the mapping from each patch in the epitome to that image patch. The epitome patch with the highest posterior probability is then used to generate the image patch in the reconstruction. A single epitome patch can, and often does, generate many different patches in the original image. This one-to-many mapping captures textural and shape regularity in the original image.
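The posterior calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the MATLAB code from [4]: it assumes a single grayscale channel, a uniform prior over epitome positions, and no wrap-around at the epitome border. The function names `log_gaussian`, `patch_posteriors`, and `best_mapping` are hypothetical.

```python
import math


def log_gaussian(z, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)


def patch_posteriors(patch, e_mean, e_var):
    """Posterior over epitome positions for one K x K image patch.

    Assumptions (not from the source): grayscale pixels, uniform prior
    over positions, square N x N epitome, no wrap-around.
    Returns a dict mapping the top-left epitome corner (row, col) to
    its posterior probability.
    """
    K = len(patch)
    N = len(e_mean)
    log_liks = {}
    for r in range(N - K + 1):
        for c in range(N - K + 1):
            ll = 0.0
            for i in range(K):
                for j in range(K):
                    ll += log_gaussian(patch[i][j],
                                       e_mean[r + i][c + j],
                                       e_var[r + i][c + j])
            log_liks[(r, c)] = ll
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_liks.values())
    weights = {pos: math.exp(ll - m) for pos, ll in log_liks.items()}
    total = sum(weights.values())
    return {pos: w / total for pos, w in weights.items()}


def best_mapping(posteriors):
    """Epitome position with the highest posterior probability."""
    return max(posteriors, key=posteriors.get)


# Toy 3 x 3 epitome (means and variances) and a 2 x 2 image patch.
e_mean = [[0.0, 0.0, 1.0],
          [0.0, 0.0, 1.0],
          [1.0, 1.0, 1.0]]
e_var = [[0.01] * 3 for _ in range(3)]
patch = [[0.0, 1.0],
         [1.0, 1.0]]

post = patch_posteriors(patch, e_mean, e_var)
print(best_mapping(post))
```

In this toy case the patch matches the lower-right 2 x 2 block of the epitome means exactly, so the position (1, 1) receives nearly all of the posterior mass.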

The website [4] contains more information about image epitomes, as well as example images and MATLAB code. The provided MATLAB code comes in the form of two functions:

[e,ev,p,etr] = epitome(x,K,N,T,NIT,sc,e,ev)
z = reconstruct(x,K,e,ev,emod,SPC)

The epitome function runs $NIT$ iterations of EM to learn an $N$ by $N$ epitome of image $x$ using $T$ patches of size $K$ by $K$. The optional parameters $e$ and $ev$ supply the initial values for the epitome. The optional parameter $sc$ specifies a set of scaling factors: the EM algorithm is run for $NIT$ iterations at each scale, and the result is used to initialize the epitome for the next scale. To save time, I did not use the scaling parameter in this project.

The reconstruct function uses the original epitome ($e$,$ev$) along with the original image $x$ to reconstruct the original image from a modified epitome $emod$. This is useful for tracing pixels from the epitome back into the original image, as illustrated in the PowerPoint available from [4]. The PowerPoint contains an image of a dog sitting on some gravel in front of flowers. The epitome pixels corresponding to the gravel are colored bright teal, and this modified epitome is used to reconstruct the original image. The gravel area in the resulting reconstruction is bright teal.

One important aspect of the reconstruction algorithm is that the mapping from the epitome to each patch in the original image is "winner take all". That is, the epitome patch with the highest posterior probability of mapping into an image patch is the only one used to generate that patch in the original image. Patches in the original image are allowed to overlap, and where they do, their contributions to a given pixel are simply averaged.
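The averaging of overlapping patches can be sketched as follows. This is an illustrative Python example and not the reconstruct function from [4]: it assumes a single grayscale channel and that the winning epitome position for each image patch has already been chosen. The function name `reconstruct_from_mappings` and the `mappings` dictionary are hypothetical.

```python
def reconstruct_from_mappings(height, width, K, e_mean, mappings):
    """Rebuild an image from winner-take-all patch mappings.

    mappings: dict from the image patch top-left corner (y, x) to the
    winning epitome top-left corner (r, c). Each image patch copies the
    K x K block of epitome means at its winning position; pixels covered
    by several patches receive the average of all contributions.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (y, x), (r, c) in mappings.items():
        for i in range(K):
            for j in range(K):
                total[y + i][x + j] += e_mean[r + i][c + j]
                count[y + i][x + j] += 1
    # Pixels not covered by any patch are left at 0.0 in this sketch.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)]
            for y in range(height)]


# A 2 x 2 epitome and two overlapping 2 x 2 patches in a 2 x 3 image,
# both mapped to the same epitome position.
e_mean = [[0.0, 1.0],
          [0.0, 1.0]]
mappings = {(0, 0): (0, 0), (0, 1): (0, 0)}
image = reconstruct_from_mappings(2, 3, 2, e_mean, mappings)
print(image)
```

Here the middle column is covered by both patches, so its value is the average of the two contributions (1.0 from the first patch and 0.0 from the second). Passing the means of a modified epitome in place of `e_mean` gives the pixel-tracing behaviour described for $emod$ above.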

My project uses these two functions to learn epitomes and to reconstruct images. I also use code fragments from these methods for some epitome-related tasks, like determining which epitome patch has the maximum posterior probability of mapping into a given image patch.

The image epitome concept was extended to video data in [1]. The extension is a natural one and involves modeling video data as a 3-dimensional cube where individual frames from the video are stacked on top of one another along the temporal axis. The video epitome then contains 3-dimensional video cubes instead of 2-dimensional image patches. One of the authors has video examples and code available on the web at [5]. While my project does not explicitly use video epitomes, these resources were useful in gaining a better understanding of the epitome model in general.

David Andrzejewski 2005-12-19