James Hill, Ozcan Ilikhan
In this project we implemented several programs that, when run together, produce a panoramic image. A panoramic image is one that is "stitched" together from several smaller images to produce a wider field of view. With this technique, a field of view of 360 degrees can be achieved.
Building a panoramic image requires several steps. First, a sequence of images must be acquired. This sequence must be taken such that the contents of the right side of one image are the same as the contents of the left side of the next. Provided that all images have this characteristic, there exists an ordering such that, if the images are "stitched" together, the result is a continuous image covering the combined field of view of the individual images.
The second step is to warp all images by projecting them onto a cylinder whose radius is the focal length of the camera. Determining the focal length normally requires calibrating the camera; fortunately, this information was provided to us in the project instructions. The result of this step is a set of images, each of which is a warped version of one of the originals.
The third step is to perform feature detection on the images. We need to find rotationally and translationally invariant features in each image so that, when we attempt to stitch a pair of images together, we know how to align them. The result of the feature detection step is, for each image, a set of points giving the coordinates of its features along with a vector of numbers describing each feature.
The fourth step is to inspect each pair of images that will be stitched together. The goal is to find features in the left image that match features in the right image. In the best case, each feature in the left image would have exactly one perfect match in the right image; in reality this is not the case. We find the best match for a feature in the left image by comparing its descriptor with all feature descriptors in the right image. Using a distance metric, we find the top two matches, divide the distance of the best match by that of the second-best match, and compare the ratio to a predetermined threshold. If the check passes, we consider the pair a match; otherwise we assume there is no matching feature in the right image for the feature in question. The output of this step is a set of pairs of feature coordinates representing points in the left image that correspond to points in the right image.
The fifth step is to calculate the translational transformation that will result in the best possible stitch of the two images, that is, the transformation that minimizes the distance between the feature point pairs found in the previous step.
The sixth step is to actually stitch the images together. This is as simple as overlapping the images using the translational transformation found in the previous step and then blending the overlapping region with a feathering filter.
The final step is to remove drift in the panorama by applying another transformation so that the top of the final stitched image aligns with the top of the original image. This will only work if the original image appears on both ends of the panorama.
The result is a single, very wide image that contains the combined field of view of the entire set of input images.
The first step in this process requires the projection of an image onto a cylinder. This projection works best if the radius of the cylinder is equal to the focal length of the camera the picture was taken with. By projecting a ray from the center of the cylinder through each pixel of the image, the projection can be found by determining where that ray intersects the cylinder. Our algorithm follows the one given in the notes.
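As an illustration of this step, the sketch below shows the inverse mapping for a single grayscale image. It assumes the optical center lies at the image center and that the focal length f is given in pixels; the Gray struct and sampleBilinear helper are stand-ins for our actual DevIL-backed image class, not the code we shipped.

```cpp
#include <cmath>
#include <vector>

// Minimal grayscale image used only for illustration.
struct Gray {
    int w, h;
    std::vector<float> pix;                       // row-major, w*h values
    float at(int x, int y) const { return pix[y * w + x]; }
};

// Bilinear sample at a non-integer location (assumes 0 <= x < w-1, 0 <= y < h-1).
static float sampleBilinear(const Gray& im, double x, double y) {
    int x0 = (int)x, y0 = (int)y;
    double fx = x - x0, fy = y - y0;
    return (float)((1 - fx) * (1 - fy) * im.at(x0,     y0)
                 +      fx  * (1 - fy) * im.at(x0 + 1, y0)
                 + (1 - fx) *      fy  * im.at(x0,     y0 + 1)
                 +      fx  *      fy  * im.at(x0 + 1, y0 + 1));
}

// Inverse cylindrical warp: for each output pixel, compute the angle and height
// on a cylinder of radius f (the focal length in pixels) and unproject back to
// the flat source image.
Gray cylindricalWarp(const Gray& src, double f) {
    Gray dst{src.w, src.h, std::vector<float>(src.w * src.h, 0.0f)};
    const double xc = src.w / 2.0, yc = src.h / 2.0;  // assume center of projection = image center
    for (int yCyl = 0; yCyl < dst.h; ++yCyl)
        for (int xCyl = 0; xCyl < dst.w; ++xCyl) {
            double theta = (xCyl - xc) / f;           // angle around the cylinder axis
            double hgt   = (yCyl - yc) / f;           // height on the cylinder
            double x = f * std::tan(theta) + xc;      // back onto the image plane z = f
            double y = f * hgt / std::cos(theta) + yc;
            if (x >= 0 && x < src.w - 1 && y >= 0 && y < src.h - 1)
                dst.pix[yCyl * dst.w + xCyl] = sampleBilinear(src, x, y);
        }
    return dst;
}
```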
Once the images are in cylindrical coordinates, we must build a list of features for each image. This is done using the siftWin32 application available for free on the internet from: http://www.cs.ubc.ca/~lowe/keypoints/.
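For reference, the following sketch shows one way the ASCII output of siftWin32 can be read. It assumes the .key format documented with Lowe's demo code (a header line with the keypoint count and descriptor length, then row, column, scale, and orientation followed by the descriptor values for each keypoint); the Keypoint struct is purely illustrative.

```cpp
#include <fstream>
#include <string>
#include <vector>

// One SIFT feature as written by siftWin32's ASCII output.
struct Keypoint {
    float row, col, scale, orientation;
    std::vector<float> desc;      // typically 128 descriptor values
};

// Parse a .key file produced by siftWin32. The header line gives the number of
// keypoints and the descriptor length; each keypoint is a line of
// (row, col, scale, orientation) followed by the descriptor values.
std::vector<Keypoint> loadKeyFile(const std::string& path) {
    std::ifstream in(path);
    std::vector<Keypoint> keys;
    int n = 0, len = 0;
    if (!(in >> n >> len)) return keys;               // empty or unreadable file
    keys.resize(n);
    for (int i = 0; i < n; ++i) {
        in >> keys[i].row >> keys[i].col >> keys[i].scale >> keys[i].orientation;
        keys[i].desc.resize(len);
        for (int j = 0; j < len; ++j)
            in >> keys[i].desc[j];
    }
    return keys;
}
```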
Given a set of features for each image in the set, we can use the descriptors of each feature to attempt to find corresponding pixels in each pair of images. We do this by computing the SSD (sum of squared differences) distance between pairs of feature descriptors. For each feature point in the left image, we compute the SSD distance between its descriptor and every feature descriptor in the right image, keeping track of the two smallest SSDs. We then compute the ratio of the smallest to the second smallest; if this value falls below a predetermined threshold, we consider the two points to correspond.
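A minimal sketch of this ratio test is shown below. It reuses the illustrative Keypoint struct from the previous sketch, and the ratio threshold passed in (e.g. 0.6) is a placeholder rather than the value used in our programs.

```cpp
#include <limits>
#include <utility>
#include <vector>

// SSD distance between two descriptors of equal length.
static double ssd(const std::vector<float>& a, const std::vector<float>& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// For each left-image feature, find its nearest and second-nearest right-image
// feature by SSD; accept the match only if best/secondBest falls below `ratio`.
// Returns pairs of (left index, right index).
std::vector<std::pair<int, int>>
matchFeatures(const std::vector<Keypoint>& left,
              const std::vector<Keypoint>& right,
              double ratio)
{
    std::vector<std::pair<int, int>> matches;
    for (size_t i = 0; i < left.size(); ++i) {
        double best   = std::numeric_limits<double>::max();
        double second = best;
        int bestIdx = -1;
        for (size_t j = 0; j < right.size(); ++j) {
            double d = ssd(left[i].desc, right[j].desc);
            if (d < best)        { second = best; best = d; bestIdx = (int)j; }
            else if (d < second) { second = d; }
        }
        if (bestIdx >= 0 && second > 0 && best / second < ratio)
            matches.push_back({(int)i, bestIdx});
    }
    return matches;
}
```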
Once we have sets of corresponding features for each pair of images, we can then compute the transformation that minimizes the distance between each pair of corresponding features. We do this using the RANSAC method: we compute the translational transformation implied by each pair of corresponding features, apply that transformation to every feature point in the left image, and compute the SSD distance from the transformed position to the position of its matching feature in the right image. If this distance is below a predefined threshold, we count that pair as an inlier for the candidate transformation. Once all candidates have been checked, the one with the most inliers is taken as the best transformation for stitching the images.
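The sketch below illustrates this translation-only RANSAC, assuming the matched feature coordinates are passed as parallel arrays of (x, y) pairs; the inlier threshold is a squared pixel distance and its value is a placeholder.

```cpp
#include <utility>
#include <vector>

struct Translation { double dx, dy; };

// Every correspondence proposes a candidate translation (dx, dy); the candidate
// that moves the most left-image points to within `inlierThresh` (squared pixel
// distance) of their right-image partners wins.
Translation ransacTranslation(const std::vector<std::pair<double,double>>& leftPts,
                              const std::vector<std::pair<double,double>>& rightPts,
                              double inlierThresh)
{
    Translation best{0.0, 0.0};
    int bestInliers = -1;
    for (size_t c = 0; c < leftPts.size(); ++c) {
        // Candidate translation proposed by correspondence c.
        double dx = rightPts[c].first  - leftPts[c].first;
        double dy = rightPts[c].second - leftPts[c].second;
        int inliers = 0;
        for (size_t i = 0; i < leftPts.size(); ++i) {
            double ex = leftPts[i].first  + dx - rightPts[i].first;
            double ey = leftPts[i].second + dy - rightPts[i].second;
            if (ex * ex + ey * ey < inlierThresh)
                ++inliers;
        }
        if (inliers > bestInliers) { bestInliers = inliers; best = {dx, dy}; }
    }
    return best;
}
```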
Knowing the translational transformation, we can now stitch the images together. To do this, we simply copy the first image into the output image buffer and then copy the second image into the output buffer after applying the transformation found in the previous step. This results in some pixels that overlap. Pixels that do not overlap are simply copied into the final image; for those that do overlap, the final pixel value is computed using a linear blending function along the x axis.
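The blending of a single overlapping row might look like the following sketch, which assumes both images have already been placed into output coordinates and uses plain float pixel values instead of our color image class.

```cpp
#include <vector>

// Blend one row of the overlap between two already-placed images using a linear
// ramp along x: at the left edge of the overlap the left image gets full weight,
// at the right edge the right image does. The overlap is [overlapStart, overlapEnd)
// in output coordinates and is assumed to be non-empty.
void featherRow(const std::vector<float>& leftRow,
                const std::vector<float>& rightRow,
                std::vector<float>& outRow,
                int overlapStart, int overlapEnd)
{
    int width = overlapEnd - overlapStart;
    for (int x = overlapStart; x < overlapEnd; ++x) {
        double alpha = double(x - overlapStart) / width;   // 0 at left edge, 1 at right edge
        outRow[x] = float((1.0 - alpha) * leftRow[x] + alpha * rightRow[x]);
    }
}
```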
After combining all the images, there is usually some amount of drift, meaning that the panorama is not perfectly horizontal. This can be corrected if the final image in the sequence is the same as the first. By warping the panorama so that the upper edge of the first image is aligned with the upper edge of the last image, a horizontal result is generated. This is done by shifting each column of the image up by an amount based on its horizontal position, which may require resampling the original image if the offset does not fall squarely on a pixel.
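The sketch below shows one way such a column-wise shear with vertical resampling could be written, assuming a row-major grayscale buffer and a linear distribution of the total drift across the columns.

```cpp
#include <cmath>
#include <vector>

// Remove vertical drift: distribute a total vertical offset of `totalDrift`
// pixels (the misalignment between the first image and its copy at the far end
// of the panorama) linearly across the columns. Fractional shifts are resolved
// by linear interpolation between the two nearest rows.
std::vector<float> removeDrift(const std::vector<float>& img, int w, int h, double totalDrift)
{
    std::vector<float> out(img.size(), 0.0f);
    for (int x = 0; x < w; ++x) {
        double shift = (w > 1) ? totalDrift * x / (w - 1) : 0.0;  // column-dependent shift
        for (int y = 0; y < h; ++y) {
            double srcY = y + shift;
            int y0 = (int)std::floor(srcY);
            double fy = srcY - y0;
            if (y0 >= 0 && y0 + 1 < h)
                out[y * w + x] = float((1 - fy) * img[y0 * w + x] + fy * img[(y0 + 1) * w + x]);
        }
    }
    return out;
}
```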
We implemented this project as a two-pass solution consisting of three programs, one of which is optional. Both programs were written in C++ using Microsoft Visual Studio and have only been compiled and tested on 32-bit Windows XP machines. However, no part of the code is platform dependent and all libraries used are platform independent, so there should be little difficulty in porting these applications to another platform.
The first program that we implemented performs the cylindrical projection of each image. It saves two versions of the resulting image. The first is a color version in the same format as the original, with "_cyl" appended to the end of the image name. The second is a grayscale PGM image which can be fed into siftWin32 for feature point generation.
The second program that we implemented takes the warped images and the feature point files generated by siftWin32 and generates the output panorama.
The third program that we implemented is optional and is detailed in the "Possible Extra Credit" section at the end of this document.
Two major libraries were used for this project. The DevIL library was used for low-level image I/O, which gave us the ability to load and store many different image formats. Internal to the program, images are stored in a class specifically designed to make accessing the image as simple as possible.
The second library, the Qt windowing library, was used for the optional part to provide basic GUI components.
The basic workflow for generating a panorama is as follows: run the first program to warp each input image and write out the grayscale PGM versions, run siftWin32 on each PGM to generate its feature point file, and then run the second program on the warped images and feature files to produce the final panorama.
All binaries provided in the handin directory are release builds. When testing, we found that DevIL.dll and ILU.dll were required to run the two generator programs and the FeatureViewer. The Qt libraries should not be required to run the FeatureViewer.
All three programs are meant to be run from the command line.
All three programs accept the "-h" option and will print a description of the
program along with its usage and a list describing each of the command-line
arguments. Each program can take arguments on the command line or via
an argument file in which each line contains one argument-value pair of the form
-argument=value
The file can then be passed to the application as follows:
AppName -arg_file=[file name]
Example argument files for each program can be found in the example directory.
DevIL image library | http://openil.sourceforge.net |
Qt windowing library | http://qt.nokia.com/products |
We built a feature viewer that allows us to do three things: