Computer Vision

Nowadays, panorama photography has become very popular. With this technique in hand, we might ask, for example, why we need to move back and forth to record a game at a wide angle — why not use a video panorama instead? In fact, with more and more sports lovers uploading videos of parachuting, free diving, or riding roller coasters, people are looking for a better way, such as panoramic video, to record their lives.

However, tools for making video panoramas are rarely seen, and the devices built for this purpose are still quite expensive. So we hope to find a way to do this that can be put into daily use.

To achieve video panorama, we divide the whole process into four parts.

Since we may need to take a 360-degree video, the first step is to apply a cylindrical projection to the images: we project the 2D pixel matrix onto a 3D cylindrical surface. We also added distortion correction in this step, because we found that with more than three images the distortion causes serious problems in image stitching and in the final cropping. We modified the projection program to take the focal length together with the corresponding radial distortion coefficients k1 and k2; by tuning k1 and k2 we eliminated the distortion problem.
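The idea of this step can be sketched as follows. This is a Python/NumPy illustration of cylindrical projection with k1/k2 radial distortion correction (the project code itself is MATLAB, in cylProj.m); the function name and grayscale-only handling are simplifications, not the actual implementation.

```python
import numpy as np

def cylindrical_project(img, f, k1=0.0, k2=0.0):
    """Project a grayscale image onto a cylinder of radius f,
    with radial distortion correction using coefficients k1, k2.
    (Illustrative sketch; cylProj.m may differ in detail.)"""
    h, w = img.shape[:2]
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    # cylindrical coordinates of each output pixel
    theta = (xs - xc) / f
    hgt = (ys - yc) / f
    # unit cylinder -> 3D ray -> normalized image-plane coordinates
    xn = np.sin(theta) / np.cos(theta)
    yn = hgt / np.cos(theta)
    # radial distortion model: d = 1 + k1*r^2 + k2*r^4
    r2 = xn ** 2 + yn ** 2
    d = 1 + k1 * r2 + k2 * r2 ** 2
    xd, yd = xn * d, yn * d
    # back to source pixel coordinates (nearest neighbour)
    xi = np.round(f * xd + xc).astype(int)
    yi = np.round(f * yd + yc).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out[ys[valid], xs[valid]] = img[yi[valid], xi[valid]]
    return out
```

With k1 = k2 = 0 this is a plain cylindrical projection; nonzero k1 and k2 bend the sampling grid to undo lens distortion before stitching.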

While stitching, we add another feature to make the result better. Since we want to capture the whole surrounding scene, the lighting conditions may vary, so we apply gamma correction to solve the problem. Gamma describes the relationship between the RGB intensity values of an input image and those of the resulting output image. By sampling enough points in the blending area of the stitched images, we adjust the gamma of the current image to match the previous image and so equalize the exposure.
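One simple way to carry this out is to solve for a gamma that makes the mean intensity of the overlap regions agree. The sketch below is a hypothetical Python version of the idea behind matchExposures.m, not the actual MATLAB code; the function name and the mean-of-overlap criterion are assumptions.

```python
import numpy as np

def match_gamma(ref_overlap, src_overlap, src):
    """Estimate a gamma so the overlap region of `src` matches the
    overlap region of the reference image, then apply that gamma to
    the whole `src` image. Intensities are in [0, 255].
    (Illustrative sketch; matchExposures.m may differ in detail.)"""
    ref = np.clip(ref_overlap.astype(float) / 255.0, 1e-6, 1.0)
    src_o = np.clip(src_overlap.astype(float) / 255.0, 1e-6, 1.0)
    # solve mean(src)^gamma ~= mean(ref) via logs
    gamma = np.log(ref.mean()) / np.log(src_o.mean())
    corrected = (np.clip(src.astype(float) / 255.0, 0, 1) ** gamma) * 255.0
    return corrected.astype(np.uint8), gamma
```

If the new image's overlap is darker than the reference overlap, the estimated gamma is below 1 and brightens the whole frame, and vice versa.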

We use the same stitching steps as in our homework. Given two images and the H matrix, we compute the new position of each pixel and use a blending method to put the two images together. After obtaining the new image, we crop the boundary to make it rectangular.
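The core geometric operation here is mapping points through the H matrix. A minimal Python/NumPy sketch of the role applyHomography.m plays (the function name and signature here are illustrative, not the MATLAB interface):

```python
import numpy as np

def apply_homography(H, pts):
    """Map an Nx2 array of (x, y) points through a 3x3 homography H.
    Points are lifted to homogeneous coordinates, multiplied by H,
    then divided by the third coordinate."""
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Applying this to the corners of the second image gives the bounding box of the stitched result, which is then cropped to a rectangle.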

For the blending method here, we use alpha blending. Alpha is the transparency channel value encoded in each image, and the core formula is blended value = (1 - alpha) * img1 + alpha * img2. To determine a good alpha, besides computing every point's distance to the border (so the mask is 0 at the border and 1 at the center), we added more sampling iterations around the blending area to get a smoother transition without ghosting. We also tried the pyramid blending technique, but the result was not as good as our alpha blending.
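The blending formula itself is small. Below is a Python sketch of the core of mergeAlpha.m under simplifying assumptions: a precomputed alpha map is passed in, and a linear left-to-right ramp stands in for the distance-to-border mask described above.

```python
import numpy as np

def alpha_blend(img1, img2, alpha):
    """Per-pixel alpha blend: (1 - alpha) * img1 + alpha * img2,
    where alpha is 0 at img2's border and 1 at its center."""
    return ((1 - alpha) * img1.astype(float)
            + alpha * img2.astype(float)).astype(np.uint8)

def ramp_alpha(width, height):
    """A simple linear ramp across the overlap (0 at the left edge,
    1 at the right edge) — a stand-in for the distance-to-border mask."""
    return np.tile(np.linspace(0.0, 1.0, width), (height, 1))
```

Near img2's border the output is dominated by img1, so the seam fades out instead of appearing as a hard edge.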

To do the video stitching, we need to synchronize the videos, that is, find the frames that happen at the same time. We can assume there are only two videos, because with more we can still stitch them pairwise against a base video. One video must start earlier, so we use the first few consecutive frames of one video to search for the matching first frame of the other (we assume the two videos start within one second of each other), and do the same in the other direction. Since RANSAC gives us the number of matching points, we take the pair with the maximal number of matches as the synchronized frames for future stitching.
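The search procedure above can be sketched as follows. Here `num_matches` is a hypothetical Python stand-in for getMatches.m (count of RANSAC inliers between two frames); the function name, signature, and return format are assumptions for illustration.

```python
def synchronize(num_matches, frames_a, frames_b, search):
    """Find the starting offset between two videos: match the first
    frame of A against the first `search` frames of B, and the first
    frame of B against the first `search` frames of A. The pair with
    the most RANSAC inliers is taken as the synchronized frames.
    Returns (index_in_a, index_in_b, num_inliers)."""
    best = (0, 0, -1)
    for j in range(min(search, len(frames_b))):
        m = num_matches(frames_a[0], frames_b[j])
        if m > best[2]:
            best = (0, j, m)
    for i in range(min(search, len(frames_a))):
        m = num_matches(frames_a[i], frames_b[0])
        if m > best[2]:
            best = (i, 0, m)
    return best
```

Once the synchronized pair is found, the frame rate gives every later correspondence, so RANSAC only needs to run over this small search window.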

Since at this step we assume the cameras are fixed, we can reuse the single H matrix obtained in the first step, which accelerates the whole process.

In this step, we first assume the shaking is small, which means the boundary will only shift within a small range, so we can do the image stitching as normal. In addition, we compute an H matrix that contains only the translation between the current frame and the last frame; this H matrix is special here because shaking should only cause a planar shift. We use this shift matrix to move the current frame back to the right position and then crop the image with a smaller boundary determined by the assumed shaking range.
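The shift-and-crop step can be sketched like this. It is a Python/NumPy illustration under the stated assumption that the shake (dx, dy) stays smaller than the crop margin; the function name and integer-shift simplification are not from the MATLAB code.

```python
import numpy as np

def stabilize(frame, dx, dy, margin):
    """Undo an estimated planar shift (dx, dy) of `frame` and crop
    `margin` pixels on every side so the moving boundary is hidden.
    Assumes |dx|, |dy| < margin."""
    h, w = frame.shape[:2]
    out = np.zeros_like(frame)
    # translation-only "H matrix": just shift back by (-dx, -dy)
    sy, sx = int(round(-dy)), int(round(-dx))
    out[max(0, sy):min(h, h + sy), max(0, sx):min(w, w + sx)] = \
        frame[max(0, -sy):min(h, h - sy), max(0, -sx):min(w, w - sx)]
    # crop with a smaller boundary so every output frame has one size
    return out[margin:h - margin, margin:w - margin]
```

Because every frame is cropped by the same margin, all output frames end up the same size even though each one is shifted by a different amount.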

Set parameters and run: run.m
Wrap: runWrap.m
RANSAC: runRANSAC.m
Cylinder projection with distortion correction: cylProj.m
Exposure correction: matchExposures.m
Blending pre-processing: beforeblending.m
Alpha blending: mergeAlpha.m
Apply H matrix to get the destination points: applyHomography.m
Align two images to get the H matrix by running RANSAC: alignImg.m
Count the number of matches: getMatches.m
Compute the H matrix from two images: computeHomography.m
Image stitching: stitchImg.m
Video stitching: stitchVideo.m
Generate SIFT matches: genSIFTMatches.m
SIFT library: sift_lib.zip

The effect of cylindrical projection with distortion correction is hard to see when only two images are stitched, as in our video generation. We found that the larger the change in shooting angle, the worse the distortion problem in the cylindrical projection becomes, so we applied it to three images taken of Bascom Hall instead to show the result. Here is the result.

Before distortion correction:

After distortion Correction:

Before exposure correction and without blending:

After exposure correction and with homework blending:

After Alpha blending:

Video after the whole process

We did not find prior public work in this area, so we designed the process on our own.

The first problem we met was how to find the synchronized frames. Since we can read the frame rate from the video properties, once we have the first synchronized frame we can derive all the later ones. In our design, we use the first few frames of both videos to find the synchronized frame. It would be better to use more starting frames or more consecutive frames for matching, but running time would then become a serious problem, since RANSAC is slow to run.

The "first frame" mentioned above is not the literal first frame of the video but the first frame of the output. This is because we need moving objects in the scene to determine whether two frames happen at the same time; otherwise the matching would just depend on the random result of RANSAC.

In our use case, we assume several people hold their cameras and take video simultaneously without walking around. Since there may be asynchronous shakes among the hand-held cameras, we need to compute an H matrix for every frame rather than just one as in the fixed mode. In addition, all frames of the output video must have the same size, so after stitching frame t we may find it has moved slightly relative to frame t-1; we move it back into position and crop it with a slightly smaller frame.

However, there are still some limitations. Most importantly, the panorama process assumes all images are taken from a single point, but our use case requires several cameras shooting together, which makes it hard to place them at a single point. This causes some viewing-angle mismatch and creates problems when stitching scenes that are not far away from us.

Designed the whole algorithm. Combined the code. Wrote image stitching and video stitching (fixed-camera part).

Tuned parameters. Collected experimental materials. Wrote video stitching (shaking part).