Should we track the features of images pairwise, or track them
sequentially like example4 does?
Tracking them pairwise is all that is necessary since you'll be computing
the homography between successive pairs of frames only. The only
advantage of using a version of example4 to do all frames simultaneously
is that you'll have one large file created rather than many files to
process. Whichever way you choose to do it is okay.
Is my understanding of homography correct in that we just need
four point-pair correspondences between each pair of frames, and that they may
differ from frame to frame? In other words, say I track points
A, B, C, and D from frame1 to frame2, and I create the mapping between
those two. Then, maybe from frame2 to frame3 I track A, B, C, and E because
D is lost or whatever. This still works, right? The projection only
depends on tracking four point-pairs, not necessarily on tracking the same
points across the entire sequence.
Yes, each pair of consecutive frames will be handled independently. Thus
there is no requirement that points be tracked across more than two
consecutive frames, let alone across the entire sequence. You will then
use RANSAC to select subsets of four corresponding pairs for each
consecutive pair of frames.
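For concreteness, here is a minimal self-contained sketch of that RANSAC loop in C++. Everything in it (the function names, the direct 8x8 linear solve, the iteration count, and the inlier threshold) is illustrative, not a prescribed part of the assignment:

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Pt { double x, y; };

// Solve for the homography mapping the 4 points s[] onto d[] via Gaussian
// elimination on the resulting 8x8 linear system (scale fixed by h[8] = 1).
// Returns false for degenerate configurations (e.g., collinear points).
static bool homographyFrom4(const Pt s[4], const Pt d[4], double h[9])
{
    double A[8][9];                                  // augmented matrix [A | b]
    for (int i = 0; i < 4; i++) {
        double x = s[i].x, y = s[i].y, u = d[i].x, v = d[i].y;
        double r1[9] = { x, y, 1, 0, 0, 0, -u * x, -u * y, u };
        double r2[9] = { 0, 0, 0, x, y, 1, -v * x, -v * y, v };
        for (int j = 0; j < 9; j++) { A[2*i][j] = r1[j]; A[2*i+1][j] = r2[j]; }
    }
    for (int c = 0; c < 8; c++) {                    // elimination w/ pivoting
        int p = c;
        for (int r = c + 1; r < 8; r++)
            if (std::fabs(A[r][c]) > std::fabs(A[p][c])) p = r;
        if (std::fabs(A[p][c]) < 1e-12) return false;
        for (int j = 0; j < 9; j++) std::swap(A[c][j], A[p][j]);
        for (int r = c + 1; r < 8; r++) {
            double f = A[r][c] / A[c][c];
            for (int j = c; j < 9; j++) A[r][j] -= f * A[c][j];
        }
    }
    for (int c = 7; c >= 0; c--) {                   // back substitution
        h[c] = A[c][8];
        for (int j = c + 1; j < 8; j++) h[c] -= A[c][j] * h[j];
        h[c] /= A[c][c];
    }
    h[8] = 1.0;
    return true;
}

// Squared distance between d and the projection of s under h (row-major 3x3).
static double reprojError2(const Pt &s, const Pt &d, const double h[9])
{
    double w = h[6] * s.x + h[7] * s.y + h[8];
    double u = (h[0] * s.x + h[1] * s.y + h[2]) / w;
    double v = (h[3] * s.x + h[4] * s.y + h[5]) / w;
    return (u - d.x) * (u - d.x) + (v - d.y) * (v - d.y);
}

// Basic RANSAC: fit to 4 random correspondences, keep the fit with the most
// inliers.  Returns the inlier count of the best homography found (bestH).
int ransacHomography(const std::vector<Pt> &src, const std::vector<Pt> &dst,
                     double bestH[9], int iters = 500, double thresh = 3.0)
{
    int n = (int) src.size(), best = 0;
    if (n < 4) return 0;
    for (int it = 0; it < iters; it++) {
        int idx[4];
        for (int k = 0; k < 4; k++) {                // 4 distinct random indices
            bool dup;
            do {
                idx[k] = std::rand() % n;
                dup = false;
                for (int j = 0; j < k; j++) if (idx[j] == idx[k]) dup = true;
            } while (dup);
        }
        Pt s[4], d[4];
        for (int k = 0; k < 4; k++) { s[k] = src[idx[k]]; d[k] = dst[idx[k]]; }
        double h[9];
        if (!homographyFrom4(s, d, h)) continue;
        int inliers = 0;
        for (int i = 0; i < n; i++)
            if (reprojError2(src[i], dst[i], h) < thresh * thresh) inliers++;
        if (inliers > best) {
            best = inliers;
            for (int j = 0; j < 9; j++) bestH[j] = h[j];
        }
    }
    return best;
}

Once the best sample is found, a common refinement is to re-fit the homography to all of its inliers rather than keeping the 4-point fit itself.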
Can I divide HW#3 into multiple programs so that I can implement it step
by step? That is, can I track the features first and store them in a
file for later use? This would save me time because I won't have to
re-execute earlier steps again and again when debugging and running later
steps.
Yes, absolutely. I recommend you divide things into steps to make the
process as simple for you as possible.
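For example, a first program could run the tracker and dump the features to a text file, which later programs read back instead of re-running the tracker. A sketch, assuming Birchfield's KLT interface (klt.h, pnmio.h, KLTWriteFeatureList() and its companion KLTReadFeatureList()); the image and output file names are placeholders:

// step1_track.cpp -- select and track features, then save them to a file.
extern "C" {          // the KLT tracker is C code (see the last question below)
#include "pnmio.h"
#include "klt.h"
}

int main()
{
    int ncols, nrows;
    KLT_TrackingContext tc = KLTCreateTrackingContext();
    KLT_FeatureList fl = KLTCreateFeatureList(150);   // track 150 features, say

    unsigned char *img1 = pgmReadFile((char *) "frame1.pgm", NULL, &ncols, &nrows);
    unsigned char *img2 = pgmReadFile((char *) "frame2.pgm", NULL, &ncols, &nrows);

    KLTSelectGoodFeatures(tc, img1, ncols, nrows, fl);
    KLTTrackFeatures(tc, img1, img2, ncols, nrows, fl);

    // Save as text; a later program can reload the list with
    // KLTReadFeatureList() while you debug the remaining steps.
    KLTWriteFeatureList(fl, (char *) "feat_1_2.txt", (char *) "%5.1f");
    return 0;
}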
Does the absolute value of a feature in KLT have any specific meaning?
Features that were successfully tracked into the next frame will have a value
equal to 0. These are the points you're interested in. If the value
is non-zero then, as explained in Chapter 2 of the online documentation of the
KLT tracker, the absolute value equals a "constant multiplied by the
minimum eigenvalue of the window," which makes it a measure of the
"interestingness" of the feature: the higher the absolute value, the better
the feature. The value is negative if the feature could not be tracked
into the next frame.
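In other words, after calling KLTTrackFeatures() you keep exactly the features whose value is 0. A short sketch, assuming Birchfield's KLT structures (a KLT_FeatureList whose entries carry x, y, and val fields):

extern "C" {          // the KLT tracker is C code
#include "klt.h"
}
#include <vector>

struct Pt { double x, y; };

// Collect the positions (in the second frame) of the successfully tracked
// features, i.e., those with val == 0.  Features with a negative val were
// lost during tracking and are skipped.
std::vector<Pt> trackedPositions(KLT_FeatureList fl)
{
    std::vector<Pt> pts;
    for (int i = 0; i < fl->nFeatures; i++) {
        KLT_Feature f = fl->feature[i];
        if (f->val == 0) {
            Pt p;
            p.x = f->x;
            p.y = f->y;
            pts.push_back(p);
        }
    }
    return pts;
}

Note that these are positions in the frame the features were tracked into; to build correspondences for the homography, record each feature's position before tracking as well and pair the two.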
Since many images may overlap at a given output pixel, how should compositing
combine the (resampled) pixels from all of the contributing images?
Using Eq. (9) in the Szeliski paper means each contributing image's
pixel will be weighted based on its distance from the center
of that image. Suppose m images overlap at a given output pixel,
w(i) is the weight computed for the ith image at the contributing
pixel's coordinates, and W is the sum of the w(i)'s, i = 1, ..., m.
Then the output pixel intensity value E is computed as

E = (w(1)/W)*E(1) + (w(2)/W)*E(2) + ... + (w(m)/W)*E(m),

where E(i) is the intensity associated with the ith image's contributing pixel.
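In code, the per-pixel blend could look like the sketch below. The hat-shaped weight (maximal at the image center, falling linearly to zero at the borders) is one simple stand-in for "weighted by distance from the center," so treat the exact weight function as an assumption:

#include <cmath>
#include <vector>

struct Contribution { double w, e; };   // weight w(i) and intensity E(i)

// Hat weight: 1 at the image center, falling linearly to 0 at the borders.
double hatWeight(double x, double y, int ncols, int nrows)
{
    double wx = 1.0 - std::fabs(x - 0.5 * ncols) / (0.5 * ncols);
    double wy = 1.0 - std::fabs(y - 0.5 * nrows) / (0.5 * nrows);
    return (wx > 0.0 && wy > 0.0) ? wx * wy : 0.0;
}

// E = (w(1)/W)*E(1) + ... + (w(m)/W)*E(m),  with  W = w(1) + ... + w(m)
double compositePixel(const std::vector<Contribution> &c)
{
    double W = 0.0, E = 0.0;
    for (size_t i = 0; i < c.size(); i++) W += c[i].w;
    if (W == 0.0) return 0.0;           // no image covers this output pixel
    for (size_t i = 0; i < c.size(); i++) E += (c[i].w / W) * c[i].e;
    return E;
}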
How can I use the provided C code in my C++ program?
There are several ways to do this:
1. Compile Heckbert's code into a library file, say called libheckbert.a,
and then link this library file with your C++ code in your Makefile.
2. Include the appropriate .c files from Heckbert's code in your Makefile,
plus make some simple changes to the declarations, because the old K&R
style, which declares parameters outside the function header, is invalid
in C++. For example, in pmap_poly.c change
For example, in pmap_poly.c change
pmap_poly(p, ST)
poly *p;
double ST[3][3];
to:
int pmap_poly(poly *p, double ST[3][3])
3. #include all of the .h files from Heckbert's code at the beginning
of your main() function file, and also add extern "C" around those
#include directives so that the C-compiled functions link correctly
with your C++ code.
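For instance (the header names below are placeholders; include whichever of Heckbert's .h files declare the functions you actually call):

// main.cpp
extern "C" {          // suppress C++ name mangling for these C declarations
#include "poly.h"     // placeholder names for Heckbert's headers
#include "pmap.h"
}

int main()
{
    /* ... C++ code that calls pmap_poly(), etc. ... */
    return 0;
}

The extern "C" block keeps the C++ compiler from name-mangling the declarations, so the linker can match your calls against the symbols in the C-compiled object files (or against libheckbert.a from the first option).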