Performance

Since background plate finding can be done offline, its cost is essentially a fixed overhead, whereas the cost of foreground processing grows linearly with the number of frames. We therefore examine the foreground stages to identify bottlenecks. In our experiments, each frame takes about 2.5 seconds on a 500 MHz Pentium III running Windows NT, broken down as follows:

Initial background and foreground separation: 1.62 seconds

Main character isolation: 0.89 seconds

Gesture matching: 0.02 seconds

From the data, one sees that the initial separation stage dominates, accounting for roughly 64% of the per-frame cost (1.62 of 2.53 seconds); improving that stage would therefore yield the largest overall speedup.
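
The breakdown above comes from timing each stage separately. As a minimal sketch of that kind of instrumentation (the stage names and callables below are placeholders, not the actual function names used in the system):

```python
import time
from typing import Callable, Iterable, List, Tuple

def profile_stages(frames: Iterable, stages: List[Tuple[str, Callable]]) -> None:
    """Time each pipeline stage per frame and print the per-stage averages.

    `stages` is an ordered list of (name, fn) pairs; each fn receives the
    previous stage's output (the raw frame for the first stage).
    """
    totals = {name: 0.0 for name, _ in stages}
    count = 0
    for frame in frames:
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += time.perf_counter() - start
        count += 1
    for name, total in totals.items():
        print(f"{name}: {total / max(count, 1):.2f} seconds/frame")
```

For example, profile_stages(video_frames, [("separation", separate_foreground), ("isolation", isolate_main_character), ("matching", lambda c: match_gesture(c, template))]) would print per-stage averages like those listed above (the three callables here are hypothetical).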

 

Future Work

1. A user interface would be helpful, especially for finer-grained matching. We envision an interactive curve on which users emphasize certain body parts; the curve is then translated into weights on the SDs used for matching (a sketch of this translation appears after the list).

Figure 1: User-specified curve indicating emphasis on arm motion.

2. The main constraint of the top-curve approach is that it detects only coarse-level gestures, so it is suited only to silhouette poses. Given skin tone information, we might be able to detect more detailed poses, but our initial skin tone experiment failed because skin pixels cover too small an area in our environment. The top-curve idea also fails for characters carrying objects (unless the object is depicted in the template as well). In short, a more scalable approach is needed.

3.      Extend the detector to operate on multiple main characters.

4. Re-run background plate finding periodically to adjust for gradual changes in background illumination (a plate-estimation sketch also follows the list).
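
For item 1, here is a minimal sketch of turning a user-drawn emphasis curve into matching weights, assuming the matcher compares fixed-length vectors of SDs sampled along the top curve (all function and parameter names are hypothetical):

```python
import numpy as np

def weights_from_emphasis_curve(curve_x, curve_y, n_samples):
    """Resample a user-drawn emphasis curve into one weight per SD sample.

    curve_x, curve_y: control points of the curve the user drew over the
    body (e.g. higher values over the arms), assumed sorted by x.
    n_samples: number of SD samples the matcher compares.
    """
    xs = np.linspace(min(curve_x), max(curve_x), n_samples)
    w = np.interp(xs, curve_x, curve_y)
    w = np.clip(w, 0.0, None)  # negative emphasis makes no sense
    return w / w.mean()        # mean-1 normalization; see note below

def weighted_sd_distance(sd_frame, sd_template, weights):
    """Weighted L2 distance between frame and template SD vectors."""
    sd_frame = np.asarray(sd_frame, dtype=float)
    sd_template = np.asarray(sd_template, dtype=float)
    return float(np.sqrt(np.sum(weights * (sd_frame - sd_template) ** 2)))
```

Normalizing the weights to mean 1 keeps a flat curve equivalent to the current unweighted matcher, so the interface would be a strict generalization rather than a new matching mode.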
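
For item 4, one standard way to recompute a plate for a static camera (the report's own plate-finding method may differ; this is an assumption) is a per-pixel temporal median over a sliding window of recent frames:

```python
import numpy as np

def estimate_background_plate(frames: np.ndarray) -> np.ndarray:
    """Estimate a background plate from frames sampled over a window.

    frames: uint8 array of shape (N, H, W, C) from the static camera.
    With the camera fixed, the per-pixel temporal median suppresses the
    moving foreground and recovers the background.
    """
    return np.median(frames, axis=0).astype(np.uint8)
```

Re-running this every few minutes, or whenever the frame-versus-plate difference drifts upward, would adapt the plate to gradual illumination changes without repeating the full offline computation.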

 

Conclusion

In this project, we address gesture recognition for a relatively small moving subject in a highly noisy but constrained environment. The input is video from a static camera under constant lighting, together with a black-and-white upper-body sketch of the desired gesture. We isolate the main character in each frame and output a collection of identified occurrences of the template gesture. We have achieved some success by combining principled reasoning with pragmatic hacks, and there remains ample room for exploration and generalization.

 


 
