The patch-based classifier has several limitations. First, because it is purely appearance-based, it can be very sensitive to changes in pose, illumination, and scale. The classifier can handle such variation only if the training examples themselves capture it, which is an inherently inflexible workaround. Second, the classifier treats patches as a "bag of words", discarding all information about the spatial distribution of the patches in the training images. This is a significant difference from the feature representation used in [2]. Finally, adding new training examples is inconvenient: with the current setup, taking a new training example into account requires building a brand-new training collage and re-learning the epitome.
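
The second limitation can be illustrated with a small sketch. It does not implement the epitome or the classifier described here. It only shows, under simplifying assumptions, why an unordered collection of patches cannot distinguish two images whose patches are arranged differently. The helper names `extract_patches` and `bag_of_patches`, the use of non-overlapping patches, and the toy 4x4 images are all hypothetical choices made for illustration.

```python
from collections import Counter


def extract_patches(image, size):
    """Split a 2-D list into non-overlapping size x size patches.

    Each patch is returned as a tuple of row tuples so it is hashable.
    Non-overlapping tiling is an assumption made for this illustration.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patch = tuple(
                tuple(image[r + dr][c + dc] for dc in range(size))
                for dr in range(size)
            )
            patches.append(patch)
    return patches


def bag_of_patches(image, size):
    """Count patches while ignoring where in the image each one came from."""
    return Counter(extract_patches(image, size))


# Two toy "images" built from the same four 2x2 patches in different layouts.
original = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
scrambled = [
    [4, 4, 3, 3],
    [4, 4, 3, 3],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]

# The images differ, but their unordered patch collections are identical.
same_bag = bag_of_patches(original, 2) == bag_of_patches(scrambled, 2)
print("images differ:", original != scrambled)
print("bags identical:", same_bag)
```

Any classifier that sees only the patch counts would have to treat these two images identically, which is the spatial information that a representation retaining patch locations could exploit.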