Experimental results

The first dataset consisted of 13 images that contain a human face and 13 images that do not. On this dataset the classifier correctly classified approximately 95.8% of the positive examples and 75% of the negative examples. Adjusting the odds ratio classification threshold had little effect on the results; images tended to receive odds ratio scores that were either very high or very low. For this dataset, the incorrectly classified negative images were somewhat, but not entirely, consistent across experiments. As one would expect, the misclassified negative images tended to contain coherent flesh-toned objects, such as a beige wildcat or a sandstone house.
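To make the threshold insensitivity concrete, the following is a minimal sketch (not the code used in these experiments) of checking positive and negative accuracy over a range of thresholds, assuming the classifier produces a single odds ratio score per image. The score values, threshold grid, and function name below are made-up placeholders chosen only to mimic the observed behavior, in which scores were either very high or very low.

    import numpy as np

    def rates_at_threshold(pos_scores, neg_scores, threshold):
        """Fraction of positive/negative images classified correctly when an
        image is labeled positive iff its odds ratio score exceeds threshold."""
        tpr = np.mean(np.asarray(pos_scores) > threshold)   # positives correct
        tnr = np.mean(np.asarray(neg_scores) <= threshold)  # negatives correct
        return tpr, tnr

    # Hypothetical, strongly bimodal odds ratio scores standing in for the
    # observed pattern: most images score far above or far below any
    # reasonable threshold, so the choice of threshold barely matters.
    pos_scores = [12.3, 9.8, 15.1, 0.02, 11.7]
    neg_scores = [0.01, 0.05, 8.9, 0.03, 0.02]

    for threshold in [0.5, 1.0, 2.0, 5.0]:
        tpr, tnr = rates_at_threshold(pos_scores, neg_scores, threshold)
        print(f"threshold={threshold:4.1f}  "
              f"positives correct={tpr:.2f}  negatives correct={tnr:.2f}")

With bimodal scores like these, the reported rates are identical across the whole threshold grid, which is the behavior described above.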

The second dataset consisted of 13 images of beach scenes and 13 non-beach images. On this dataset the classifier correctly classified about 53% of the positive images and 86.7% of the negative images. Again, the threshold did not seem to matter much. The poor performance on the positive images was somewhat surprising. I had selected beaches because I thought the borders between sky, sea, and sand would provide excellent discriminative patches. Examining the misclassified positive images suggested that the discriminative patches selected from the training collage epitome were very ``picky'' about the exact color of the sand, sea, and sky. Beach images under darker illumination were often misclassified, even though there were dark beach scenes in the training set. This seemed to be because the dark training images contributed only sky patches, rather than sand patches, to the discriminative patch set. The training image clustering modification suggested below in the ``Possible Improvements'' section may address this problem. On the other hand, the classifier did a very good job of deciding what was not a beach. This may be due to the same ``pickiness'' over sand, sea, and sky color that caused it to perform poorly at deciding what was a beach.
