OCR Preprocessing

Optical Character Recognition Preprocessing

Seth Foss, Peggy Pan

1. Introduction

We came up with this idea after finding an OCR program written in Matlab. It takes an image file containing text, attempts to recognize what is written in the image, and then outputs the result to a text file. The program works reasonably well within some limitations, but it does not seem to be invariant to orientation: it generally fails when the text is rotated by more than about 7.5 degrees away from its upright orientation. We therefore decided to add preprocessing to the program so that it automatically detects whether the image needs to be rotated and, if so, rotates it.

The Matlab code for the OCR function can be downloaded here.

2. Our Approach

We first take the input image and search for pixels along the top and bottom edges of the text. Comparing these edge pixels with each other, we compute the slopes between them and take the mode (the slope that occurs most often in the list), which we use to determine the angle of rotation. After rotating the image once, we repeat the whole process on the rotated image to refine the predicted angle.
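The mode-slope idea can be illustrated with a minimal Python sketch. This is not the Matlab code used in the project. It assumes the image is a list of rows of 0/1 values with 1 marking ink, it looks only at the top edge, and the names `top_edge`, `mode_slope_angle`, the `step` spacing between compared pixels, and the rounding of slopes to two decimals are all choices made for this illustration. Rotating the image and running the second pass are left out.

```python
import math
from collections import Counter


def top_edge(image):
    """Return (x, y) of the topmost ink pixel in every column that contains ink."""
    points = []
    height = len(image)
    width = len(image[0]) if image else 0
    for x in range(width):
        for y in range(height):
            if image[y][x]:
                points.append((x, y))
                break
    return points


def mode_slope_angle(points, step=8):
    """Estimate the skew in degrees from the most common slope.

    Each edge point is compared with the point `step` positions further
    along the edge; slopes are rounded so that near-equal values share a bin.
    """
    slopes = Counter()
    for (x0, y0), (x1, y1) in zip(points, points[step:]):
        slopes[round((y1 - y0) / (x1 - x0), 2)] += 1
    if not slopes:
        return 0.0  # too few edge points to compare
    slope, _count = slopes.most_common(1)[0]
    return math.degrees(math.atan(slope))


# Synthetic example: a one-pixel-thick edge that drops one row every four columns.
image = [[1 if y == 5 + x // 4 else 0 for x in range(40)] for y in range(20)]
angle = mode_slope_angle(top_edge(image))
```

Because image rows are counted downward, a positive angle here means the edge descends toward the right. The sign to use when rotating the image back depends on the convention of the rotation routine.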

The next step is to rotate the image three more times in one direction, in increments of 90 degrees. Having the text lines level does not guarantee that the text is right-side-up, so we produce four different outputs and let the user pick the correct one based on the results returned by the OCR.
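As a rough illustration of this step, the sketch below builds the four candidate orientations in Python. The image is again assumed to be a list of rows, and the clockwise direction and the function names `rotate90` and `four_orientations` are assumptions of this example, since the text does not say which direction is used.

```python
def rotate90(image):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def four_orientations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees, keyed by angle."""
    results = {0: [list(row) for row in image]}
    current = image
    for angle in (90, 180, 270):
        current = rotate90(current)
        results[angle] = current
    return results


candidates = four_orientations([[1, 2],
                                [3, 4]])
```

Each of the four candidates would then be passed to the OCR, and the user would pick the output that reads correctly.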

In the few examples that we tried, the results were satisfactory. Factors such as inconsistent indentation or the frequent use of certain letters over others could throw off the slope estimate, but we expect such cases to be uncommon.

3. Examples and Results

Below is an example of running the OCR, followed by a case where it fails on an improperly oriented image.

Running the OCR, with the input on the left and its output on the right

A failure case

The next four sets of images are the final results rotated by 0, 90, 180, and 270 degrees, respectively. In this case the image at 0 degrees happens to be right-side-up, but the correct output could be any of the four, depending on the orientation of the original input image.

0 degrees

90 degrees

180 degrees

270 degrees

4. Limitations

Due to time constraints, we were not able to modify the code of the OCR itself, so the quality of the character recognition depends wholly on the OCR implementation. In return, we can adopt the OCR's own input requirements as assumptions about our input images:

The OCR only accepts capital letters and numbers in a standard font. Because it does not recognize anything else, we can assume that the input contains strictly what we want to read in.
The characters have to be black, and the background has to be white.
The input text is at least a few characters long, and preferably contains at least a few words.
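The black-on-white assumption can be checked before any angle estimation. The following Python sketch is only an illustration and is not part of the original program. It assumes a grayscale image given as rows of 0-255 integers, and the function name `check_black_on_white` and the thresholds `dark` and `light` are hypothetical values chosen for this example.

```python
def check_black_on_white(gray, dark=64, light=192):
    """Return (ok, ink) for a grayscale image given as rows of 0-255 ints.

    `ink` is a 0/1 mask with 1 where a pixel is dark enough to count as text.
    `ok` is True only if no pixel is a mid-tone and ink covers less than
    half of the image, i.e. dark text on a light background.
    """
    ink = [[1 if p <= dark else 0 for p in row] for row in gray]
    total = sum(len(row) for row in gray)
    ink_count = sum(map(sum, ink))
    midtones = sum(1 for row in gray for p in row if dark < p < light)
    ok = total > 0 and midtones == 0 and 0 < ink_count < total / 2
    return ok, ink


ok, ink = check_black_on_white([[255, 255, 255],
                                [255, 0, 255],
                                [255, 255, 255]])
```

An inverted image (white text on black) or one with gray regions fails the check, which signals that it does not meet the input assumptions listed above.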