Seth Foss, Peggy Pan
1. Introduction
We came up with the idea after finding an OCR program written in Matlab. It takes in an image file containing text, attempts to recognize what is written in the image, and then outputs the result to a text file. While we found that the program works reasonably well within some limitations, it is not invariant to orientation: the OCR generally fails when the text is rotated more than about 7.5 degrees from horizontal. Because of that, we decided to add preprocessing to the program so that it automatically detects whether the image needs to be rotated and, if so, rotates it.
The Matlab code for the OCR function can be downloaded here.
2. Our Approach
We first take the input image and search for pixels along the top and bottom edges of the text. Comparing these pixels with each other, we calculate the mode slope (the slope value that occurs most often among the comparisons) and use it to determine the angle of rotation. After rotating the image once, we repeat the whole process on the rotated image to refine the estimated angle.
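The angle-detection step above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' Matlab implementation: it assumes a binary image stored as a list of rows (1 = ink), treats the topmost and bottommost ink pixel in each column as the top and bottom edges, rounds pairwise slopes to two decimals before taking the mode, and uses a simple nearest-neighbour rotation. The function names, the rounding precision, and the two-pass default are all hypothetical.

```python
import math
from collections import Counter


def edge_points(img):
    """Topmost and bottommost ink pixel (x, y) in every column that has ink.
    Assumption: img is a list of rows of 0/1 values, 1 meaning ink."""
    top, bottom = [], []
    height = len(img)
    width = len(img[0]) if height else 0
    for x in range(width):
        rows = [y for y in range(height) if img[y][x]]
        if rows:
            top.append((x, rows[0]))
            bottom.append((x, rows[-1]))
    return top, bottom


def slope_counts(points, precision=2):
    """Count the rounded slope between every pair of points on one edge.
    Assumption: rounding to `precision` decimals so equal slopes can collide."""
    counts = Counter()
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            if x2 != x1:
                counts[round((y2 - y1) / (x2 - x1), precision)] += 1
    return counts


def skew_angle(img):
    """Angle in degrees of the mode slope, in image coordinates (y points down)."""
    top, bottom = edge_points(img)
    # Top and bottom edges are counted separately, then their tallies are merged.
    counts = slope_counts(top) + slope_counts(bottom)
    if not counts:
        return 0.0
    mode = counts.most_common(1)[0][0]
    return math.degrees(math.atan(mode))


def rotate(img, angle_deg):
    """Nearest-neighbour rotation about the centre that makes an edge measured
    at angle_deg horizontal. Output keeps the input size; pixels that map to
    points outside the input stay 0."""
    height, width = len(img), len(img[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0] * width for _ in range(height)]
    for yo in range(height):
        for xo in range(width):
            dx, dy = xo - cx, yo - cy
            # Inverse mapping: find the source pixel that lands on (xo, yo).
            xs = math.floor(cx + dx * cos_t - dy * sin_t + 0.5)
            ys = math.floor(cy + dx * sin_t + dy * cos_t + 0.5)
            if 0 <= xs < width and 0 <= ys < height:
                out[yo][xo] = img[ys][xs]
    return out


def deskew(img, passes=2):
    """Measure and undo the skew, then repeat on the rotated result."""
    for _ in range(passes):
        img = rotate(img, skew_angle(img))
    return img
```

Rounding the slopes is what makes a mode meaningful here: with raw floating-point slopes, nearly every pair of edge pixels would produce a distinct value. For a 21×21 image containing a single 45-degree diagonal, `skew_angle` reports 45 degrees and `deskew` turns the diagonal into a horizontal row.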
Next, we rotate the image three times in one direction, in increments of 90 degrees. Having the text correctly aligned does not guarantee that it is right-side up, so we give four different outputs and let users pick the correct one based on the results returned by the OCR.
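Generating the four candidates can be sketched in a few lines. This assumes the same list-of-rows image representation as above and clockwise rotation (the write-up only says "one direction"); `rotate90` and `orientations` are hypothetical names, and each returned candidate would then be passed to the OCR separately.

```python
def rotate90(img):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    # Reversing the rows and then transposing gives a clockwise rotation.
    return [list(row) for row in zip(*img[::-1])]


def orientations(img):
    """Return the image rotated 0, 90, 180 and 270 degrees clockwise, in order."""
    out = [img]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out
```

Because each 90-degree rotation is an exact permutation of pixels, no interpolation is involved in this step, unlike the fine-angle correction.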
With the few examples that we tried, the results were satisfactory. Of course, many factors, such as inconsistent indentation or the frequent use of certain letters over others, could throw off the estimated angle, but we expect such problems to be uncommon.
3. Examples and Results
The example below shows the OCR in use, and how it fails on an improperly oriented image.
Running the OCR, with an input on the left and its output on the right
A failure case
The next four sets of images show the end results rotated 0, 90, 180, and 270 degrees, respectively. Note that the image at 0 degrees happens to be right-side up in this case, but the correct output could be any of the four, depending on the orientation of the original input image.
0 degrees
90 degrees
180 degrees
270 degrees
Due to time constraints, we were not able to modify the code within the OCR itself, so the character-recognition performance depends entirely on the OCR implementation. On the other hand, this also allows us to make assumptions about our input images.