The Problem

    I tend to do a lot of pacing when I lecture. For this reason, I was curious about how many steps I could take at the front of the room before I ran into a wall. So I decided to take some time out to count all of these steps. I came up with three different counting techniques, which I summarize below:

    Technique 1: I can't keep track of numbers in my head too well, so I have to write them down. So what I decided to do was this: for each step I took, I would make a mark on the chalkboard. So, I stood with my back against a wall, took a step, took a step back, and made a mark on the board. Then, I took two steps forward, turned around, took two steps back, and made another mark. I kept up this process until I had reached the far wall. It turns out I could take 15 steps.

    Technique 2: I then came up with a slightly better version of the previous technique. What if, instead of making the marks myself, I had somebody else make them? That is, for each step I took, I would have someone else make a mark on the board for me. This would save me a lot of walking.

    Technique 3: Of course, there is another way to cut down on the amount of walking I had to do: I could just ask my psychic friend (Stevie) how many steps I could take. He thought for a second and told me there were 15.

The Problems

    Clearly, all three of these techniques take a different amount of time to complete. But we're not interested in the time so much as the number of "things" I had to do for each technique. Why? Because the amount of time could vary from day to day: some days I might be able to do things faster than others because I had a more complete breakfast; or a substitute who can move faster than me might be the one doing the walking.

    So what were the things that had to be done for each technique? Well, I can think of two offhand:

    1. Taking a step
    2. Making a mark

    Let's count the number of things I had to do for each technique.

    Technique 1: Counting the marks is easy: I made 15. Counting the number of steps, however, is a little more difficult. Let's take a page from the book of Gauss.

    Let's start by counting the total number of steps I had to take forward:

    1 + 2 + 3 + ... + 15

    We recall from math that this sums to 15 * (15 + 1) / 2 = 120.

    Since every step forward was matched by a step back, the total number of steps I had to take (including backwards) was twice that: 15 * (15 + 1) = 240.

    Thus, the total number of things was: 240 steps + 15 marks = 255.
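
    If you want to double-check that arithmetic, here is a small simulation (a quick sketch in Python, which I'll use for all the code examples; the function name is mine, just for illustration):

      # Simulate Technique 1: walk i steps out, i steps back, then make a mark,
      # for i = 1, 2, ..., width. Count every step and every mark along the way.
      def technique1_things(width):
          steps, marks = 0, 0
          for i in range(1, width + 1):
              steps += 2 * i   # i steps forward plus i steps back
              marks += 1       # one mark per round trip
          return steps, marks

      steps, marks = technique1_things(15)
      print(steps, marks, steps + marks)   # 240 15 255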

    Technique 2: Once again, there were a total of 15 marks made. But this time I only had to cross the room once, taking 15 steps forward.

    Technique 3: Stevie doesn't write numbers using hash marks like we did for the previous techniques. Instead, he just writes the number in regular decimal notation. So that was two "marks" (the digits 1 and 5), and I didn't have to take any steps.

                    Steps   Marks   Total
    Technique 1       240      15     255
    Technique 2        15      15      30
    Technique 3         0       2       2

The "Real" World

    In Computer Science, when analyzing algorithms (remember: an algorithm is an ordered sequence of statements that finishes and produces an output), we do something very similar to what we did above. Of course, we prefer to use a slightly more technical term than "thing": instead, we use "operations".

    What exactly is an operation? That is one of the problems: there is no precise definition of what an operation is. It could be a single statement. Of course, some statements in a program are complicated. As a small example, consider the sum 3 + 4 + 5 + 6: it can be evaluated with a single statement, but three additions actually have to be performed. In general, we define an operation as a "small step".
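
    To make that concrete, here is a sketch that evaluates the sum left to right while counting each addition as one operation (the counting is the point, not the code itself):

      # Evaluate 3 + 4 + 5 + 6 one addition at a time, counting the additions.
      terms = [3, 4, 5, 6]
      total = terms[0]
      additions = 0
      for t in terms[1:]:
          total += t
          additions += 1
      print(total, additions)   # 18 3 -- one statement, but three "small steps"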

The General Case

    For algorithms, the number of operations that have to be performed to complete the algorithm depends upon the size of the input. For the step-counting problem above, the input was the room (its width, measured in steps). Of course, all three of these techniques will work for other places I can walk: what happens if I want to count the number of steps I can take out in the hallway? We can perform a similar analysis, which is summarized in the table below, with n denoting the number of steps in the general case:

                    Steps      Marks       Total
    Technique 1     n*(n+1)    n           n^2 + 2n
    Technique 2     n          n           2n
    Technique 3     0          log_10 n    log_10 n

    We have a very precise analysis of how long things will take. But for technique 1, we don't need to worry about the "+ 2n" part: as n gets large, this is a very small part of the final number of operations, so we tend to ignore it. In fact, we generally ignore lesser terms like that. Also, we tend to ignore constant scalars in the terms: we care more how the number of operations scales as we change the input, and these constants drop out. For similar reasons, we tend to ignore the base of any logarithms which appear, and instead just use the term "log". So, with these simpler definitions, we can summarize the running time:

                    Steps      Marks       Total
    Technique 1     n*(n+1)    n           O(n^2)
    Technique 2     n          n           O(n)
    Technique 3     0          log_10 n    O(log n)

    Notice the use of the O. This measure of how long an algorithm takes to complete is known as the "order" of the algorithm, and is usually abbreviated with the big O. When we analyze an algorithm, we worry not about how long the program takes to run on a computer, because computers may have different speeds. Instead, we worry about the order of the algorithm.
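
    To get a feel for why dropping the "+ 2n" is safe, we can compare n^2 + 2n against n^2 alone as n grows; the lesser term quickly becomes negligible. A quick sketch:

      # What fraction of n^2 + 2n comes from the n^2 term?
      for n in [10, 100, 1000, 10000]:
          exact = n * n + 2 * n
          print(n, exact, n * n / exact)   # the ratio approaches 1 as n grows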

Searching

    Below, I describe a common problem in Computer Science, and discuss two different techniques for solving the problem.

    Problem: Let A be a sorted array of n elements, and k be an element we are interested in finding in that array. We wish to find the index at which k appears in our array, returning -1 if k is not in A.

    Approach 1: We could search the array element by element: start at the left end and keep moving to the right until we find the element (or run out of elements). This technique is formalized below in pseudocode:

      LinearSearch(int A[], int k, int n)
          for i = 0 to n-1:
              if A[i] == k       // found the key
                  return i
          return -1              // key is not in the array
      
    This technique is known as linear search, because we can think of our array as a line of items, and we are just scanning along that line looking for a solution.
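
    For readers who want to run it, here is the same technique as actual Python (a direct translation of the pseudocode above; the test values are just examples):

      def linear_search(A, k):
          """Scan A from left to right; return the index of k, or -1 if absent."""
          for i in range(len(A)):
              if A[i] == k:
                  return i
          return -1

      print(linear_search([2, 5, 8, 10, 15, 16, 20, 25], 15))   # 4
      print(linear_search([2, 5, 8, 10, 15, 16, 20, 25], 3))    # -1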

    However, for this problem, this is not a good technique because we are not taking advantage of the fact that A is sorted. This leads us to

    Approach 2: Another technique we can use is to progressively narrow our search: since we know our array is sorted, we can use this fact to cut our search space in half each time. That is, we keep track of the range of our search space: the left-most and right-most endpoints into the array. We then check a middle element: if our key is less than that middle element, we know that if our key is in the array at all, it must be in the left half of the remaining range. Otherwise, it must be in the right half. Again, I formalize the technique below:

      BinarySearch(int A[], int k, int n)
          left = 0
          right = n-1
          loop until finished:
              if left >= right        // search space is down to (at most) one element
                  if left == right and A[left] == k   // found the key
                      return left
                  else                                // key is not in the array
                      return -1
              middle = (left + right)/2
              if k == A[middle]       // check if we found the element
                  return middle
              else if k < A[middle]   // element can only be in the left half
                  right = middle - 1
              else                    // element can only be in the right half
                  left = middle + 1
      
    This searching technique is known as binary search because we are cutting the size of the search space in half each time. (Note the check left >= right rather than left == right: if the key is not in the array, right can slip below left, and testing for equality alone would loop forever.)
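
    Again, here is a runnable Python translation of the pseudocode, if you want to experiment with it:

      def binary_search(A, k):
          """Return the index of k in sorted list A, or -1 if absent."""
          left, right = 0, len(A) - 1
          while True:
              if left >= right:                 # search space is down to at most one element
                  return left if left == right and A[left] == k else -1
              middle = (left + right) // 2
              if k == A[middle]:                # found the key
                  return middle
              elif k < A[middle]:               # key can only be in the left half
                  right = middle - 1
              else:                             # key can only be in the right half
                  left = middle + 1

      A = [2, 5, 8, 10, 15, 16, 20, 25]
      print(binary_search(A, 25))   # 7
      print(binary_search(A, 18))   # -1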

Analysis

    Let's begin the comparison of our two techniques by searching for elements from the following sorted array:

      2, 5, 8, 10, 15, 16, 20, 25

    Suppose we are searching for the element 2 (that is, k = 2). Linear search works pretty well here: it finds the element right away, after only one pass through the loop. Binary search, on the other hand, takes a little longer. Its search goes something like this (I keep track of the current left and right endpoints as an ordered pair):
    (0, 7) -> (0, 2) -> (0, 0)
    It initially appears that linear search may be a better technique: it found the solution faster.

    However, suppose we were looking for 25. Then, linear search has to go through all elements of the array before it finds 25, while binary search narrows the space like this:

    (0, 7) -> (4, 7) -> (6, 7) -> (7, 7)
    So this time binary search is better.
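
    You can reproduce both traces by printing the endpoints as the search narrows; here is a lightly instrumented variant of the binary_search sketch from above:

      def binary_search_trace(A, k):
          """binary_search, but print each (left, right) range as the search narrows."""
          left, right = 0, len(A) - 1
          while True:
              print((left, right))
              if left >= right:
                  return left if left == right and A[left] == k else -1
              middle = (left + right) // 2
              if k == A[middle]:
                  return middle
              elif k < A[middle]:
                  right = middle - 1
              else:
                  left = middle + 1

      binary_search_trace([2, 5, 8, 10, 15, 16, 20, 25], 25)
      # prints (0, 7), (4, 7), (6, 7), (7, 7)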

    So, which case do we worry about when analyzing the time of the algorithm? The best case, the worst case, or the average case? We tend to be pessimists, so we always assume that the input gives us the worst possible running time (well, almost always: sometimes we care more about the average case).

    So, what is the order of the two searching algorithms? Linear search is easy: it is O(n) because, in the worst case, we have to check every element in the array. Binary search is a little trickier, though.

    We begin the analysis by noting that we cut our search space in half each time. That is, the number of elements we have to search is n, then n/2, then n/4, and so on, down to 1. So how many times did we perform this halving? We need to notice the pattern. Our space size decreases like this:

    n/2^0, n/2^1, n/2^2, ..., n/2^x

    The search ends when the space is down to a single element, that is, when n/2^x = 1, or n = 2^x. Solving for x using the base-2 logarithm gives x = log_2 n. That is, binary search is O(log_2 n), or simply O(log n).
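
    We can also check this count empirically: count how many times the loop runs when the key is absent (the worst case) and compare with log_2 n. A final sketch:

      import math

      def binary_search_rounds(A, k):
          """Count the loop iterations binary search needs (worst case: k absent)."""
          left, right, rounds = 0, len(A) - 1, 0
          while True:
              rounds += 1
              if left >= right:
                  return rounds
              middle = (left + right) // 2
              if k == A[middle]:
                  return rounds
              elif k < A[middle]:
                  right = middle - 1
              else:
                  left = middle + 1

      for n in [8, 1024, 1048576]:
          print(n, binary_search_rounds(list(range(n)), -1), math.log2(n))
      # the iteration count tracks log2(n)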

© 2000 Michael Wade
All rights reserved