The Problem

    I tend to do a lot of pacing when I lecture. For this reason, I was curious about how many steps I could take at the front of the room before I ran into a wall. So I decided to take some time out to count all of these steps. I came up with three different counting techniques, which I summarize below:

    Technique 1: I can't keep track of numbers in my head too well, so I have to write them down. So what I decided to do was this: for each step I took, I would make a mark on the chalkboard. So, I stood with my back against a wall, took a step, took a step back, and made a mark on the board. Then, I took two steps forward, turned around, took two steps back, and made another mark. I kept up this process until I had reached the far wall. It turns out I could take 15 steps.

    Technique 2: I then came up with a slightly better version of the previous technique. What if, instead of making the marks myself, I had somebody else make them? That is, for each step I took, I would have someone else make a mark on the board for me. This would save me a lot of walking.

    Technique 3: Of course, there is another way to cut down on the amount of walking I had to do: I could just ask my psychic friend (Stevie) how many steps I could take. He thought for a second and told me there were 15.

The Problems

    Clearly, all three of these techniques take a different amount of time to complete. But we're not interested in the time so much as the number of "things" I had to do for each technique. Why? Because the amount of time could vary from day to day: some days I might be able to do things faster than others because I had a more complete breakfast; or a substitute who can move faster than me might be the one doing the walking.

    So what were the things that had to be done for each technique? Well, I can think of two offhand:

    1. Taking a step
    2. Making a mark

    Let's count the number of things I had to do for each technique.

    Technique 1: Counting the marks is easy: I made 15. Counting the number of steps, however, is a little more difficult. Let's take a page from the book of Gauss.

    Let's start by counting the total number of steps I had to take forward:

    1 + 2 + 3 + ... + 15

    We recall from math that this sums to 15 * (15 + 1) / 2 = 120.

    Since every step forward was matched by a step back, the total number of steps I had to take (including backwards) was twice that: 15 * (15 + 1) = 240.

    Thus, the total number of things was: 240 steps + 15 marks = 255.
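
    If you want to double-check that arithmetic, here is a small simulation (a quick sketch in Python, which I'll use for all the code examples; the function name is mine, just for illustration):

      # Simulate Technique 1: walk i steps out, i steps back, then make a mark,
      # for i = 1, 2, ..., width. Count every step and every mark along the way.
      def technique1_things(width):
          steps, marks = 0, 0
          for i in range(1, width + 1):
              steps += 2 * i   # i steps forward plus i steps back
              marks += 1       # one mark per round trip
          return steps, marks

      steps, marks = technique1_things(15)
      print(steps, marks, steps + marks)   # 240 15 255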

    Technique 2: Once again, there were a total of 15 marks made. But this time I only had to cross the room once, taking 15 steps forward.

    Technique 3: Stevie doesn't write numbers using hash marks like we did for the previous techniques. Instead, he just writes the number in regular decimal notation. So that was two "marks" (the digits 1 and 5), and I didn't have to take any steps.

                    Steps   Marks   Total
    Technique 1       240      15     255
    Technique 2        15      15      30
    Technique 3         0       2       2

The "Real" World

    In Computer Science, when analyzing algorithms (remember: an algorithm is an ordered sequence of statements that finishes and produces an output), we do something very similar to what we did above. Of course, we prefer to use a slightly more technical term than "thing": instead, we use "operations".

    What exactly is an operation? That is one of the problems: there is no precise definition of what an operation is. It could be a single statement. Of course, some statements in a program are complicated. As a small example, consider the sum 3 + 4 + 5 + 6: it can be evaluated with a single statement, but three additions actually have to be performed. In general, we define an operation as a "small step".
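
    To make that concrete, here is a sketch that evaluates the sum left to right while counting each addition as one operation (the counting is the point, not the code itself):

      # Evaluate 3 + 4 + 5 + 6 one addition at a time, counting the additions.
      terms = [3, 4, 5, 6]
      total = terms[0]
      additions = 0
      for t in terms[1:]:
          total += t
          additions += 1
      print(total, additions)   # 18 3 -- one statement, but three "small steps"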

The General Case

    For algorithms, the number of operations that have to be performed to complete the algorithm depends upon the size of the input. For the step-counting problem above, the input was the room (its width, measured in steps). Of course, all three of these techniques will work for other places I can walk: what happens if I want to count the number of steps I can take out in the hallway? We can perform a similar analysis, which is summarized in the table below, with n denoting the number of steps in the general case:

                    Steps      Marks       Total
    Technique 1     n*(n+1)    n           n^2 + 2n
    Technique 2     n          n           2n
    Technique 3     0          log_10 n    log_10 n

    We have a very precise analysis of how long things will take. But for technique 1, we don't need to worry about the "+ 2n" part: as n gets large, this is a very small part of the final number of operations, so we tend to ignore it. In fact, we generally ignore lesser terms like that. Also, we tend to ignore constant scalars in the terms: we care more how the number of operations scales as we change the input, and these constants drop out. For similar reasons, we tend to ignore the base of any logarithms which appear, and instead just use the term "log". So, with these simpler definitions, we can summarize the running time:

                    Steps      Marks       Total
    Technique 1     n*(n+1)    n           O(n^2)
    Technique 2     n          n           O(n)
    Technique 3     0          log_10 n    O(log n)

    Notice the use of the O. This measure of how long an algorithm takes to complete is known as the "order" of the algorithm, and is usually abbreviated with the big O. When we analyze an algorithm, we worry not about how long the program takes to run on a computer, because computers may have different speeds. Instead, we worry about the order of the algorithm.
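
    To get a feel for why dropping the "+ 2n" is safe, we can compare n^2 + 2n against n^2 alone as n grows; the lesser term quickly becomes negligible. A quick sketch:

      # What fraction of n^2 + 2n comes from the n^2 term?
      for n in [10, 100, 1000, 10000]:
          exact = n * n + 2 * n
          print(n, exact, n * n / exact)   # the ratio approaches 1 as n grows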

Searching

    Below, I describe a common problem in Computer Science, and discuss two different techniques for solving the problem.

    Problem: Let A be a sorted array of n elements, and k be an element we are interested in finding in that array. We wish to find the index at which k appears in our array, returning -1 if k is not in A.

    Approach 1: We could search the array element by element: start at the left end and keep moving to the right until we find the element (or run out of elements). This technique is formalized below in pseudocode:

      LinearSearch(int A[], int k, int n)
          for i = 0 to n-1:
              if A[i] == k       // found the key
                  return i
          return -1              // key is not in the array
      
    This technique is known as linear search, because we can think of our array as a line of items, and we are just scanning along that line looking for a solution.
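
    For readers who want to run it, here is the same technique as actual Python (a direct translation of the pseudocode above; the test values are just examples):

      def linear_search(A, k):
          """Scan A from left to right; return the index of k, or -1 if absent."""
          for i in range(len(A)):
              if A[i] == k:
                  return i
          return -1

      print(linear_search([2, 5, 8, 10, 15, 16, 20, 25], 15))   # 4
      print(linear_search([2, 5, 8, 10, 15, 16, 20, 25], 3))    # -1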

    However, for this problem, this is not a good technique because we are not taking advantage of the fact that A is sorted. This leads us to

    Approach 2: Another technique we can use is to progressively narrow our search: since we know our array is sorted, we can use this fact to cut our search space in half each time. That is, we keep track of the range of our search space: the left-most and right-most endpoints into the array. We then check a middle element: if our key is less than that middle element, we know that if our key is in the array at all, it must be in the left half of the remaining range. Otherwise, it must be in the right half. Again, I formalize the technique below:

      BinarySearch(int A[], int k, int n)
          left = 0
          right = n-1
          loop until finished:
              if left >= right        // search space is down to (at most) one element
                  if left == right and A[left] == k   // found the key
                      return left
                  else                                // key is not in the array
                      return -1
              middle = (left + right)/2
              if k == A[middle]       // check if we found the element
                  return middle
              else if k < A[middle]   // element can only be in the left half
                  right = middle - 1
              else                    // element can only be in the right half
                  left = middle + 1
      
    This searching technique is known as binary search because we are cutting the size of the search space in half each time. (Note the check left >= right rather than left == right: if the key is not in the array, right can slip below left, and testing for equality alone would loop forever.)
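
    Again, here is a runnable Python translation of the pseudocode, if you want to experiment with it:

      def binary_search(A, k):
          """Return the index of k in sorted list A, or -1 if absent."""
          left, right = 0, len(A) - 1
          while True:
              if left >= right:                 # search space is down to at most one element
                  return left if left == right and A[left] == k else -1
              middle = (left + right) // 2
              if k == A[middle]:                # found the key
                  return middle
              elif k < A[middle]:               # key can only be in the left half
                  right = middle - 1
              else:                             # key can only be in the right half
                  left = middle + 1

      A = [2, 5, 8, 10, 15, 16, 20, 25]
      print(binary_search(A, 25))   # 7
      print(binary_search(A, 18))   # -1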

Analysis

    Let's begin the comparison of our two techniques by searching for elements from the following sorted array:

      2, 5, 8, 10, 15, 16, 20, 25

    Suppose we are searching for the element 2 (that is, k = 2). Linear search works pretty well here: it finds the element right away, after only one pass through the loop. Binary search, on the other hand, takes a little longer. Its search goes something like this (I keep track of the current left and right endpoints as an ordered pair):
    (0, 7) -> (0, 2) -> (0, 0)
    It initially appears that linear search may be a better technique: it found the solution faster.

    However, suppose we were looking for 25. Then, linear search has to go through all elements of the array before it finds 25, while binary search narrows the space like this:

    (0, 7) -> (4, 7) -> (6, 7) -> (7, 7)
    So this time binary search is better.
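
    You can reproduce both traces by printing the endpoints as the search narrows; here is a lightly instrumented variant of the binary_search sketch from above:

      def binary_search_trace(A, k):
          """binary_search, but print each (left, right) range as the search narrows."""
          left, right = 0, len(A) - 1
          while True:
              print((left, right))
              if left >= right:
                  return left if left == right and A[left] == k else -1
              middle = (left + right) // 2
              if k == A[middle]:
                  return middle
              elif k < A[middle]:
                  right = middle - 1
              else:
                  left = middle + 1

      binary_search_trace([2, 5, 8, 10, 15, 16, 20, 25], 25)
      # prints (0, 7), (4, 7), (6, 7), (7, 7)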

    So, which case do we worry about when analyzing the time of the algorithm? The best case, the worst case, or the average case? We tend to be pessimists, so we always assume that the input gives us the worst possible running time (well, almost always: sometimes we care more about the average case).

    So, what is the order of the two searching algorithms? Linear search is easy: it is O(n) because, in the worst case, we have to check every element in the array. Binary search is a little trickier, though.

    We begin the analysis by noting that we cut our search space in half each time. That is, the number of elements we have to search is n, then n/2, then n/4, and so on, down to 1. So how many times did we perform this halving? We need to notice the pattern. Our space size decreases like this:

    n/2^0, n/2^1, n/2^2, ..., n/2^x

    The search ends when the space is down to a single element, that is, when n/2^x = 1, or n = 2^x. Solving for x using the base-2 logarithm gives x = log_2 n. That is, binary search is O(log_2 n), or simply O(log n).
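
    We can also check this count empirically: count how many times the loop runs when the key is absent (the worst case) and compare with log_2 n. A final sketch:

      import math

      def binary_search_rounds(A, k):
          """Count the loop iterations binary search needs (worst case: k absent)."""
          left, right, rounds = 0, len(A) - 1, 0
          while True:
              rounds += 1
              if left >= right:
                  return rounds
              middle = (left + right) // 2
              if k == A[middle]:
                  return rounds
              elif k < A[middle]:
                  right = middle - 1
              else:
                  left = middle + 1

      for n in [8, 1024, 1048576]:
          print(n, binary_search_rounds(list(range(n)), -1), math.log2(n))
      # the iteration count tracks log2(n)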

© 2000 Michael Wade
All rights reserved