Sorting

Day 1, Day 2, Day 3

Introduction

    We now begin a discussion of one of the most important problems in Computer Science: sorting. Formally, we state the problem: Given a sequence of n elements a_0, a_1, . . ., a_(n-1), arrange the elements such that they are in ascending order.

    We will discuss 5 algorithms that sort these elements, and compare and contrast them to see which algorithm may be best in a given situation.

Bubble Sort

Selection Sort

    When performing a bubble sort, we often have to perform a lot of swaps. Our next sorting algorithm was developed to lower the number of swaps.

    Intuitively, we keep two parts to the array being sorted: a part on the left, which has all of its elements in sorted order, and a part on the right, where the elements are not yet sorted. The right part has the property that its smallest element is no smaller than the largest element in the left part. Initially, the left part has size 0 and the right part has size n. We then scan the entire right part of the array to find (or, "select") the smallest element, and swap it into the first position of the right part, which then becomes the last position of the left part. We continue this process until the right part of the array shrinks to size 0.

    The pseudocode is given below:

    SelectionSort(A)
    Input: an unsorted array, A
    Postcondition: A is sorted in ascending order

        for s = 0 to length(A) - 2
            m = s
            for i = s+1 to length(A) - 1
                if A[i] < A[m]
                    m = i
            swap(A, s, m)
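
    If you would like to run this, here is one possible Java translation of the pseudocode above (just a sketch; the class name SelectionSortDemo and the sample array are my own):

    public class SelectionSortDemo {
        // Sorts A in ascending order using selection sort.
        public static void selectionSort(int[] A) {
            for (int s = 0; s <= A.length - 2; s++) {
                int m = s;                       // index of the smallest element seen so far
                for (int i = s + 1; i <= A.length - 1; i++) {
                    if (A[i] < A[m]) {
                        m = i;                   // found a new smallest element
                    }
                }
                // swap A[s] and A[m]
                int temp = A[s];
                A[s] = A[m];
                A[m] = temp;
            }
        }

        public static void main(String[] args) {
            int[] A = {25, 4, 19, 27, 1, 13, 28, 29};
            selectionSort(A);
            System.out.println(java.util.Arrays.toString(A));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }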
    

    We now go through the beginning of an example. The current minimum is marked with square brackets, the element we are currently examining is marked with parentheses, and the sorted part is everything to the left of the | bar.

    Our current smallest element is at index 0. We will be examining all of the other elements in the array against this element (or, against whatever the current smallest becomes).

    [25] (4) 19 27 1 13 28 29

    We see that we have a new "smallest" element, so we change our minimum index to 1 and continue with our comparisons.

    25 [4] (19) 27 1 13 28 29

    25 [4] 19 (27) 1 13 28 29

    25 [4] 19 27 (1) 13 28 29

    We have now found the smallest element in the array, but we cannot be sure of that until we actually check all of the remaining elements.

    25 4 19 27 [1] (13) 28 29

    25 4 19 27 [1] 13 (28) 29

    25 4 19 27 [1] 13 28 (29)

    At this point, we have found the smallest element in the array, so we swap it into the correct spot:

    1 | 4 19 27 25 13 28 29

    We now do another scan to find the next smallest element:
    1 | [4] (19) 27 25 13 28 29

    We stop the demonstration there; the remaining passes proceed in exactly the same way.

    Time Analysis: Once again, we have to define what an "operation" is. We use the same definition that we did with bubble sort: comparisons and swaps. This time there is no "worst case": the running time will be similar no matter what the input looks like. (I should point out that fewer changes to the current "smallest" index will be made when the array is already sorted than when it is reverse sorted.) The first time we go through the loop looking for the smallest element, we make (n-1) comparisons. The second time we make (n-2), and so on. Thus, the total number of comparisons is:

    (n-1) + (n-2) + . . . + 2 + 1 = n*(n-1)/2

    However, with selection sort, it is only necessary to make one swap, at the end of each selection scan.

    When we combine these two factors, we obtain a total running time of:

    n*(n-1)/2 + (n-1) = O(n^2)

    Thus, though we made fewer swaps in selection sort than we did in bubble sort, we still end up with the same order of growth, O(n^2), for the algorithm.
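
    To see these counts concretely, here is a small instrumented sketch (my own harness, not part of the original notes) that tallies comparisons and swaps on the 8-element array from the example. It reports 28 comparisons, which is 8*7/2, and 7 swaps.

    public class SelectionSortCount {
        public static void main(String[] args) {
            int[] A = {25, 4, 19, 27, 1, 13, 28, 29};
            int comparisons = 0, swaps = 0;
            for (int s = 0; s <= A.length - 2; s++) {
                int m = s;
                for (int i = s + 1; i <= A.length - 1; i++) {
                    comparisons++;               // one comparison per inner-loop step
                    if (A[i] < A[m]) {
                        m = i;
                    }
                }
                int temp = A[s]; A[s] = A[m]; A[m] = temp;
                swaps++;                         // exactly one swap per pass
            }
            // For n = 8 this prints: comparisons = 28, swaps = 7
            System.out.println("comparisons = " + comparisons + ", swaps = " + swaps);
        }
    }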

Insertion Sort

    Our next sorting technique is known as insertion sort. It also breaks our array into two parts: a completely sorted part on the left and a completely unsorted part on the right. We then grab the first element of the right part and move it into its correct position in the left part, shifting the larger elements of the left part over one spot until we can place the value where it belongs. This technique is formalized below.

    insertionSort(A)
    Input: an unsorted array, A
    Postcondition: A is sorted in ascending order

        for j = 1 to length(A) - 1
            key = A[j]
            i = j - 1
            while i >= 0 && A[i] > key
                A[i+1] = A[i]
                i--
            A[i+1] = key
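
    As with selection sort, here is one possible Java translation of the pseudocode (again a sketch; the class name is my own):

    public class InsertionSortDemo {
        // Sorts A in ascending order using insertion sort.
        public static void insertionSort(int[] A) {
            for (int j = 1; j <= A.length - 1; j++) {
                int key = A[j];          // the element we are inserting into the sorted part
                int i = j - 1;
                while (i >= 0 && A[i] > key) {
                    A[i + 1] = A[i];     // shift larger elements one slot to the right
                    i--;
                }
                A[i + 1] = key;          // drop the key into the hole we opened up
            }
        }

        public static void main(String[] args) {
            int[] A = {29, 4, 1, 25, 19, 27, 13, 28};
            insertionSort(A);
            System.out.println(java.util.Arrays.toString(A));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }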
    
    I show an example below. The array is written out at each step, along with the key currently being inserted and the comparison or shift being made.

    29 4 1 25 19 27 13 28     Key: 4   (compare 4 with 29)

    4 29 1 25 19 27 13 28     Key: 4   (29 has shifted right and 4 has been placed; 4 29 is now the sorted part)

    4 29 1 25 19 27 13 28     Key: 1   (compare 1 with 29)

    4 29 29 25 19 27 13 28    Key: 1   (29 has shifted right; compare 1 with 4)

    1 4 29 25 19 27 13 28     Key: 1   (4 has shifted right and 1 has been placed; 1 4 29 is now the sorted part)

    I stop there, but we still have 5 passes to make; they proceed in the same way.

    A good question: This may seem like an inefficient way to figure out where to make our insertions: we already know of a method that figures out where an element belongs in a sorted list. Remember binary search? We could modify that code slightly to figure out where to insert our key, and the search itself would be a lot faster than this linear scan of comparisons. However, once we figured out where to put the key, we would still have to do all of the shifting, so we would not gain anything overall by performing the search. Indeed, the net effect would be to make the algorithm run slightly longer because of the searching.
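
    For the curious, here is a sketch of what that binary-search variant might look like (sometimes called binary insertion sort; the helper name findInsertPosition is my own). Notice that the shifting loop is still there: the binary search only reduces the comparisons, which is why we do not gain anything overall.

    public class BinaryInsertionSortDemo {
        // Returns the position in the sorted prefix A[0..hi-1] where key belongs.
        private static int findInsertPosition(int[] A, int hi, int key) {
            int lo = 0;
            while (lo < hi) {
                int mid = (lo + hi) / 2;
                if (A[mid] <= key) {
                    lo = mid + 1;        // key belongs to the right of mid
                } else {
                    hi = mid;            // key belongs at mid or to its left
                }
            }
            return lo;
        }

        public static void binaryInsertionSort(int[] A) {
            for (int j = 1; j < A.length; j++) {
                int key = A[j];
                int pos = findInsertPosition(A, j, key);   // only O(log j) comparisons...
                // ...but we still shift every element from pos to j-1, which is O(j) work
                for (int i = j - 1; i >= pos; i--) {
                    A[i + 1] = A[i];
                }
                A[pos] = key;
            }
        }

        public static void main(String[] args) {
            int[] A = {29, 4, 1, 25, 19, 27, 13, 28};
            binaryInsertionSort(A);
            System.out.println(java.util.Arrays.toString(A));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }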

    Time Analysis: Now we wish to find how long it takes to perform insertion sort. We again begin by defining our operations. In insertion sort, we make comparisons and shift elements over. Each of these operations takes O(1) time. So, how long does the whole process take? We compare two cases, the best case and the worst case:

    In the best case, we never have to perform any shifts: the list is already in sorted order. Thus, we only need to make one comparison per element to verify that it is already in the correct spot. Of course, we have to make these comparisons a total of n-1 times, meaning that at best insertion sort is O(n).

    In the worst case, we have to make as many shifts as possible. This happens when the array is in reverse sorted order. In this situation, if there are i elements in the sorted portion of our array, we will have to make i comparisons and perform i shifts. Thus, a single insert will take a total of 2i operations. To perform all n-1 inserts, we require:

    2 + 4 + . . . + 2*(n-1) = 2*(1 + 2 + . . . + (n-1)) = 2*(n*(n-1))/2 = n*(n-1)
    Thus, in the worst case, we can still perform all of our insertions in O(n^2) time, which is no worse than bubble sort.

    So, what is the average running time to perform insertion sort? Is it O(n), O(n^2), or somewhere in between? Well, in the "average" case, we will have to insert our element at the midpoint of the list. That is, if we have i elements in the sorted part of the array, we will have to make i/2 comparisons, and i/2 shifts, or a total of i operations. Thus, for all n-1 insertions we have to perform, our average time will be:

    1 + 2 + . . . + (n-1) = n*(n-1)/2
    which is still O(n^2).
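
    One way to convince yourself of these best-case and worst-case counts is to instrument the code (a sketch; the counting harness is my own). For n = 8 it reports 7 operations on a sorted array and 56 = 8*7 on a reverse-sorted one.

    public class InsertionSortCount {
        // Sorts A and returns the number of comparisons plus shifts performed.
        static int countOperations(int[] A) {
            int ops = 0;
            for (int j = 1; j < A.length; j++) {
                int key = A[j];
                int i = j - 1;
                while (i >= 0) {
                    ops++;                        // one comparison of A[i] against key
                    if (A[i] <= key) break;       // key belongs here: stop shifting
                    A[i + 1] = A[i];
                    ops++;                        // one shift
                    i--;
                }
                A[i + 1] = key;
            }
            return ops;
        }

        public static void main(String[] args) {
            int[] sorted   = {1, 2, 3, 4, 5, 6, 7, 8};
            int[] reversed = {8, 7, 6, 5, 4, 3, 2, 1};
            System.out.println("sorted:   " + countOperations(sorted));     // 7  = n - 1
            System.out.println("reversed: " + countOperations(reversed));   // 56 = n * (n - 1)
        }
    }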

Merge Sort

    Before I talk about merge sort, let's first make an observation: if we have two sorted arrays of size n/2, it is possible to combine these into a sorted array of size n in O(n) time. This is a simple operation: we start on the "left end" (index 0) of each array. We then copy the smaller element of the two into our final array, and compare the next set of elements. We continue this until one of our arrays has no more elements in it. We then copy the remaining elements of the other array into our sorted array. This whole process is known as a "merge". Note that in the trivial case of merging two arrays of size 1, there will be only one comparison to make. This process is formalized and demonstrated below:

    merge(left_array, right_array)
    Input: two sorted arrays
    Returns: A single sorted array whose elements were all contained in left_array and right_array

        merged_array = new array of size length(left_array) + length(right_array)
        left_index = 0
        right_index = 0
        merged_index = 0

        // Copy the smaller front element until one of the arrays runs out
        while left_index < length(left_array) && right_index < length(right_array)
            if left_array[left_index] <= right_array[right_index]
                merged_array[merged_index++] = left_array[left_index++]
            else
                merged_array[merged_index++] = right_array[right_index++]

        // Copy all of the remaining left_array into the merged_array
        while left_index < length(left_array)
            merged_array[merged_index++] = left_array[left_index++]

        // Copy all of the remaining right_array into the merged_array
        while right_index < length(right_array)
            merged_array[merged_index++] = right_array[right_index++]

        return merged_array
    
    Left: 2 3 5 7    Right: 1 6 8 9    Merged:

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6 7

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6 7 8

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6 7 8 9
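
    Here is one possible Java version of the merge routine above (again a sketch; the class and variable names are my own):

    import java.util.Arrays;

    public class MergeDemo {
        // Merges two already-sorted arrays into one sorted array.
        public static int[] merge(int[] left, int[] right) {
            int[] merged = new int[left.length + right.length];
            int li = 0, ri = 0, mi = 0;

            // Repeatedly copy the smaller front element until one array runs out
            while (li < left.length && ri < right.length) {
                if (left[li] <= right[ri]) {
                    merged[mi++] = left[li++];
                } else {
                    merged[mi++] = right[ri++];
                }
            }
            // Copy whatever is left over (only one of these loops does any work)
            while (li < left.length)  merged[mi++] = left[li++];
            while (ri < right.length) merged[mi++] = right[ri++];
            return merged;
        }

        public static void main(String[] args) {
            int[] left  = {2, 3, 5, 7};
            int[] right = {1, 6, 8, 9};
            System.out.println(Arrays.toString(merge(left, right)));   // [1, 2, 3, 5, 6, 7, 8, 9]
        }
    }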

    My brother and I often performed something like this when we used to sort our baseball card collection when we were younger. If we had a stack of cards that needed to get sorted, he would give me half for me to sort. He would sort his half, and then we would merge our two parts together. We noticed that in general this technique was faster than one of us sorting the stack by ourselves.

    We then had a great idea: if two of us can sort faster than one of us alone, then four of us should be able to go even faster. So we each had a friend come over. After my brother divided the stack and gave me half, he would give half of his stack to his friend, and I would give half of mine to my friend. After each of us sorted our quarter of the whole stack, my friend and I would merge our stacks together, and my brother and his friend would merge theirs together. Then my brother and I each had a stack, which the two of us would merge together.

    We had even more friends come over and divided our stacks further, until we weren't able to make the stacks any smaller. This whole process was considerably faster than having one of us perform the sort alone.

    There is a special name for the technique I described above: merge sort. It is a recursive algorithm: break an array in half, merge sort each half, then merge them together. The base case is when our array has one or fewer elements in it: an array of this type is obviously already sorted. We have the pseudocode below.

    mergeSort(A)
    Input: An unsorted array
    Returns: An array with A's elements in ascending order

        if (length(A) <= 1) 
            return A
        center = (length(A)-1) / 2
        unsorted_left = copyArray(A, 0, center)
        unsorted_right = copyArray(A, center+1, length(A)-1)
        sorted_left = mergeSort(unsorted_left)
        sorted_right = mergeSort(unsorted_right)
        return merge(sorted_left, sorted_right)
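
    And here is one possible Java version of mergeSort itself (a sketch; Arrays.copyOfRange plays the role of copyArray, and a compact merge is repeated here so the class stands on its own):

    import java.util.Arrays;

    public class MergeSortDemo {
        // Returns a new array containing A's elements in ascending order.
        public static int[] mergeSort(int[] A) {
            if (A.length <= 1) {
                return A;                                      // base case: already sorted
            }
            int center = (A.length - 1) / 2;
            int[] left  = Arrays.copyOfRange(A, 0, center + 1);          // A[0..center]
            int[] right = Arrays.copyOfRange(A, center + 1, A.length);   // A[center+1..end]
            return merge(mergeSort(left), mergeSort(right));
        }

        // Same merge as sketched earlier, written compactly.
        private static int[] merge(int[] left, int[] right) {
            int[] merged = new int[left.length + right.length];
            int li = 0, ri = 0, mi = 0;
            while (li < left.length && ri < right.length) {
                merged[mi++] = (left[li] <= right[ri]) ? left[li++] : right[ri++];
            }
            while (li < left.length)  merged[mi++] = left[li++];
            while (ri < right.length) merged[mi++] = right[ri++];
            return merged;
        }

        public static void main(String[] args) {
            int[] A = {29, 4, 1, 25, 19, 27, 13, 28};
            System.out.println(Arrays.toString(mergeSort(A)));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }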
    
    Time Analysis: We start by breaking our array in half. This actually takes O(n) time because of all of the copying we have to do. We then sort each half, and then merge them together, which also takes O(n) time. Suppose the total amount of time to sort an array is T(n). Then, performing merge sort will take:
    T(n) = O(n/2) +     // Copying left half
           O(n/2) +     // Copying right half
           T(n/2) +     // Sorting left half
           T(n/2) +     // Sorting right half
           O(n)         // Merging our halves together
    
         = 2*O(n/2) + O(n) + 2*T(n/2)
         = 2*O(n) + 2*T(n/2)
    
    I used the general observation that 2*O(n/2) = O(n). We could expand T(n/2) in a similar manner, simplifying T(n) to be:
    T(n) = 2*O(n) + 2*(2*O(n/2) + 2*T(n/4))
         = 2*O(n) + 2*(O(n) + 2*T(n/4))
         = 2*O(n) + (2*O(n) + 4*T(n/4))
    
    We could continue in this manner and we would see:
    T(n) = 2*O(n) + 2*O(n) + (2*O(n) + 8*T(n/8))
    
    until eventually we have x of these 2*O(n) terms, plus n copies of the base case T(1):
    T(n) = 2*O(n) + 2*O(n) + 2*O(n) + . . . + 2*O(n) + n*T(1)

    So what is this term "x"? It is the number of times we can keep dividing our problem in half before we reach arrays of size 1. We have a name for this: the logarithm, base 2. That is,
    x = log_2 n
    Thus, since T(1) is a constant:
    T(n) = 2*x*O(n) + n*T(1)
         = 2*(log_2 n)*O(n) + O(n)
         = 2*O(n*log_2 n) + O(n)

    The above simplifies to O(n*log_2 n), which we usually write as O(n log n).
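
    If you would rather see the recurrence behave this way numerically than algebraically, here is a small sketch (my own, using a simplified cost model: n operations for the copying, n for the merging, and T(1) = 0). For powers of two, T(n) works out to exactly 2*n*log_2 n.

    public class MergeSortRecurrence {
        // T(n) = n (copying) + n (merging) + the two recursive sorts, with T(1) = 0.
        static long T(long n) {
            if (n <= 1) return 0;
            return 2 * n + 2 * T(n / 2);
        }

        public static void main(String[] args) {
            for (long n = 2; n <= 1024; n *= 2) {
                double nLogN = n * (Math.log(n) / Math.log(2));
                System.out.println("n = " + n + "   T(n) = " + T(n)
                                   + "   2*n*log_2(n) = " + (2 * nLogN));
            }
        }
    }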

    The book's technique vs. the above technique: You may have noticed that the book gives a version of merge sort which differs slightly from the one I give: the book uses a temporary array and end points to keep track of what part of the array we are currently sorting. The techniques are very similar: the book creates one array into which all of the merging goes, while I create two whole new arrays each time I break the array in half. The book's version is faster and takes less space: there is some overhead in creating all of these arrays, and my version uses more memory (on the order of n * log n memory locations versus the 2*n the book's takes). So, strictly speaking, the version the book uses is better than mine, at least performance-wise. So why did I give my version instead of the book's? I think my way is a lot more intuitive. However, once you understand mine, you should look at the book's version and see how they differ and why the book's is better. It will help with the next sorting algorithm we are going to talk about.

Quick Sort

Choosing a sorting technique

    So, which sorting technique is the best? Well, it depends on what you are looking for. Some people only care about how long the algorithm takes to run, others care about how long it takes to implement, and others care about completely different things, like:

    Stable sorting: Stability has to do with how a sorting algorithm deals with duplicate values in the array. A stable algorithm is one which keeps duplicate elements in the same relative order: if an element started out to the left of another element with the same value, it should still be to its left when we are done sorting.

    Of the five algorithms we examined, bubble, insertion, and merge sort are stable, while selection sort (as we implemented it, with a swap) and quick sort are not.
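
    To make stability concrete, here is a small sketch (my own) that sorts (value, label) pairs by value with insertion sort and with selection sort as we wrote them. Insertion sort keeps the two 2's in their original order; selection sort's swap jumps one of them over the other.

    import java.util.Arrays;

    public class StabilityDemo {
        // A value plus a label so we can see whether equal values keep their relative order.
        static class Item {
            final int value;
            final String label;
            Item(int value, String label) { this.value = value; this.label = label; }
            public String toString() { return value + label; }
        }

        static void insertionSort(Item[] A) {
            for (int j = 1; j < A.length; j++) {
                Item key = A[j];
                int i = j - 1;
                while (i >= 0 && A[i].value > key.value) {   // strictly greater: equal items never pass each other
                    A[i + 1] = A[i];
                    i--;
                }
                A[i + 1] = key;
            }
        }

        static void selectionSort(Item[] A) {
            for (int s = 0; s < A.length - 1; s++) {
                int m = s;
                for (int i = s + 1; i < A.length; i++) {
                    if (A[i].value < A[m].value) m = i;
                }
                Item temp = A[s]; A[s] = A[m]; A[m] = temp;  // this swap can jump over an equal element
            }
        }

        public static void main(String[] args) {
            Item[] a = { new Item(2, "a"), new Item(2, "b"), new Item(1, "c") };
            Item[] b = a.clone();
            insertionSort(a);
            selectionSort(b);
            System.out.println("insertion: " + Arrays.toString(a));   // [1c, 2a, 2b] -- 2a stays before 2b
            System.out.println("selection: " + Arrays.toString(b));   // [1c, 2b, 2a] -- order of the 2's flipped
        }
    }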

    In place: An algorithm is "in place" if it requires no more memory than the initial array (not counting a temporary holder for swapping values, and the extra memory needed for recursive calls). The in place algorithms we examined are bubble, selection, insertion, and quick sort, while merge sort is not. Being in place was a big deal long ago when memory was expensive. It is still fairly important nowadays, but mostly when you are sorting very large data sets.

    I summarize the advantages and disadvantages of each algorithm in the table below:

    Algorithm    Best case                          Worst case                 Stable    In place
    Bubble       Sorted: O(n^2)                     Reverse sorted: O(n^2)     Yes       Yes
    Selection    None: O(n^2)                       None: O(n^2)               No        Yes
    Insertion    Sorted: O(n)                       Reverse sorted: O(n^2)     Yes       Yes
    Merge        None: O(n log n)                   None: O(n log n)           Yes       No
    Quick        Median partitioning: O(n log n)    Sorted: O(n^2)             No        Yes

    I will leave it to you to think of situations where any single sorting algorithm is more desirable than any other.


© 2000 Michael Wade
All rights reserved