Sorting

Day 1, Day 2, Day 3

Introduction

    We now begin a discussion of one of the most important problems in Computer Science: sorting. Formally, we state the problem: Given a sequence of n elements a_0, a_1, . . ., a_(n-1), arrange the elements such that they are in ascending order.

    We will discuss 5 algorithms that sort these elements, and compare and contrast them to see which algorithm may be best in a given situation.

Bubble Sort

Selection Sort

    When performing a bubble sort, we often have to perform a lot of swaps. Our next sorting algorithm was developed to lower the number of swaps.

    Intuitively, we keep two parts to the array being sorted: a part on the left, which has all of its elements in sorted order, and a part on the right, where the elements are not yet sorted. The right part has the property that its smallest element is no smaller than the largest element in the left part. Initially, the left part has size 0 and the right part has size n. We then scan the entire right part of the array to find (or, "select") the smallest element, and swap it into the first position of the right part, which then becomes the last position of the left part. We continue this process until the right part of the array shrinks to size 0.

    The pseudocode is given below:

    SelectionSort(A)
    Input: an unsorted array, A
    Postcondition: A is sorted in ascending order

        for s = 0 to length(A) - 2
            m = s
            for i = s+1 to length(A) - 1
                if A[i] < A[m]
                    m = i
            swap(A, s, m)
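
    If you would like to run this, here is one possible Java translation of the pseudocode above (just a sketch; the class name SelectionSortDemo and the sample array are my own):

    public class SelectionSortDemo {
        // Sorts A in ascending order using selection sort.
        public static void selectionSort(int[] A) {
            for (int s = 0; s <= A.length - 2; s++) {
                int m = s;                       // index of the smallest element seen so far
                for (int i = s + 1; i <= A.length - 1; i++) {
                    if (A[i] < A[m]) {
                        m = i;                   // found a new smallest element
                    }
                }
                // swap A[s] and A[m]
                int temp = A[s];
                A[s] = A[m];
                A[m] = temp;
            }
        }

        public static void main(String[] args) {
            int[] A = {25, 4, 19, 27, 1, 13, 28, 29};
            selectionSort(A);
            System.out.println(java.util.Arrays.toString(A));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }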
    

    We now go through the beginning of an example. The current minimum is marked with square brackets, the element we are currently examining is marked with parentheses, and the sorted part is everything to the left of the | bar.

    Our current smallest element is at index 0. We will be examining all of the other elements in the array against this element (or, against whatever the current smallest becomes).

    [25] (4) 19 27 1 13 28 29

    We see that we have a new "smallest" element, so we change our minimum index to 1 and continue with our comparisons.

    25 [4] (19) 27 1 13 28 29

    25 [4] 19 (27) 1 13 28 29

    25 [4] 19 27 (1) 13 28 29

    We have now found the smallest element in the array, but we cannot be sure of that until we actually check all of the remaining elements.

    25 4 19 27 [1] (13) 28 29

    25 4 19 27 [1] 13 (28) 29

    25 4 19 27 [1] 13 28 (29)

    At this point, we have found the smallest element in the array, so we swap it into the correct spot:

    1 | 4 19 27 25 13 28 29

    We now do another scan to find the next smallest element:
    1 | [4] (19) 27 25 13 28 29

    We stop the demonstration there; the remaining passes proceed in exactly the same way.

    Time Analysis: Once again, we have to define what an "operation" is. We use the same definition that we did with bubble sort: comparisons and swaps. This time there is no "worst case": the running time will be similar no matter what the input looks like. (I should point out that fewer changes to the current "smallest" index will be made when the array is already sorted than when it is reverse sorted.) The first time we go through the loop looking for the smallest element, we make (n-1) comparisons. The second time we make (n-2), and so on. Thus, the total number of comparisons is:

    (n-1) + (n-2) + . . . + 2 + 1 = n*(n-1)/2

    However, with selection sort, it is only necessary to make one swap, at the end of each selection scan.

    When we combine these two factors, we obtain a total running time of:

    n*(n-1)/2 + (n-1) = O(n^2)

    Thus, though we made fewer swaps in selection sort than we did in bubble sort, we still end up with the same order of growth, O(n^2), for the algorithm.
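
    To see these counts concretely, here is a small instrumented sketch (my own harness, not part of the original notes) that tallies comparisons and swaps on the 8-element array from the example. It reports 28 comparisons, which is 8*7/2, and 7 swaps.

    public class SelectionSortCount {
        public static void main(String[] args) {
            int[] A = {25, 4, 19, 27, 1, 13, 28, 29};
            int comparisons = 0, swaps = 0;
            for (int s = 0; s <= A.length - 2; s++) {
                int m = s;
                for (int i = s + 1; i <= A.length - 1; i++) {
                    comparisons++;               // one comparison per inner-loop step
                    if (A[i] < A[m]) {
                        m = i;
                    }
                }
                int temp = A[s]; A[s] = A[m]; A[m] = temp;
                swaps++;                         // exactly one swap per pass
            }
            // For n = 8 this prints: comparisons = 28, swaps = 7
            System.out.println("comparisons = " + comparisons + ", swaps = " + swaps);
        }
    }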

Insertion Sort

    Our next sorting technique is known as insertion sort. It also breaks our array into two parts: a completely sorted part on the left and a completely unsorted part on the right. We then grab the first element of the right part and move it into its correct position in the left part, shifting the larger elements of the left part over one spot until we can place the value where it belongs. This technique is formalized below.

    insertionSort(A)
    Input: an unsorted array, A
    Postcondition: A is sorted in ascending order

        for j = 1 to length(A) - 1
            key = A[j]
            i = j - 1
            while i >= 0 && A[i] > key
                A[i+1] = A[i]
                i--
            A[i+1] = key
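
    As with selection sort, here is one possible Java translation of the pseudocode (again a sketch; the class name is my own):

    public class InsertionSortDemo {
        // Sorts A in ascending order using insertion sort.
        public static void insertionSort(int[] A) {
            for (int j = 1; j <= A.length - 1; j++) {
                int key = A[j];          // the element we are inserting into the sorted part
                int i = j - 1;
                while (i >= 0 && A[i] > key) {
                    A[i + 1] = A[i];     // shift larger elements one slot to the right
                    i--;
                }
                A[i + 1] = key;          // drop the key into the hole we opened up
            }
        }

        public static void main(String[] args) {
            int[] A = {29, 4, 1, 25, 19, 27, 13, 28};
            insertionSort(A);
            System.out.println(java.util.Arrays.toString(A));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }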
    
    I show an example below. The array is written out at each step, along with the key currently being inserted and the comparison or shift being made.

    29 4 1 25 19 27 13 28     Key: 4   (compare 4 with 29)

    4 29 1 25 19 27 13 28     Key: 4   (29 has shifted right and 4 has been placed; 4 29 is now the sorted part)

    4 29 1 25 19 27 13 28     Key: 1   (compare 1 with 29)

    4 29 29 25 19 27 13 28    Key: 1   (29 has shifted right; compare 1 with 4)

    1 4 29 25 19 27 13 28     Key: 1   (4 has shifted right and 1 has been placed; 1 4 29 is now the sorted part)

    I stop there, but we still have 5 passes to make; they proceed in the same way.

    A good question: This may seem like an inefficient way to figure out where to make our insertions: we already know of a method that figures out where an element belongs in a sorted list. Remember binary search? We could modify that code slightly to figure out where to insert our key, and the search itself would be a lot faster than this linear scan of comparisons. However, once we figured out where to put the key, we would still have to do all of the shifting, so we would not gain anything overall by performing the search. Indeed, the net effect would be to make the algorithm run slightly longer because of the searching.
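
    For the curious, here is a sketch of what that binary-search variant might look like (sometimes called binary insertion sort; the helper name findInsertPosition is my own). Notice that the shifting loop is still there: the binary search only reduces the comparisons, which is why we do not gain anything overall.

    public class BinaryInsertionSortDemo {
        // Returns the position in the sorted prefix A[0..hi-1] where key belongs.
        private static int findInsertPosition(int[] A, int hi, int key) {
            int lo = 0;
            while (lo < hi) {
                int mid = (lo + hi) / 2;
                if (A[mid] <= key) {
                    lo = mid + 1;        // key belongs to the right of mid
                } else {
                    hi = mid;            // key belongs at mid or to its left
                }
            }
            return lo;
        }

        public static void binaryInsertionSort(int[] A) {
            for (int j = 1; j < A.length; j++) {
                int key = A[j];
                int pos = findInsertPosition(A, j, key);   // only O(log j) comparisons...
                // ...but we still shift every element from pos to j-1, which is O(j) work
                for (int i = j - 1; i >= pos; i--) {
                    A[i + 1] = A[i];
                }
                A[pos] = key;
            }
        }

        public static void main(String[] args) {
            int[] A = {29, 4, 1, 25, 19, 27, 13, 28};
            binaryInsertionSort(A);
            System.out.println(java.util.Arrays.toString(A));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }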

    Time Analysis: Now we wish to find how long it takes to perform insertion sort. We again begin by defining our operations. In insertion sort, we make comparisons and shift elements over. Each of these operations takes O(1) time. So, how long does the whole process take? We compare two cases, the best case and the worst case:

    In the best case, we never have to perform any shifts: the list is already in sorted order. Thus, we only need to make one comparison per element to verify that it is already in the correct spot. Of course, we have to make these comparisons a total of n-1 times, meaning that at best insertion sort is O(n).

    In the worst case, we have to make as many shifts as possible. This happens when the array is in reverse sorted order. In this situation, if there are i elements in the sorted portion of our array, we will have to make i comparisons and perform i shifts. Thus, a single insert will take a total of 2i operations. To perform all n-1 inserts, we require:

    2 + 4 + . . . + 2*(n-1) = 2*(1 + 2 + . . . + (n-1)) = 2*(n*(n-1))/2 = n*(n-1)
    Thus, in the worst case, we can still perform all of our insertions in O(n^2) time, which is no worse than bubble sort.

    So, what is the average running time to perform insertion sort? Is it O(n), O(n^2), or somewhere in between? Well, in the "average" case, we will have to insert our element at the midpoint of the list. That is, if we have i elements in the sorted part of the array, we will have to make i/2 comparisons, and i/2 shifts, or a total of i operations. Thus, for all n-1 insertions we have to perform, our average time will be:

    1 + 2 + . . . + (n-1) = n*(n-1)/2
    which is still O(n^2).
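
    One way to convince yourself of these best-case and worst-case counts is to instrument the code (a sketch; the counting harness is my own). For n = 8 it reports 7 operations on a sorted array and 56 = 8*7 on a reverse-sorted one.

    public class InsertionSortCount {
        // Sorts A and returns the number of comparisons plus shifts performed.
        static int countOperations(int[] A) {
            int ops = 0;
            for (int j = 1; j < A.length; j++) {
                int key = A[j];
                int i = j - 1;
                while (i >= 0) {
                    ops++;                        // one comparison of A[i] against key
                    if (A[i] <= key) break;       // key belongs here: stop shifting
                    A[i + 1] = A[i];
                    ops++;                        // one shift
                    i--;
                }
                A[i + 1] = key;
            }
            return ops;
        }

        public static void main(String[] args) {
            int[] sorted   = {1, 2, 3, 4, 5, 6, 7, 8};
            int[] reversed = {8, 7, 6, 5, 4, 3, 2, 1};
            System.out.println("sorted:   " + countOperations(sorted));     // 7  = n - 1
            System.out.println("reversed: " + countOperations(reversed));   // 56 = n * (n - 1)
        }
    }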

Merge Sort

    Before I talk about merge sort, let's first make an observation: if we have two sorted arrays of size n/2, it is possible to combine these into a sorted array of size n in O(n) time. This is a simple operation: we start on the "left end" (index 0) of each array. We then copy the smaller element of the two into our final array, and compare the next set of elements. We continue this until one of our arrays has no more elements in it. We then copy the remaining elements of the other array into our sorted array. This whole process is known as a "merge". Note that in the trivial case of merging two arrays of size 1, there will be only one comparison to make. This process is formalized and demonstrated below:

    merge(left_array, right_array)
    Input: two sorted arrays
    Returns: A single sorted array whose elements were all contained in left_array and right_array

        merged_array = new array of size length(left_array) + length(right_array)
        left_index = 0
        right_index = 0
        merged_index = 0

        // Copy the smaller front element until one of the arrays runs out
        while left_index < length(left_array) && right_index < length(right_array)
            if left_array[left_index] <= right_array[right_index]
                merged_array[merged_index++] = left_array[left_index++]
            else
                merged_array[merged_index++] = right_array[right_index++]

        // Copy all of the remaining left_array into the merged_array
        while left_index < length(left_array)
            merged_array[merged_index++] = left_array[left_index++]

        // Copy all of the remaining right_array into the merged_array
        while right_index < length(right_array)
            merged_array[merged_index++] = right_array[right_index++]

        return merged_array
    
    Left: 2 3 5 7    Right: 1 6 8 9    Merged:

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6 7

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6 7 8

    Left: 2 3 5 7    Right: 1 6 8 9    Merged: 1 2 3 5 6 7 8 9
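
    Here is one possible Java version of the merge routine above (again a sketch; the class and variable names are my own):

    import java.util.Arrays;

    public class MergeDemo {
        // Merges two already-sorted arrays into one sorted array.
        public static int[] merge(int[] left, int[] right) {
            int[] merged = new int[left.length + right.length];
            int li = 0, ri = 0, mi = 0;

            // Repeatedly copy the smaller front element until one array runs out
            while (li < left.length && ri < right.length) {
                if (left[li] <= right[ri]) {
                    merged[mi++] = left[li++];
                } else {
                    merged[mi++] = right[ri++];
                }
            }
            // Copy whatever is left over (only one of these loops does any work)
            while (li < left.length)  merged[mi++] = left[li++];
            while (ri < right.length) merged[mi++] = right[ri++];
            return merged;
        }

        public static void main(String[] args) {
            int[] left  = {2, 3, 5, 7};
            int[] right = {1, 6, 8, 9};
            System.out.println(Arrays.toString(merge(left, right)));   // [1, 2, 3, 5, 6, 7, 8, 9]
        }
    }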

    My brother and I often performed something like this when we used to sort our baseball card collection when we were younger. If we had a stack of cards that needed to get sorted, he would give me half for me to sort. He would sort his half, and then we would merge our two parts together. We noticed that in general this technique was faster than one of us sorting the stack by ourselves.

    We then had a great idea: if two of us can sort faster than one of us alone, then four of us should be able to go even faster. So we each had a friend come over. After my brother divided the stack and gave me half, he would give half of his stack to his friend, and I would give half of mine to my friend. After each of us sorted our quarter of the whole stack, my friend and I would merge our stacks together, and my brother and his friend would merge theirs together. Then my brother and I each had a stack, which the two of us would merge together.

    We had even more friends come over and divided our stacks further, until we weren't able to make the stacks any smaller. This whole process was considerably faster than having one of us perform the sort alone.

    There is a special name for the technique I described above: merge sort. It is a recursive algorithm: break an array in half, merge sort each half, then merge them together. The base case is when our array has one or fewer elements in it: an array of this type is obviously already sorted. We have the pseudocode below.

    mergeSort(A)
    Input: An unsorted array
    Returns: An array with A's elements in ascending order

        if (length(A) <= 1) 
            return A
        center = (length(A)-1) / 2
        unsorted_left = copyArray(A, 0, center)
        unsorted_right = copyArray(A, center+1, length(A)-1)
        sorted_left = mergeSort(unsorted_left)
        sorted_right = mergeSort(unsorted_right)
        return merge(sorted_left, sorted_right)
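
    And here is one possible Java version of mergeSort itself (a sketch; Arrays.copyOfRange plays the role of copyArray, and a compact merge is repeated here so the class stands on its own):

    import java.util.Arrays;

    public class MergeSortDemo {
        // Returns a new array containing A's elements in ascending order.
        public static int[] mergeSort(int[] A) {
            if (A.length <= 1) {
                return A;                                      // base case: already sorted
            }
            int center = (A.length - 1) / 2;
            int[] left  = Arrays.copyOfRange(A, 0, center + 1);          // A[0..center]
            int[] right = Arrays.copyOfRange(A, center + 1, A.length);   // A[center+1..end]
            return merge(mergeSort(left), mergeSort(right));
        }

        // Same merge as sketched earlier, written compactly.
        private static int[] merge(int[] left, int[] right) {
            int[] merged = new int[left.length + right.length];
            int li = 0, ri = 0, mi = 0;
            while (li < left.length && ri < right.length) {
                merged[mi++] = (left[li] <= right[ri]) ? left[li++] : right[ri++];
            }
            while (li < left.length)  merged[mi++] = left[li++];
            while (ri < right.length) merged[mi++] = right[ri++];
            return merged;
        }

        public static void main(String[] args) {
            int[] A = {29, 4, 1, 25, 19, 27, 13, 28};
            System.out.println(Arrays.toString(mergeSort(A)));   // [1, 4, 13, 19, 25, 27, 28, 29]
        }
    }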
    
    Time Analysis: We start by breaking our array in half. This actually takes O(n) time because of all of the copying we have to do. We then sort each half, and then merge them together, which also takes O(n) time. Suppose the total amount of time to sort an array is T(n). Then, performing merge sort will take:
    T(n) = O(n/2) +     // Copying left half
           O(n/2) +     // Copying right half
           T(n/2) +     // Sorting left half
           T(n/2) +     // Sorting right half
           O(n)         // Merging our halves together
    
         = 2*O(n/2) + O(n) + 2*T(n/2)
         = 2*O(n) + 2*T(n/2)
    
    I used the general observation that 2*O(n/2) = O(n). We could expand T(n/2) in a similar manner, simplifying T(n) to be:
    T(n) = 2*O(n) + 2*(2*O(n/2) + 2*T(n/4))
         = 2*O(n) + 2*(O(n) + 2*T(n/4))
         = 2*O(n) + (2*O(n) + 4*T(n/4))
    
    We could continue in this manner and we would see:
    T(n) = 2*O(n) + 2*O(n) + (2*O(n) + 8*T(n/8))
    
    until eventually we have x of these 2*O(n) terms, plus n copies of the base case T(1):
    T(n) = 2*O(n) + 2*O(n) + 2*O(n) + . . . + 2*O(n) + n*T(1)

    So what is this term "x"? It is the number of times we can keep dividing our problem in half before we reach arrays of size 1. We have a name for this: the logarithm, base 2. That is,
    x = log_2 n
    Thus, since T(1) is a constant:
    T(n) = 2*x*O(n) + n*T(1)
         = 2*(log_2 n)*O(n) + O(n)
         = 2*O(n*log_2 n) + O(n)

    The above simplifies to O(n*log_2 n), which we usually write as O(n log n).
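
    If you would rather see the recurrence behave this way numerically than algebraically, here is a small sketch (my own, using a simplified cost model: n operations for the copying, n for the merging, and T(1) = 0). For powers of two, T(n) works out to exactly 2*n*log_2 n.

    public class MergeSortRecurrence {
        // T(n) = n (copying) + n (merging) + the two recursive sorts, with T(1) = 0.
        static long T(long n) {
            if (n <= 1) return 0;
            return 2 * n + 2 * T(n / 2);
        }

        public static void main(String[] args) {
            for (long n = 2; n <= 1024; n *= 2) {
                double nLogN = n * (Math.log(n) / Math.log(2));
                System.out.println("n = " + n + "   T(n) = " + T(n)
                                   + "   2*n*log_2(n) = " + (2 * nLogN));
            }
        }
    }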

    The book's technique vs. the above technique: You may have noticed that the book gives a version of merge sort which differs slightly from the one I give: the book uses a temporary array and end points to keep track of what part of the array we are currently sorting. The techniques are very similar: the book creates one array into which all of the merging goes, while I create two whole new arrays each time I break the array in half. The book's version is faster and takes less space: there is some overhead in creating all of these arrays, and my version uses more memory (on the order of n * log n memory locations versus the 2*n the book's takes). So, strictly speaking, the version the book uses is better than mine, at least performance-wise. So why did I give my version instead of the book's? I think my way is a lot more intuitive. However, once you understand mine, you should look at the book's version and see how they differ and why the book's is better. It will help with the next sorting algorithm we are going to talk about.

Quick Sort

Choosing a sorting technique

    So, which sorting technique is the best? Well, it depends on what you are looking for. Some people only care about how long the algorithm takes to run, others care about how long it takes to implement, and others care about completely different things, like:

    Stable sorting: Stability has to do with how a sorting algorithm deals with duplicate values in the array. A stable algorithm is one which keeps duplicate elements in the same relative order: if an element started out to the left of another element with the same value, it should still be to its left when we are done sorting.

    Of the five algorithms we examined, bubble, insertion, and merge sort are stable, while selection sort (as we implemented it, with a swap) and quick sort are not.
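
    To make stability concrete, here is a small sketch (my own) that sorts (value, label) pairs by value with insertion sort and with selection sort as we wrote them. Insertion sort keeps the two 2's in their original order; selection sort's swap jumps one of them over the other.

    import java.util.Arrays;

    public class StabilityDemo {
        // A value plus a label so we can see whether equal values keep their relative order.
        static class Item {
            final int value;
            final String label;
            Item(int value, String label) { this.value = value; this.label = label; }
            public String toString() { return value + label; }
        }

        static void insertionSort(Item[] A) {
            for (int j = 1; j < A.length; j++) {
                Item key = A[j];
                int i = j - 1;
                while (i >= 0 && A[i].value > key.value) {   // strictly greater: equal items never pass each other
                    A[i + 1] = A[i];
                    i--;
                }
                A[i + 1] = key;
            }
        }

        static void selectionSort(Item[] A) {
            for (int s = 0; s < A.length - 1; s++) {
                int m = s;
                for (int i = s + 1; i < A.length; i++) {
                    if (A[i].value < A[m].value) m = i;
                }
                Item temp = A[s]; A[s] = A[m]; A[m] = temp;  // this swap can jump over an equal element
            }
        }

        public static void main(String[] args) {
            Item[] a = { new Item(2, "a"), new Item(2, "b"), new Item(1, "c") };
            Item[] b = a.clone();
            insertionSort(a);
            selectionSort(b);
            System.out.println("insertion: " + Arrays.toString(a));   // [1c, 2a, 2b] -- 2a stays before 2b
            System.out.println("selection: " + Arrays.toString(b));   // [1c, 2b, 2a] -- order of the 2's flipped
        }
    }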

    In place: An algorithm is "in place" if it requires no more memory than the initial array (not counting a temporary holder for swapping values, and the extra memory needed for recursive calls). The in place algorithms we examined are bubble, selection, insertion, and quick sort, while merge sort is not. Being in place was a big deal long ago when memory was expensive. It is still fairly important nowadays, but mostly when you are sorting very large data sets.

    I summarize the advantages and disadvantages of each algorithm in the table below:

    Algorithm    Best case                          Worst case                 Stable    In place
    Bubble       Sorted: O(n^2)                     Reverse sorted: O(n^2)     Yes       Yes
    Selection    None: O(n^2)                       None: O(n^2)               No        Yes
    Insertion    Sorted: O(n)                       Reverse sorted: O(n^2)     Yes       Yes
    Merge        None: O(n log n)                   None: O(n log n)           Yes       No
    Quick        Median partitioning: O(n log n)    Sorted: O(n^2)             No        Yes

    I will leave it to you to think of situations where any single sorting algorithm is more desirable than any other.


© 2000 Michael Wade
All rights reserved