Non-Comparison Sorting Algorithms


Contents

Introduction

As we have mentioned, it can be proved that a sorting algorithm that involves comparing pairs of values can never have a worst-case time better than O(N log N), where N is the size of the array to be sorted. However, under some special conditions having to do with the values to be sorted, it is possible to design other kinds of sorting algorithms that have better worst-case times. Three such algorithms are discussed here:

  1. Bucket Sort
  2. Counting Sort
  3. Radix Sort

Bucket Sort

The special conditions required for bucket sort are:

  1. The values to be sorted are evenly distributed in some range min to max.
  2. It is possible to divide the range into N equal parts, each of size k.
  3. Given a value, it is possible to tell which part of the range it is in.
Note that conditions 2 and 3 hold for numeric values, but not necessarily for other kinds of values (like strings).

Here's the algorithm:

For example:

Array A:      +-------------------------------------------------+
(N=10)        | 46 | 12 |  1 | 73 | 50 | 92 | 88 | 23 | 30 | 66 |
values in     +-------------------------------------------------+
range 1 to 100

              +-------------------------------------------------+
Buckets array:|    |    |    |    |    |    |    |    |    |    |
              +-------------------------------------------------+
                ^     ^
		|     |
		|     will hold values in the range 11 - 20
		|
		will hold values in the range 1 - 10

              +-------------------------------------------------+
Buckets array | |  | |  | |  | \  | |  | \  | |  | |  | |  | |  |
after Step 1: +-|----|----|---------|---------|----|----|----|--+
                v    v    v         v         v    v    v    v
		1    12   23        46        66   73   88   92
		          |         |
			  v         v
			  30        50

Array A       +-------------------------------------------------+
after Step 2  |  1 | 12 | 23 | 30 | 46 | 50 | 66 | 73 | 88 | 92 |
               +-------------------------------------------------+

Time Requirements

So the total time is O(N) if the values are evenly distributed, and as bad as O(N2) if the values are not evenly distributed, and linked lists are used in the buckets array.


TEST YOURSELF #1

Question 1: What happens when the array is already sorted (what is the running time for bucket sort in that case)?

Question 2: Consider sorting an array of size N of strings, using a buckets array of size 26. The first bucket holds all strings starting with "a", the second holds all strings starting with "b", etc. What is the running time of this algorithm, assuming that the strings are evenly distributed (i.e., an equal number start with each letter of the alphabet) and that linked lists are used in the buckets array?

solution


Counting Sort

The special conditions required for counting sort are:

  1. The values to be sorted are integers in some range min to max. We'll call the number of values in the range (max-min+1) k.
  2. N >= k.

There are two versions of counting sort, depending on whether the goal is simply to sort N integers, or to sort N integers each of which has some associated information to be carried along with it.

Version 1

Here's the algorithm for the first (simpler) case:

For example:

Array A:      +-------------------------------+
(all values   | 9 | 2 | 2 | 9 | 3 | 4 | 2 | 5 |
in the range  +-------------------------------+
0 to 9)

               [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Aux array     +---------------------------------------+
after Step 1: | 0 | 0 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 2 |
              +---------------------------------------+

Array A       +-------------------------------+
after Step 2  | 2 | 2 | 2 | 3 | 4 | 5 | 9 | 9 |
              +-------------------------------+

Time Requirements

So the total time is O(k) + O(N) + O(k+N), which is O(k+N). Since we've assumed that N is at least as large as k, this is O(N). (Note that the algorithm still works if N is less than k, but then it isn't linear in N any more.)

Version 2

If the values in A need to be moved rather than overwritten (e.g., if the values in A include associated data as well as integer keys to be sorted), then the algorithm for counting sort is slightly more complicated, and a second auxiliary array tmp (of size N) is used to hold the sorted values (which can then be copied back into A). As for version 1, we start by filling an auxiliary array with the counts of how many times each key value occurs in A. The remaining steps are different. To simplify the explanation, we'll assume that the values in A are in the range 0 to max-1.

Time Requirements

So the total time is O(k) + O(N) + O(k) + O(N) + O(N), which is O(k+N). Again, since we've assumed that N is at least as large as k, this is O(N).


TEST YOURSELF #2

Assume that array A contains pairs of values: an integer key in the range 1 to 10, and an associated name. Also assume that A[0] holds the pair (1, Anne), and A[1] holds the pair (1, Sandy).

A sorting algorithm that preserves the original order of duplicate keys (e.g., for this example, that finishes with the pair (1, Anne) coming before the pair (1, Sandy)) is called a stable sort. Is counting sort as defined above a stable sort? If not, how could the given code be changed so that it is stable? What about the other sorting algorithms that were discussed previously (selection sort, insertion sort, merge sort, and quick sort) -- were the versions of those algorithms defined in the notes stable or non-stable?

solution


Radix Sort

The special conditions required for radix sort are:

  1. The values to be sorted are sequences of length k.
  2. Each item in the sequences is itself a value in some range min to max. (For example, the values to be sorted could be strings -- sequences of characters in the range 'a' to 'z', or could be integers -- sequences of digits in the range 0 to 9).
  3. N >= max-min+1.
To do a radix sort, simply sort on each sequence position in turn, from right to left, using a stable sort.

Here's an example in which the values to be sorted are strings of length 3:

         +-----------------------------------+
array A: | sat | can | cat | sip | cot | sap |
         +-----------------------------------+

after sorting    +-----------------------------------+
on the 3rd char: | can | sip | sap | sat | cat | cot |
                 +-----------------------------------+
after sorting    +-----------------------------------+
on the 2nd char: | can | sap | sat | cat | sip | cot |
                 +-----------------------------------+
after sorting    +-----------------------------------+
on the 1st char: | can | cat | cot | sap | sat | sip |
                 +-----------------------------------+
Note that if the sequences have different lengths, they should be padded with a value that comes before all other values, and the side on which they're padded depends on how those values are ordered. Strings should be padded on the right:

original string padded string
cat catA
coot coot
at atAA
it itAA

Integers should be padded on the left:

original value padded value
315 0315
1665 1665
17 0017
57 0057

Time Requirements

The total time is therefore O(k * N).