Non-Comparison Sorting Algorithms
As we have mentioned, it can be proved that a sorting algorithm that
involves comparing pairs of values can never have a worst-case time
better than O(N log N), where N is the size of the array to be sorted.
However, under some special conditions having to do with the values
to be sorted, it is possible to design other kinds of sorting algorithms
that have better worst-case times.
Three such algorithms are discussed here:
The special conditions required for bucket sort are:
Here's the algorithm:
For example:
Question 1:
What happens when the array is already sorted (what is the running time
for bucket sort in that case)?
Question 2:
Consider sorting an array of size N of strings, using a buckets array
of size 26.
The first bucket holds all strings starting with "a", the second holds
all strings starting with "b", etc.
What is the running time of this algorithm, assuming that the strings
are evenly distributed (i.e., an equal number start with each letter of the
alphabet) and that linked lists are used in the buckets array?
The special conditions required for counting sort are:
There are two versions of counting sort, depending on whether the goal is
simply to sort N integers, or to sort N integers each of which has some
associated information to be carried along with it.
For example:
Here's the code for this step:
Assume that array A contains pairs of values: an integer key in the
range 1 to 10, and an associated name.
Also assume that A[0] holds the pair (1, Anne), and A[1] holds the
pair (1, Sandy).
A sorting algorithm that preserves the original order of duplicate keys
(e.g., for this example, that finishes with the pair (1, Anne) coming before
the pair (1, Sandy)) is called a stable sort.
Is counting sort as defined above a stable sort?
If not, how could the given code be changed so that it is stable?
What about the other sorting algorithms that were discussed previously
(selection sort, insertion sort, merge sort, and quick sort) -- were the
versions of those algorithms defined in the notes stable or non-stable?
The special conditions required for radix sort are:
Here's an example in which the values to be sorted are strings of length 3:
Introduction
Bucket Sort
Note that conditions 2 and 3 hold for numeric values, but not necessarily
for other kinds of values (like strings).
Array A: +-------------------------------------------------+
(N=10) | 46 | 12 | 1 | 73 | 50 | 92 | 88 | 23 | 30 | 66 |
values in +-------------------------------------------------+
range 1 to 100
+-------------------------------------------------+
Buckets array:| | | | | | | | | | |
+-------------------------------------------------+
^ ^
| |
| will hold values in the range 11 - 20
|
will hold values in the range 1 - 10
+-------------------------------------------------+
Buckets array | | | | | | | \ | | | \ | | | | | | | | |
after Step 1: +-|----|----|---------|---------|----|----|----|--+
v v v v v v v v
1 12 23 46 66 73 88 92
| |
v v
30 50
Array A +-------------------------------------------------+
after Step 2 | 1 | 12 | 23 | 30 | 46 | 50 | 66 | 73 | 88 | 92 |
+-------------------------------------------------+
Time Requirements
So the total time is O(N) if the values are evenly distributed, and as
bad as O(N2) if the values are not evenly distributed, and
linked lists are used in the buckets array.
Counting Sort
Version 1
Here's the algorithm for the first (simpler) case:
Array A: +-------------------------------+
(all values | 9 | 2 | 2 | 9 | 3 | 4 | 2 | 5 |
in the range +-------------------------------+
0 to 9)
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Aux array +---------------------------------------+
after Step 1: | 0 | 0 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 2 |
+---------------------------------------+
Array A +-------------------------------+
after Step 2 | 2 | 2 | 2 | 3 | 4 | 5 | 9 | 9 |
+-------------------------------+
Time Requirements
So the total time is O(k) + O(N) + O(k+N), which is O(k+N).
Since we've assumed that N is at least as large as k, this is O(N).
(Note that the algorithm still works if N is less than k, but then it
isn't linear in N any more.)
Version 2
If the values in A need to be moved rather than overwritten (e.g., if the
values in A include associated data as well as integer keys to be sorted),
then the algorithm for counting sort is slightly more complicated, and a
second auxiliary array tmp (of size N) is used to hold the sorted values (which
can then be copied back into A).
As for version 1, we start by filling an auxiliary array with the counts
of how many times each key value occurs in A.
The remaining steps are different.
To simplify the explanation, we'll assume that the values in A are in
the range 0 to max-1.
for (int j = 1; j < max; j++) {
aux[j] += aux[j-1];
}
Here's the same array A shown above, and the values of the
auxiliary array after Step 1 and Step 2:
Array A: +-------------------------------+
(all values | 9 | 2 | 2 | 9 | 3 | 4 | 2 | 5 |
in the range +-------------------------------+
0 to 9)
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
Aux array +---------------------------------------+
after Step 1: | 0 | 0 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 2 |
+---------------------------------------+
Aux array +---------------------------------------+
after Step 2: | 0 | 0 | 3 | 4 | 5 | 6 | 0 | 0 | 0 | 8 |
+---------------------------------------+
for (int j=0; j< max; j++) {
tmp[aux[A[j]]-1] = A[j];
aux[A[j]] --;
}
And here's what happens on the first three iterations of th loop
for our running example:
Array A: +-------------------------------+
(all values | 9 | 2 | 2 | 9 | 3 | 4 | 2 | 5 |
in the range +-------------------------------+
0 to 9)
Aux array +---------------------------------------+
after step | 0 | 0 | 3 | 4 | 5 | 6 | 0 | 0 | 0 | 8 |
2: +---------------------------------------+
FIRST ITERATION
---------------
j = 0
A[0] = 9
aux[9] = 8, so put a 9 in tmp[7] and decrement aux[9]
Aux array: +---------------------------------------+
| 0 | 0 | 3 | 4 | 5 | 6 | 0 | 0 | 0 | 7 |
+---------------------------------------+
Tmp array: +-------------------------------+
| | | | | | | | 9 |
+-------------------------------+
SECOND ITERATION
----------------
j = 1
A[1] = 2
aux[2] = 3, so put a 2 in tmp[2] and decrement aux[2]
Aux array: +---------------------------------------+
| 0 | 0 | 2 | 4 | 5 | 6 | 0 | 0 | 0 | 7 |
+---------------------------------------+
Tmp array: +-------------------------------+
| | | 2 | | | | | 9 |
+-------------------------------+
THIRD ITERATION
---------------
j = 2
A[2] = 2
aux[2] = 2, so put a 2 in tmp[1] and decrement aux[2]
Aux array: +---------------------------------------+
| 0 | 0 | 1 | 4 | 5 | 6 | 0 | 0 | 0 | 7 |
+---------------------------------------+
Tmp array: +-------------------------------+
| | 2 | 2 | | | | | 9 |
+-------------------------------+
Note that there is one more 2 in array A, and that 2 will be put into
tmp[0].
Time Requirements
So the total time is O(k) + O(N) + O(k) + O(N) + O(N), which is O(k+N).
Again, since we've assumed that N is at least as large as k, this is O(N).
Radix Sort
To do a radix sort, simply sort on each sequence position in turn,
from right to left, using a stable sort.
+-----------------------------------+
array A: | sat | can | cat | sip | cot | sap |
+-----------------------------------+
after sorting +-----------------------------------+
on the 3rd char: | can | sip | sap | sat | cat | cot |
+-----------------------------------+
after sorting +-----------------------------------+
on the 2nd char: | can | sap | sat | cat | sip | cot |
+-----------------------------------+
after sorting +-----------------------------------+
on the 1st char: | can | cat | cot | sap | sat | sip |
+-----------------------------------+
Note that if the sequences have different lengths, they should be padded
with a value that comes before all other values, and the side on
which they're padded depends on how those values are ordered.
Strings should be padded on the right:
original string | padded string |
cat
| cat A
|
coot
| coot
|
at
| at AA
|
it
| it AA
|
Integers should be padded on the left:
original value | padded value |
315
| 0315
|
1665
| 1665
|
17
| 0017
|
57
| 0057
|