Note: some of these sites have code for sorting algorithms, perhaps even C++ code. Please write your own code. In any case, for whatever information you do use to influence your sorting algorithm, please put a comment to that effect at the header of the algorithm. For example, if you used code that I gave in class:
void selectionSort(...) // This code is based on the code provided by the instructor
OR
void shellSort(...) // This code is based on the algorithm described by the instructor
OR
void fooSort(...) // This code is based on the algorithm described in the book by so and so
OR
void funkySort(...) // This code is based on code appearing at the URL: ...
You get the idea.
Can we use hashing for sorting? Not in the general case --- the items we hash on need not have total order semantics.
One of the things that makes hashing interesting is that it gives us efficient searching without requiring underlying sorting.
For example, effective hashes on strings need not respect the usual (lexicographical) order on strings. With HASH_TABLE_SIZE = 1000 and hashViaShift (see notes on hashing), hashViaShift("cat")==596 and hashViaShift("act")==612. So, in our hash table, "cat" will come before "act" (even though with normal string ordering, "act" < "cat"):
   ______
  |______| 0
  |______| 1
  |______|
  | ...  |
  |------|
  |"cat" | 596
  |------|
  | ...  |
  |------|
  |"act" | 612
  |------|
  | ...  |
  |______|
However, we can use a limited form of hashing to achieve sorting in cases where our hash function is _monotone_: if x < y then hash(x) < hash(y). In other words, a monotone hash function will place items with keys that have a relative "lower" ordering early in the table and items with relatively greater ordering later in the table.
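As a very simple case, suppose we want to sort a list of positive integers (assuming no integer appears twice, since a boolean can only record presence). Let M be the max of those integers. We can create a boolean array B of M+1 elements (indices 0 through M), initially with each entry set to false. For each integer x in the list, we set B[x] to true. Scanning B from index 0 up to M and listing each index whose entry is true then yields the integers in sorted order.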
This is the simplest kind of hash sort (the underlying hash function is the trivial identity function), known as a bucket sort or distribution sort. Hash sorting is a generalization of that idea, and it works as long as the hash function is monotone.
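As a rough illustration of the boolean-array idea, here is a minimal sketch. The function name distributionSort and the use of std::vector are choices made for this example only, and it assumes the integers are positive and that none appears twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a distribution sort using a boolean presence array.
// Assumptions: every value is positive and no value appears twice.
std::vector<unsigned> distributionSort(const std::vector<unsigned>& items) {
    if (items.empty()) return {};
    const std::size_t M = *std::max_element(items.begin(), items.end());
    std::vector<bool> B(M + 1, false);       // indices 0..M, all false
    for (unsigned x : items) {
        assert(x > 0);                       // the notes assume positive integers
        B[x] = true;                         // identity "hash": x lands in slot x
    }
    std::vector<unsigned> sorted;
    sorted.reserve(items.size());
    for (std::size_t i = 0; i <= M; i++)     // scan slots in increasing order
        if (B[i]) sorted.push_back(static_cast<unsigned>(i));
    return sorted;
}
```

Note that the running time and memory depend on M as well as on the number of items, so this only pays off when M is not much larger than the list.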
Hash sorting:
  initialize hash table so all slots marked as vacant
  for(i=0; i < n; i++) {
    insert A[i] into table at location hash(A[i]) (mark slot as occupied)
  }
  for(i=0,j=0; j < n; i++) {
    if table[i] is occupied {
      A[j] = item stored at table[i]
      j++
    }
  }
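The pseudocode above might be turned into C++ along the following lines. This is only a sketch: the name hashSort, the tableSize parameter, and the choice of int keys are assumptions for illustration. It also assumes the hash is strictly monotone (so distinct keys never collide), that every hash value is below tableSize, and that there are no duplicate keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hash sort sketch: place each item at hash(item), then read the
// occupied slots back in index order.
// Assumptions: hash is strictly monotone, hash(x) < tableSize for every
// key, and A contains no duplicates.
void hashSort(std::vector<int>& A, std::size_t tableSize,
              const std::function<std::size_t(int)>& hash) {
    std::vector<int> table(tableSize);
    std::vector<bool> occupied(tableSize, false);  // all slots start vacant
    for (int x : A) {
        const std::size_t slot = hash(x);
        assert(slot < tableSize);
        assert(!occupied[slot]);                   // collisions are not handled
        table[slot] = x;
        occupied[slot] = true;
    }
    std::size_t j = 0;
    for (std::size_t i = 0; j < A.size(); i++) {   // stops once all n items are copied
        if (occupied[i]) A[j++] = table[i];
    }
}
```

For example, keys known to lie between -100 and 99 could be sorted with a table of 200 slots and the monotone hash x + 100.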
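#include <cstddef>

// Insertion sort: grow a sorted prefix A[0..i-1] one item at a time.
template <class Item>
void insertionSort(Item A[], std::size_t n)
{
  for (std::size_t i = 1; i < n; i++) {
    Item key = A[i];
    std::size_t j = i;
    // Shift items greater than key one slot to the right.
    for (; j > 0 && A[j-1] > key; j--)
      A[j] = A[j-1];
    A[j] = key;  // drop key into the gap that opened up
  }
}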
Shell sort is a generalization of insertion sort, sometimes known as diminishing increment sort.
If A is an array of N items, we say the array is k-sorted if for every valid index i such that i+k<N, A[i] <= A[i+k]
For example,
        0   1   2   3   4   5
       ________________________
A[] = | 5 | 1 | 2 | 5 | 3 | 9 |
       ------------------------
is 3-sorted, since:
  A[0] <= A[3]  (5 <= 5)
  A[1] <= A[4]  (1 <= 3)
  A[2] <= A[5]  (2 <= 9)
but it is not 2-sorted since
  A[0] > A[2]  (5 > 2)
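The definition of k-sorted can be checked mechanically. The helper below is a small sketch (the name isKSorted is made up for this example) that tests the condition A[i] <= A[i+k] for every valid i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if A is k-sorted, i.e. A[i] <= A[i+k] whenever i+k < N.
bool isKSorted(const std::vector<int>& A, std::size_t k) {
    assert(k > 0);                              // k = 0 would compare items with themselves
    for (std::size_t i = 0; i + k < A.size(); i++)
        if (A[i] > A[i + k]) return false;
    return true;
}
```

Called on the example array 5, 1, 2, 5, 3, 9, it returns true for k=3 and false for k=2.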
Another way to think of it: A is k-sorted if for each i < k, the sequence A[i], A[i+k], A[i+2k], A[i+3k], ... A[i+jk] (where i+jk is < n, but i + (j+1)k is >= n) is sorted. (We can call such a sequence the i-k sequence of A) The 0-1 sequence is all of A. (And A is sorted if and only if it is 1-sorted.)
For the above example with k=2, the 0-2 sequence is 5, 2, 3 and the 1-2 sequence is 1, 5, 9.
We can k-sort an array simply: for each i between 0 and k-1, use insertion sort to sort the i-k sequence.
So, we can 2-sort the above array (meaning sort 5,2,3 and sort 1,5,9) to arrive at:
        0   1   2   3   4   5
       ________________________
A[] = | 2 | 1 | 3 | 5 | 5 | 9 |
       ------------------------
which is 2-sorted.
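The k-sorting step could be written roughly as follows. This is a sketch, not the only way to do it: the name kSort is invented here, and the inner loops are simply insertion sort applied with a stride of k to each i-k sequence in turn.

```cpp
#include <cassert>
#include <cstddef>

// k-sort sketch: run insertion sort separately on each i-k sequence
// A[i], A[i+k], A[i+2k], ... for i = 0 .. k-1.
template <class Item>
void kSort(Item A[], std::size_t n, std::size_t k) {
    assert(k > 0);
    for (std::size_t start = 0; start < k && start < n; start++) {
        for (std::size_t i = start + k; i < n; i += k) {
            Item key = A[i];
            std::size_t j = i;
            // Shift larger items of this sequence k slots to the right.
            while (j >= start + k && A[j - k] > key) {
                A[j] = A[j - k];
                j -= k;
            }
            A[j] = key;
        }
    }
}
```

Applied with k=2 to the array 5, 1, 2, 5, 3, 9, it produces 2, 1, 3, 5, 5, 9.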
To finish the sort, we can 1-sort (just call insertion sort on the whole thing). This is much more efficient than it sounds because insertion sort performs well on nearly sorted data.
So, the idea of Shell sort is to repeatedly k-sort the array for a decreasing sequence of k's that ends with k=1, at which point the array is completely sorted.
Obviously if we use each k from n-1 down to 1, then this will work, but that is overkill. It suffices to use a geometrically decreasing sequence (i.e. divide k by two each time). For example: k=n/2, k=n/4, ... , k=1
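Putting the pieces together with the halving sequence k=n/2, k=n/4, ..., k=1 gives something like the sketch below. It is one possible arrangement, not a definitive one: instead of finishing each i-k sequence before starting the next, the inner loop walks the array once per k and inserts each item into its own sequence, which k-sorts the array just the same.

```cpp
#include <cstddef>

// Shell sort sketch using the sequence k = n/2, n/4, ..., 1
// (integer division, so the last pass always has k = 1).
template <class Item>
void shellSort(Item A[], std::size_t n) {
    for (std::size_t k = n / 2; k > 0; k /= 2) {
        // k-sort: insert A[i] into the sorted part of its own i-k sequence.
        for (std::size_t i = k; i < n; i++) {
            Item key = A[i];
            std::size_t j = i;
            while (j >= k && A[j - k] > key) {
                A[j] = A[j - k];
                j -= k;
            }
            A[j] = key;
        }
    }
}
```

The final pass with k=1 is an ordinary insertion sort, so the result is sorted regardless of what the earlier passes did; those passes only serve to make the last one cheap.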
You can use variations on this sequence. For example: k=(n+1)/2, k=(n+1)/4, ...
You can get better performance if the values of k in the sequence have no common factors (think about why).