ID | Sum of digits | Array index (sum of digits mod 10) |
9014638161 | 39 | 9 |
9103287648 | 48 | 8 |
4757414352 | 42 | 2 |
8377690440 | 48 | 8 |
9031397831 | 44 | 4 |
Note that we have a problem: both the second and the fourth ID have the same hash value (8). This is called a collision. How can we store both keys in array[8]? The answer is that we can make the array an array of linked lists, or an array of search trees, so that in case of collisions (if multiple keys have the same hash value), we can store multiple keys in the same place in the array. Assuming that we use linked lists, here's what the hashtable looks like after the 5 ID numbers given above have been inserted:
[0]  (empty)
[1]  (empty)
[2]  -> 4757414352
[3]  (empty)
[4]  -> 9031397831
[5]  (empty)
[6]  (empty)
[7]  (empty)
[8]  -> 8377690440 -> 9103287648
[9]  -> 9014638161
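The table and the collision at index 8 can be reproduced with a short sketch. This is an illustration only: the names digit_sum_hash and table are made up for this example, the IDs are treated as strings of digits, and a Python list per slot stands in for a linked list (a real linked list would make the front insertion constant time).

```python
TABLE_SIZE = 10

def digit_sum_hash(id_number: str) -> int:
    """Sum the decimal digits of the ID, then reduce mod TABLE_SIZE."""
    return sum(int(d) for d in id_number) % TABLE_SIZE

# One chain per array slot; a Python list stands in for a linked list here.
table = [[] for _ in range(TABLE_SIZE)]

ids = ["9014638161", "9103287648", "4757414352", "8377690440", "9031397831"]
for id_number in ids:
    # New keys go at the front of the chain, as they would in a linked list.
    table[digit_sum_hash(id_number)].insert(0, id_number)

for index, chain in enumerate(table):
    print(index, chain)
```

Because each new key goes at the front of its chain, the fourth ID ends up ahead of the second ID in slot 8.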
Consider storing the names: George, Amy, Alan, and Sandy in a hashtable of size 10, using the hash function:
Two important questions are:
The answers will be discussed below. First, let's assume that the hashtable size is TABLE_SIZE, and that we have a hash function, and let's consider what the lookup, insert, and delete operations will do.
To look up a key k in a hashtable, all you have to do is compute k's
hash value (v = hash(k)), then see if k is in array[v].
As mentioned above, the array will contain linked lists, or possibly
search trees.
In either case, you should already know how to look for k.
The time for the lookup will be proportional to the time for the hash
function, plus the time to look for k in the data structure in array[v].
In the best case, when at most one key hashes to each location in the
table, the lookup in array[v] will be O(1).
In the worst case, all of the keys will hash to the same place.
In that case, if linked lists are used, the time for the lookup will be
O(N), where N is the number of values stored in the hashtable.
If a balanced search tree is used, the time will be O(log N).
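As a rough illustration of lookup with linked lists, here is a minimal sketch. The Node class, the hash_key function (plain k mod TABLE_SIZE on integer keys), and the sample keys are all assumptions made for this example, not part of the notes.

```python
from dataclasses import dataclass
from typing import Optional

TABLE_SIZE = 10

@dataclass
class Node:
    key: int
    next: Optional["Node"] = None

def hash_key(k: int) -> int:
    # Placeholder hash function, assumed only for this sketch.
    return k % TABLE_SIZE

def lookup(array: list, k: int) -> bool:
    """Compute v = hash(k), then walk the list stored in array[v]."""
    node = array[hash_key(k)]
    while node is not None:
        if node.key == k:
            return True
        node = node.next
    return False

array = [None] * TABLE_SIZE
array[8] = Node(28, Node(18))  # two keys that collide at index 8

print(lookup(array, 18), lookup(array, 38))
```

The loop visits every node in one chain at most, which is why the cost grows to O(N) only when all N keys land in the same slot.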
Inserting a key k in a hashtable is similar to looking it up:
first, v = hash(k) is computed, then k is added to array[v].
If linked lists are used, k should be added at the front of the list
(since that can be done in constant time).
The time for insert is similar to the time for lookup: the sum of the
time for the hash function and the time to insert k into array[v].
However, if linked lists are used and there is no need to check whether
k is already present, the time to insert k into array[v] will always be
O(1), even in the worst case, because k is simply added at the front.
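Front insertion might look like the following sketch. As before, Node and hash_key are hypothetical helpers for integer keys, and the sketch does not check whether k is already in the table.

```python
from dataclasses import dataclass
from typing import Optional

TABLE_SIZE = 10

@dataclass
class Node:
    key: int
    next: Optional["Node"] = None

def hash_key(k: int) -> int:
    # Placeholder hash function, assumed only for this sketch.
    return k % TABLE_SIZE

def insert(array: list, k: int) -> None:
    """Add k at the front of the list in array[hash(k)]; no duplicate check."""
    v = hash_key(k)
    array[v] = Node(k, array[v])  # the new node points at the old head

array = [None] * TABLE_SIZE
for k in (18, 28, 7):
    insert(array, k)
```

No traversal is needed, so the list work is one node allocation and one pointer update regardless of how long the chain already is.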
To delete a key k from a hashtable, v = hash(k) is computed, then
k is deleted from the linked-list / search tree in array[v].
The worst-case time is the same as for lookup, since the key has
to be found before it can be deleted.
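A possible sketch of delete for the linked-list case follows. Node, hash_key, and the sample chain are again assumptions for illustration; the function returns whether the key was found.

```python
from dataclasses import dataclass
from typing import Optional

TABLE_SIZE = 10

@dataclass
class Node:
    key: int
    next: Optional["Node"] = None

def hash_key(k: int) -> int:
    # Placeholder hash function, assumed only for this sketch.
    return k % TABLE_SIZE

def delete(array: list, k: int) -> bool:
    """Find k in the list at array[hash(k)] and unlink it if present."""
    v = hash_key(k)
    prev, node = None, array[v]
    while node is not None:
        if node.key == k:
            if prev is None:
                array[v] = node.next   # k was the head of the list
            else:
                prev.next = node.next  # bypass the node holding k
            return True
        prev, node = node, node.next
    return False

array = [None] * TABLE_SIZE
array[8] = Node(38, Node(28, Node(18)))  # chain 38 -> 28 -> 18
```

The search loop is the same as in lookup; only the final unlinking step is extra, and it takes constant time.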
The best size to choose for the hashtable will depend on the expected
number of values that will be stored, and how important space
consumption is (there will be a trade-off between the amount of space
used and the number of keys that hash to the same array index).
It is reasonable to use a table that is a bit larger than the expected
number of items (say 1.25 times the expected number).
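If the number of items to be stored is not known, then you can always plan
to expand the hashtable whenever it gets too full. Expanding means
allocating a larger array and re-inserting every key, since a key's array
index depends on the table size.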
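The sizing rule and the expansion step could be sketched as below. Several details are assumptions chosen for the example rather than anything the notes prescribe: the class name ChainedTable, integer keys hashed with k mod the current table size, treating the table as "too full" once it holds as many keys as it has slots, and doubling the size on expansion.

```python
import math

def initial_size(expected_items: int, factor: float = 1.25) -> int:
    """Table size a bit larger than the expected number of items."""
    return max(1, math.ceil(expected_items * factor))

class ChainedTable:
    def __init__(self, expected_items: int):
        self.slots = [[] for _ in range(initial_size(expected_items))]
        self.count = 0

    def _index(self, k: int) -> int:
        # Placeholder hash for non-negative integer keys (an assumption).
        return k % len(self.slots)

    def insert(self, k: int) -> None:
        if self.count >= len(self.slots):  # assumed "too full" threshold
            self._expand()
        self.slots[self._index(k)].append(k)
        self.count += 1

    def contains(self, k: int) -> bool:
        return k in self.slots[self._index(k)]

    def _expand(self) -> None:
        """Double the array and re-insert every key under the new size."""
        old_slots = self.slots
        self.slots = [[] for _ in range(2 * len(old_slots))]
        for chain in old_slots:
            for k in chain:
                self.slots[self._index(k)].append(k)

t = ChainedTable(expected_items=8)  # starts with 10 slots
for k in range(11):                 # the 11th insert triggers an expansion
    t.insert(k)
```

Every key has to be rehashed during expansion because its index is computed from the table size, so an expansion costs time proportional to the number of stored keys.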
Lookup, Insert, and Delete
Choosing the Hashtable Size