A6.1: Hashing
Announcements and Clarifications
8/2 - Implement only the Map interface.
Brief Description
Your task in this assignment is to implement a simple hash table that uses separate chaining to resolve collisions.
Goals and Requirements
- Implement a simple hash table ADT that implements the Map interface.
- Compute some basic statistics about the hash table.
- Experiment with a parameterised polynomial hash function.
Implementation
The program will be run as follows:
%java Hashing input_file table_size hash_parameter
The first argument names an input file of strings to insert into the hash table as keys (no values are specified, so use null). Each string is on a line by itself. The second argument is the size of the hash table's internal array; it will be a positive integer. The third argument is the parameter of the hash function; it will be an integer.
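A minimal sketch of how the three arguments might be read is shown here. The class name `HashingArgsSketch` and the helper `readKeys` are hypothetical, and skipping blank lines is an assumption that the handout does not specify.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; the real application class must be called Hashing.
class HashingArgsSketch {
    // Reads one key per line. Blank lines are skipped (an assumption).
    static List<String> readKeys(Reader in) {
        BufferedReader br = new BufferedReader(in);
        return br.lines()
                 .filter(line -> !line.isEmpty())
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java Hashing input_file table_size hash_parameter");
            return;
        }
        int tableSize = Integer.parseInt(args[1]); // must be positive
        int a = Integer.parseInt(args[2]);         // hash parameter
        if (tableSize <= 0) {
            System.err.println("table_size must be a positive integer");
            return;
        }
        try (FileReader file = new FileReader(args[0])) {
            List<String> keys = readKeys(file);
            System.out.println(keys.size() + " keys, table size " + tableSize + ", a = " + a);
        }
    }
}
```

Taking a `Reader` rather than a file name makes the helper easy to exercise with a `StringReader` in small tests.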
Your hash table should implement the Map interface as your book describes on page 369 (the Dictionary interface on page 389 is not required; see the 8/2 announcement). For collision handling, use separate chaining. You may implement the chains however you choose (linked list, array, ...), though some choices are easier than others.
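One possible shape for a chained table is sketched below. It is not the book's code: the class `ChainedMapSketch`, its method set, and the use of `String.hashCode()` as a stand-in hash are all assumptions made for illustration, and only a few Map-style operations are shown.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of separate chaining with one linked list per cell.
class ChainedMapSketch {
    private static class Entry {
        final String key;
        Object value;
        Entry(String key, Object value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> cells = new ArrayList<>();
    private int size = 0;

    ChainedMapSketch(int tableSize) {
        for (int i = 0; i < tableSize; i++) cells.add(new LinkedList<>());
    }

    // Stand-in hash (an assumption); the assignment asks for a polynomial hash.
    private int indexOf(String key) {
        return Math.floorMod(key.hashCode(), cells.size());
    }

    // Inserts or updates a key; returns the previous value, or null if the key was absent.
    Object put(String key, Object value) {
        LinkedList<Entry> chain = cells.get(indexOf(key));
        for (Entry e : chain) {
            if (e.key.equals(key)) {
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.addLast(new Entry(key, value)); // new keys go to the end of the chain
        size++;
        return null;
    }

    // Needed because values are null here, so get() alone could not signal presence.
    boolean containsKey(String key) {
        for (Entry e : cells.get(indexOf(key))) {
            if (e.key.equals(key)) return true;
        }
        return false;
    }

    int size() { return size; }
}
```

Constructing the sketch with a table size of 1 forces every key into the same chain, which is a quick way to exercise the collision path.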
You will use polynomial hash codes to hash the strings in the input file into your hash table. The parameter a of the polynomial is the third command-line argument to the program. To get the index into your array, take the result of the polynomial hash modulo the size of your array.
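The hash and the index computation can be sketched as follows. This assumes the usual form of a polynomial hash code, in which the first character gets the highest power of a, evaluated with Horner's rule; check the ordering against your book. The names `PolyHashSketch`, `polyHash` and `index` are hypothetical. Java `int` arithmetic wraps on overflow, so the hash can be negative, and `Math.floorMod` is used so that the index still falls inside the array.

```java
// Hypothetical helper; names and signatures are assumptions for illustration.
class PolyHashSketch {
    // Horner's rule: (...((c0 * a + c1) * a + c2)...) * a + c_{k-1}.
    // Overflow wraps silently, so the result may be negative.
    static int polyHash(String key, int a) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = a * h + key.charAt(i);
        }
        return h;
    }

    // Math.floorMod returns a value in [0, tableSize) even for a negative hash,
    // whereas the % operator could return a negative index.
    static int index(String key, int a, int tableSize) {
        return Math.floorMod(polyHash(key, a), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ab", 33)); // 33 * 97 + 98 = 3299
        System.out.println(index("ab", 33, 4)); // 3299 mod 4 = 3
    }
}
```

With a = 31 this sketch computes the same value as `java.lang.String.hashCode()`, which gives a convenient way to check the hash function before building the table.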
The output of the program is several statistics:
- Number of keys in the table
- Number of cells in the table
- Number of non-empty cells in the table
- Number of collisions after inserting all keys
- Maximum number of collisions at a single index
- Total access time - this is the sum of the access time for each key. The access time for an individual key is the number of times the hash is applied to the key plus the number of steps down the chain to reach the key.
- Average access time - this is the total access time divided by the number of keys.
Example output:
    # keys: 4
    # cells: 4
    # non-empty cells: 3
    # collisions: 1
    max collisions: 1
    total access time: 5
    average access time: 1.25
    0: the
    1:
    2: car, moose
    3: people
This displays a hash table of size four. It is the output produced by a hash table that has the key "the" in the first cell and nothing in the second cell. In the third cell, "car" is first and "moose" is chained second; the fourth cell contains only the key "people".
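The statistics can be derived from the chain lengths alone, as the sketch below illustrates. It rests on one reading of the definitions that is consistent with the sample output: a cell holding n keys contributes n - 1 collisions, the hash is applied once per lookup, and the key at position j of its chain (counting from 0) costs 1 + j. The class name `ChainStatsSketch` and the array-based interface are hypothetical.

```java
// Hypothetical sketch: computes the statistics from the length of each chain.
class ChainStatsSketch {
    // Returns {keys, cells, nonEmptyCells, collisions, maxCollisions, totalAccessTime}.
    static long[] stats(int[] chainLengths) {
        long keys = 0, nonEmpty = 0, collisions = 0, maxCollisions = 0, totalAccess = 0;
        for (int len : chainLengths) {
            if (len == 0) continue;               // empty cell contributes nothing
            keys += len;
            nonEmpty++;
            collisions += len - 1;                // every key after the first collided
            maxCollisions = Math.max(maxCollisions, len - 1);
            // Costs along one chain are 1 + 2 + ... + len.
            totalAccess += (long) len * (len + 1) / 2;
        }
        return new long[] {keys, chainLengths.length, nonEmpty, collisions, maxCollisions, totalAccess};
    }

    public static void main(String[] args) {
        // Chain lengths of the sample table: "the" | (empty) | "car, moose" | "people".
        long[] s = stats(new int[] {1, 0, 2, 1});
        System.out.println("# keys: " + s[0]);
        System.out.println("# cells: " + s[1]);
        System.out.println("# non-empty cells: " + s[2]);
        System.out.println("# collisions: " + s[3]);
        System.out.println("max collisions: " + s[4]);
        System.out.println("total access time: " + s[5]);
        System.out.println("average access time: " + (double) s[5] / s[0]);
    }
}
```

For the sample table this reproduces the figures shown above (4 keys, 4 cells, 3 non-empty cells, 1 collision, total access time 5, average 1.25).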
Data Sets
Here are a couple of data sets to try your hash table on:
Questions to Answer
- What values for the statistics indicate that a hash is being effective?
- Your book suggests on page 376 that 33, 37, 39 and 41 are particularly good choices for the parameter a of the polynomial hash function. Confirm or refute this claim on both provided data sets by using your program to gather data. Include copies of whatever graphs or data you need to support your argument.
- Are polynomial hashes a good choice for our data sets? If not, what type of hash would be more appropriate?
- Suppose you were asked to implement linear probing instead of separate chaining as the collision handling scheme. For a fixed experiment (fixed data set, table size and hash parameter), how would the statistics change?
- The number 68891 is prime. Using a fixed a (one of the "good" values of a from Question 2), experiment with table sizes at and around 68891. Does your experiment support the idea that the division method works better when the divisor is prime?
Commenting and Style
- Your program should be written in a style that makes it easy to read and understand.
- At the beginning of each .java file you should include a description of the class and how it interacts with the other parts of your program.
- You are not required to use javadoc style comments.
- If your code is doing something complex or non-standard please comment that portion heavily.
Handin
Please hand all necessary files into your handin directory in a subdirectory named Hashing. Your application class should be called Hashing.java. There should be exactly one class declared within each .java file. If your program does anything strange (bugs) or awesome (extra features), or has a non-intuitive interface, please include a file called README.txt that explains it. If there are bugs in your program that you do not describe in your README, you will lose more credit than if you had described them.
Hints
- Develop incrementally:
  - Write hashing functionality
  - Display the table
  - Compute the statistics
- Write the hash function first and test it properly before you begin writing the hash table.
- Make up some small test cases on a very small table so that collisions are guaranteed and you can check your separate chaining code.
- Use the hash table code provided in the book as a starting point. Note, however, that it does not use separate chaining.