A6.1: Hashing

Announcements and Clarifications

8/2 - Implement only the Map interface.

Brief Description

Your task in this assignment is to implement a simple hash table that uses separate chaining to resolve collisions.

Goals and Requirements

Implementation

The program will be run as follows:

%java Hashing input_file table_size hash_parameter

The first argument is the name of an input file containing the strings to insert into the hash table as keys (no value is specified, so use null). Each string is on a line by itself. The second argument is the size of the hash table's internal array; it will be a positive integer. The third argument is the parameter a of the hash function; it will be an integer.
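Reading the three arguments can be sketched as follows. This is a minimal sketch, not a required design: the class name `HashingArgs` is hypothetical, and skipping blank lines in the input file is an assumption (the assignment only says each string is on its own line).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for: input_file table_size hash_parameter
final class HashingArgs {
    final List<String> keys; // one key per non-blank line of input_file
    final int tableSize;     // size of the internal array (must be positive)
    final int a;             // parameter of the polynomial hash

    private HashingArgs(List<String> keys, int tableSize, int a) {
        this.keys = keys;
        this.tableSize = tableSize;
        this.a = a;
    }

    static HashingArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "usage: java Hashing input_file table_size hash_parameter");
        }
        int tableSize = Integer.parseInt(args[1]); // NumberFormatException if not an int
        if (tableSize <= 0) {
            throw new IllegalArgumentException("table_size must be positive");
        }
        int a = Integer.parseInt(args[2]);
        List<String> keys = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) { // assumption: blank lines are not keys
                    keys.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new HashingArgs(keys, tableSize, a);
    }
}
```

The table size is validated before the file is read, so a bad size fails fast without touching the disk.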

Your hash table should implement both the Map and Dictionary interfaces, as described in your book on pages 369 and 389 respectively. Use separate chaining for collision handling. You may implement the chains however you choose (linked list, array, ...), though some approaches are easier than others.
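One way to organize separate chaining is an array of linked lists, where each cell holds the entries whose keys hash to that index. The sketch below is hypothetical: `ChainedHashMap` and its method names are illustrative and may not match the book's Map interface, and it uses `key.hashCode()` only as a placeholder where the polynomial hash would go.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical separately chained map; method names are illustrative.
final class ChainedHashMap<K, V> {
    private final LinkedList<Map.Entry<K, V>>[] table; // one chain per cell
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashMap(int capacity) {
        table = (LinkedList<Map.Entry<K, V>>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // Placeholder: a real solution would use the polynomial hash here.
    private int index(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Returns the entry for key in its chain, or null if absent.
    private Map.Entry<K, V> find(K key) {
        for (Map.Entry<K, V> e : table[index(key)]) {
            if (e.getKey().equals(key)) {
                return e;
            }
        }
        return null;
    }

    boolean containsKey(K key) { return find(key) != null; }

    // Returns null if key is absent OR if its stored value is null.
    V get(K key) {
        Map.Entry<K, V> e = find(key);
        return e == null ? null : e.getValue();
    }

    // Replaces the value of an existing key, or appends a new entry to the end of the chain.
    V put(K key, V value) {
        Map.Entry<K, V> e = find(key);
        if (e != null) {
            return e.setValue(value); // setValue returns the old value
        }
        table[index(key)].addLast(new AbstractMap.SimpleEntry<K, V>(key, value));
        size++;
        return null;
    }

    // Unlinks key's entry from its chain and returns its value, or null if absent.
    V remove(K key) {
        Iterator<Map.Entry<K, V>> it = table[index(key)].iterator();
        while (it.hasNext()) {
            Map.Entry<K, V> e = it.next();
            if (e.getKey().equals(key)) {
                it.remove();
                size--;
                return e.getValue();
            }
        }
        return null;
    }

    int size() { return size; }
    boolean isEmpty() { return size == 0; }
}
```

Because every value in this assignment is null, `get` alone cannot tell a missing key from a present one; `containsKey` makes that distinction. Appending with `addLast` keeps keys in insertion order within a chain, which is the order the sample output lists them.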

You will use a polynomial hash code to hash the strings in the input file into your hash table. The parameter a is the third command-line argument to the program. To get the appropriate index into your array, take the result of the polynomial hash modulo the size of your array.
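A minimal sketch of the hash and the index computation, assuming the common polynomial form h(s) = s_0·a^(k-1) + s_1·a^(k-2) + ... + s_(k-1) for a string of k characters, evaluated with Horner's rule, and letting Java `int` arithmetic wrap on overflow. Check the exact formula against your book. `PolyHash` is a hypothetical name. Because overflow can make the hash negative, `Math.floorMod` is used instead of `%` so the index always lands in [0, table_size). For example, "ab" with a = 33 hashes to 97·33 + 98 = 3299, and 3299 mod 4 = 3.

```java
// Hypothetical helper: polynomial string hash plus division-method compression.
final class PolyHash {
    private PolyHash() {}

    // h = s0*a^(k-1) + s1*a^(k-2) + ... + s(k-1), via Horner's rule.
    // int arithmetic wraps around on overflow (mod 2^32); we accept that here.
    static int hash(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = a * h + s.charAt(i);
        }
        return h;
    }

    // Index into an array of tableSize cells; floorMod never returns a negative value
    // for a positive tableSize, even when the hash has overflowed to a negative int.
    static int index(String s, int a, int tableSize) {
        return Math.floorMod(hash(s, a), tableSize);
    }
}
```

With a = 31 this computes the same value as Java's `String.hashCode()`, which is a convenient sanity check.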

The output of the program is several statistics:

Finally, the program should print the contents of the hash table to the console. For example, the full output of the program might look as follows (small.txt):
# keys: 4
# cells: 4
# non-empty cells: 3
# collisions: 1
max collisions: 1
total access time: 5
average access time: 1.25
0: the
1:
2: car, moose
3: people
This shows a hash table of size four. The key "the" is in the first cell, and the second cell is empty. In the third cell, "car" is first and "moose" is chained second; the fourth cell contains only the key "people".
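The statistics are not defined in this section, so the sketch below uses assumed definitions that happen to reproduce the small.txt numbers: a collision is a key that arrives at an already occupied cell (chain length minus one per cell), "max collisions" is the largest such count in any single cell, and finding the i-th key in a chain costs i comparisons. `ChainStats` is a hypothetical name, and the chains are passed in as plain lists of keys.

```java
import java.util.List;

// Hypothetical helper; statistic definitions are assumptions fitted to the small.txt example.
final class ChainStats {
    int keys, cells, nonEmptyCells, collisions, maxCollisions, totalAccessTime;

    static ChainStats of(List<List<String>> table) {
        ChainStats st = new ChainStats();
        st.cells = table.size();
        for (List<String> chain : table) {
            int len = chain.size();
            st.keys += len;
            if (len > 0) {
                st.nonEmptyCells++;
            }
            int c = Math.max(0, len - 1); // keys that found this cell already occupied
            st.collisions += c;
            st.maxCollisions = Math.max(st.maxCollisions, c);
            // The i-th key in a chain takes i comparisons: 1 + 2 + ... + len.
            st.totalAccessTime += len * (len + 1) / 2;
        }
        return st;
    }

    double averageAccessTime() {
        return keys == 0 ? 0.0 : (double) totalAccessTime / keys;
    }

    // Prints the statistics, then one line per cell with its keys in chain order.
    static void print(List<List<String>> table) {
        ChainStats st = of(table);
        System.out.println("# keys: " + st.keys);
        System.out.println("# cells: " + st.cells);
        System.out.println("# non-empty cells: " + st.nonEmptyCells);
        System.out.println("# collisions: " + st.collisions);
        System.out.println("max collisions: " + st.maxCollisions);
        System.out.println("total access time: " + st.totalAccessTime);
        System.out.println("average access time: " + st.averageAccessTime());
        for (int i = 0; i < table.size(); i++) {
            List<String> chain = table.get(i);
            System.out.println(chain.isEmpty() ? i + ":" : i + ": " + String.join(", ", chain));
        }
    }
}
```

For the small.txt chains (["the"], [], ["car", "moose"], ["people"]), `of` yields 4 keys, 4 cells, 3 non-empty cells, 1 collision, max collisions 1, total access time 5, and an average of 1.25, matching the sample. If the course defines any statistic differently, adjust that line rather than the loop structure.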

Data Sets

Here are a couple of data sets to try your hash table on:

Questions to Answer

  1. What values for the statistics indicate that a hash is being effective?
  2. Your book suggests, on page 376, that 33, 37, 39, and 41 are particularly good choices for the parameter a of the polynomial hash function. Confirm or refute this claim on both data sets provided, using your program to gather data. Include copies of whatever graphs or data you need to support your argument.
  3. Are polynomial hashes a good choice for our data sets? If not, what type of hash would be more appropriate?
  4. Suppose you were asked to implement linear probing instead of separate chaining for the collision handling scheme. For a fixed experiment (fixed data set, table size and hash parameter) how would the statistics change?
  5. The number 68891 is prime. Using a fixed a (one of the "good" values of a from Question 2), experiment with table sizes at and around 68891. Does your experiment support the idea that the division method works better when the divisor is prime?
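Gathering data for Questions 2 and 5 amounts to repeating the same count while varying a or the table size. A minimal sketch of such a harness follows; `SizeSweep` is hypothetical, it reuses the assumed collision definition (a key landing in an already occupied cell), and it assumes the same Horner-rule polynomial hash with wrapping `int` arithmetic. Duplicate input strings are counted once, as a map's put would store them once.

```java
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical experiment helper: collision counts for a fixed a over many table sizes.
final class SizeSweep {
    private SizeSweep() {}

    // Polynomial hash (Horner's rule, int wraparound) compressed by the division method.
    static int index(String s, int a, int tableSize) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = a * h + s.charAt(i);
        }
        return Math.floorMod(h, tableSize);
    }

    // Number of distinct keys that land in a cell some earlier key already occupies.
    static int collisions(List<String> keys, int a, int tableSize) {
        int[] counts = new int[tableSize];
        int collisions = 0;
        for (String k : new LinkedHashSet<>(keys)) {
            if (counts[index(k, a, tableSize)]++ > 0) {
                collisions++;
            }
        }
        return collisions;
    }

    // Prints "tableSize<TAB>collisions" for each size in [center - radius, center + radius].
    static void sweep(List<String> keys, int a, int center, int radius) {
        for (int m = Math.max(1, center - radius); m <= center + radius; m++) {
            System.out.println(m + "\t" + collisions(keys, a, m));
        }
    }
}
```

For Question 5, a call such as `SizeSweep.sweep(keys, 33, 68891, 10)` prints one row per table size, ready to paste into a spreadsheet or plot; for Question 2, loop over candidate values of a with a fixed table size instead.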

Commenting and Style

Handin

Please hand in all necessary files to your handin directory, in a subdirectory named Hashing. Your application class should be called Hashing.java. There should be exactly one class declared within each .java file. If your program does anything strange (bugs) or awesome (extra features), or has a non-intuitive interface, please include a file called README.txt that explains it. If there are bugs in your program that you do not describe in your README, you will lose more credit than if you had described them.

Hints