Project 6: Concurrent Hash Table

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu.

Due Date: Friday, 5/15 (but you can turn it in earlier of course)

You can have a partner for this project.

Clarifications

Good values. For this project, we will pass only good values as inputs to your library. For example, we will always pass a positive integer to Hash_Init(), non-negative integers to Hash_Insert(), etc. Do not worry about malicious users. You can focus on the locking and the core functionality of your library.

Objectives

There are two objectives to this assignment:

  • Learn how to write a multi-threaded program on Linux.
  • Learn how to build multi-thread safe data structures in Linux.

Overview

When building a multi-threaded program, you have to make sure your data structures are multi-thread safe (MT-safe). In this project, we will see what this means in practice by building a simple MT-safe library. The library implements a hash table, an important data structure used in many multi-threaded applications as well as in the operating system itself.

Program Specifications

The library you should implement is a thread-safe hash table called libHash.so. In your previous CS courses (e.g. CS 367), you have most likely implemented a hash table for a single thread. Inside the OS, hash tables are used a lot and are often accessed by more than one thread. Hence, it is good practice to see how to make a hash table thread-safe.

Your hash table will store integers. Below are the interfaces that you should implement. As a point of comparison, a good implementation should be about 100 lines of code or fewer. The code skeleton for this library has been provided in this directory:

~cs537-2/public/p6/Hash.c
~cs537-2/public/p6/Hash.h

  • void Hash_Init(int numbuckets): Initialize your hash table with the specified number of buckets. You also need to set up one lock and one list per bucket. Do not use a single lock for the whole hash table: there will be lots of contention and the performance of your library will be very poor. Try it! You should see the performance difference.
  • int Hash_Insert(int element): Insert the number element into the hash table. Return 0 if the number was added successfully, or -1 otherwise. Duplicates are OK. The bucket for a given element is selected by bucketNum = element % numbuckets. Each bucket should maintain a linked list of integers. How you manage the list is up to you (e.g. it can be sorted or not).
  • int Hash_Lookup(int element): Look up the value element. If it is in the hash table, return 0; otherwise, return -1.
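
To make the per-bucket locking concrete, here is a minimal sketch of how the three calls might fit together. It is one possible layout under stated assumptions, not the provided skeleton: the names node_t, bucket_t, buckets and nbuckets are hypothetical, the list is unsorted, and Hash_Insert() returns -1 on allocation failure (printing an error and exiting would be another reasonable reading of the spec).

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* One list node per stored integer (hypothetical type name). */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Each bucket pairs its own list with its own lock (hypothetical type name). */
typedef struct bucket {
    node_t *head;
    pthread_mutex_t lock;
} bucket_t;

static bucket_t *buckets;   /* array of nbuckets buckets */
static int nbuckets;

void Hash_Init(int numbuckets) {
    nbuckets = numbuckets;
    buckets = malloc(sizeof(bucket_t) * (size_t)numbuckets);
    if (buckets == NULL) {
        fprintf(stderr, "Hash_Init: malloc failed\n");
        exit(1);
    }
    for (int i = 0; i < numbuckets; i++) {
        buckets[i].head = NULL;
        if (pthread_mutex_init(&buckets[i].lock, NULL) != 0) {
            fprintf(stderr, "Hash_Init: pthread_mutex_init failed\n");
            exit(1);
        }
    }
}

int Hash_Insert(int element) {
    bucket_t *b = &buckets[element % nbuckets];
    /* Allocate outside the lock so the critical section stays short. */
    node_t *n = malloc(sizeof(node_t));
    if (n == NULL)
        return -1;   /* assumption: report failure to the caller */
    n->value = element;
    pthread_mutex_lock(&b->lock);
    n->next = b->head;   /* push at the head; duplicates are allowed */
    b->head = n;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

int Hash_Lookup(int element) {
    bucket_t *b = &buckets[element % nbuckets];
    int found = -1;
    pthread_mutex_lock(&b->lock);
    for (node_t *n = b->head; n != NULL; n = n->next) {
        if (n->value == element) {
            found = 0;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return found;
}
```

Because each operation takes only the lock of the bucket it touches, threads working on different buckets do not block each other. Replacing the per-bucket locks with one global lock is an easy way to observe the contention described above.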

Compiling with Threads

Compiling a multi-threaded program with POSIX threads on Linux requires two things. First, you need to include the pthread header file pthread.h in your code. Second, when compiling, you need to link with the pthread library using -lpthread. That's about it.
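
As a small illustration of both requirements, the sketch below includes pthread.h and uses only the calls listed under More Reading. The file name counter.c, the function names, and the gcc command lines in the comments are assumptions for illustration; adapt them to your own setup.

```c
/*
 * Possible build lines (assuming gcc and a source file named counter.c):
 *   gcc -Wall -o counter counter.c -lpthread
 * A shared library is typically built along these lines:
 *   gcc -Wall -fPIC -shared -o libHash.so Hash.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter iters times under the lock. */
static void *worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start nthreads workers, wait for all of them, and return the final count. */
long run_workers(int nthreads, long iters) {
    pthread_t *tids = malloc(sizeof(pthread_t) * (size_t)nthreads);
    if (tids == NULL) {
        perror("malloc");
        exit(1);
    }
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        /* iters lives until every thread has been joined below. */
        if (pthread_create(&tids[i], NULL, worker, &iters) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(1);
        }
    }
    for (int i = 0; i < nthreads; i++) {
        if (pthread_join(tids[i], NULL) != 0) {
            fprintf(stderr, "pthread_join failed\n");
            exit(1);
        }
    }
    free(tids);
    return counter;
}
```

Calling run_workers(4, 1000) from a main() should return 4000. Removing the lock and unlock calls usually makes the result fall short for large iteration counts, which is a quick way to see a race.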

For more information, look at this tutorial, one of many available on the web.

Testing your library

The main focus of this project is whether your library is thread-safe or not. You should not worry too much about malicious users. In other words, you should expect us to generally use your interface appropriately and provide reasonable parameters.

However, you should still catch error codes from any system or library calls you make (e.g. malloc). If you catch an unexpected error, simply print an error message and exit (thus killing the process).
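
One common way to follow this rule without cluttering every call site is a pair of small wrappers. The helper names checked_malloc and checked_mutex_init below are hypothetical, not part of the provided skeleton. Note that pthread functions return the error number directly rather than setting errno, so the sketch passes the return value to strerror().

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc that prints a message and exits on failure. */
void *checked_malloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        perror("malloc");
        exit(1);
    }
    return p;
}

/* Hypothetical helper: initialize a mutex with default attributes or exit. */
void checked_mutex_init(pthread_mutex_t *lock) {
    int rc = pthread_mutex_init(lock, NULL);
    if (rc != 0) {
        /* pthread calls return the error code instead of setting errno. */
        fprintf(stderr, "pthread_mutex_init: %s\n", strerror(rc));
        exit(1);
    }
}
```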

Write your own simple test programs. Include the header for the hash library and stress it yourself. What are the corner cases? Are you stressing them enough?

Having one successful test is not enough to guarantee that your library is thread-safe. Synchronization bugs are typically non-deterministic. Hence, play around with lots of parameters (e.g. number of threads, number of buckets, number of operations). Test your library as many times as you can.
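
A stress test along these lines can be sketched as follows. Several threads insert disjoint ranges of integers, and afterwards every value is looked up; an insert lost to a race shows up as a missing value. The names job_t, inserter and stress are hypothetical. So that the file compiles on its own, it also contains a deliberately simple stand-in table (one global lock, one list, number of buckets ignored); that stand-in is an assumption for illustration only, and in a real test you would delete it, include Hash.h, and link against your libHash.so.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* ---- Stand-in table (ASSUMPTION): replace with your libHash.so ---- */
typedef struct node { int value; struct node *next; } node_t;
static node_t *head;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Ignores numbuckets and does not free nodes from an earlier run. */
void Hash_Init(int numbuckets) { (void)numbuckets; head = NULL; }

int Hash_Insert(int element) {
    node_t *n = malloc(sizeof(node_t));
    if (n == NULL)
        return -1;
    n->value = element;
    pthread_mutex_lock(&table_lock);
    n->next = head;
    head = n;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

int Hash_Lookup(int element) {
    int found = -1;
    pthread_mutex_lock(&table_lock);
    for (node_t *n = head; n != NULL; n = n->next)
        if (n->value == element) { found = 0; break; }
    pthread_mutex_unlock(&table_lock);
    return found;
}
/* ---- End of stand-in ---- */

typedef struct { int id; int nops; } job_t;

/* Thread id inserts the disjoint range [id * nops, (id + 1) * nops). */
static void *inserter(void *arg) {
    job_t *job = arg;
    for (int i = 0; i < job->nops; i++)
        Hash_Insert(job->id * job->nops + i);
    return NULL;
}

/* Returns how many inserted values a later lookup fails to find. */
int stress(int nthreads, int numbuckets, int nops) {
    pthread_t *tids = malloc(sizeof(pthread_t) * (size_t)nthreads);
    job_t *jobs = malloc(sizeof(job_t) * (size_t)nthreads);
    if (tids == NULL || jobs == NULL) {
        perror("malloc");
        exit(1);
    }
    Hash_Init(numbuckets);
    for (int t = 0; t < nthreads; t++) {
        jobs[t].id = t;
        jobs[t].nops = nops;
        if (pthread_create(&tids[t], NULL, inserter, &jobs[t]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(1);
        }
    }
    for (int t = 0; t < nthreads; t++) {
        if (pthread_join(tids[t], NULL) != 0) {
            fprintf(stderr, "pthread_join failed\n");
            exit(1);
        }
    }
    int missing = 0;
    for (int v = 0; v < nthreads * nops; v++)
        if (Hash_Lookup(v) != 0)
            missing++;
    free(tids);
    free(jobs);
    return missing;
}
```

A thread-safe table should make stress() return 0 for every combination of parameters. Since a racy table can also pass by luck, run it repeatedly and vary the thread, bucket, and operation counts; a single bucket with many threads is a useful corner case because every insert then contends for the same list.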

Grading

We will release the tests as soon as possible.

More Reading

Check the manual for these calls: pthread_create, pthread_join, pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock.

Read the Advanced UNIX Programming book: 11.1 intro, 11.2 thread concepts, 11.3 thread identification, 11.4 thread creation, 11.5 thread termination, 11.6 thread synchronization (up to Mutexes).

Handing in your Code

Hand in your source code and a README file. We will create a directory ~cs537-2/handin/NAME/p6/, where NAME is your login name.

You should copy all of your source files (*.c and *.h) and a Makefile to your p6 handin directory. Do not submit any .o files. When we run your Makefile, it should minimally build libHash.so.

If your program does not work perfectly, your README file should explain what does not work and the reason (if you know).

After the deadline for this project, you will be prevented from making any changes in this directory. Remember: No late projects will be accepted!