Project 4: Multithreaded Programming

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu.

Final Deadline: Tuesday November 10th @ 12 midnight.

Objectives

There are two objectives to this assignment:

  • Learn how to write a multi-threaded program on Linux.
  • Learn how to build thread-safe data structures on Linux.

Overview

When building a multi-threaded program, one has to make sure one's data structures are thread-safe (MT-safe). In this project, we will see what this means in practice by building three simple MT-safe libraries. The first is a simple counter that multiple threads can use. The second is a hash table, an important data structure used in many multi-threaded applications as well as in the operating system itself. Finally, the third is a data structure that tracks the recency of access to a set of objects; a structure like this is what you would need to implement true LRU replacement within an OS cache.

Program Specifications

You should create three libraries that implement thread-safe data structures. The first one is a simple counter, the second one is a hash table, and the third one is our recency tracking data structure.

libcounter.so: For the first library you should implement thread-safe increment and decrement procedures. Below are the interfaces that you should implement. The code skeleton for this library has been provided in this directory:

~cs537-1/public/p4/counter/counter.c

Always read the README file first.

  • void Counter_Init(): Initialize your counter to 0. In addition, you need to initialize a pthread mutex to make your counter thread-safe.
  • int Counter_GetValue(): Return the current value of the counter.
  • void Counter_Increment(): Increment the counter by one. To make this thread-safe, you should lock the mutex before the increment operation and unlock it afterward.
  • void Counter_Decrement(): Decrement the counter by one. As with increment, you should lock the mutex before the decrement operation and unlock it afterward.
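A minimal sketch of the counter library follows, assuming a single global counter. The variable names counter_value and counter_lock are hypothetical, and the provided skeleton in counter.c may organize this state differently; the point is only that every update happens between a lock and an unlock of the same mutex.

```c
/* Sketch of libcounter. counter_value and counter_lock are hypothetical names. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static int counter_value;            /* the shared counter */
static pthread_mutex_t counter_lock; /* protects counter_value */

void Counter_Init(void) {
    counter_value = 0;
    /* pthread functions return 0 on success or an error number on failure. */
    int rc = pthread_mutex_init(&counter_lock, NULL);
    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_init failed: %d\n", rc);
        exit(1);
    }
}

/* No lock: per the Clarifications, only called after all threads finish. */
int Counter_GetValue(void) {
    return counter_value;
}

void Counter_Increment(void) {
    pthread_mutex_lock(&counter_lock);    /* enter critical section */
    counter_value++;
    pthread_mutex_unlock(&counter_lock);  /* leave critical section */
}

void Counter_Decrement(void) {
    pthread_mutex_lock(&counter_lock);
    counter_value--;
    pthread_mutex_unlock(&counter_lock);
}
```

On Linux, a file like this would typically be built into a shared library with something like gcc -Wall -fPIC -shared -o libcounter.so counter.c -lpthread.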

libhash.so: The second library you should implement is a thread-safe hash table. In your previous CS courses (e.g., CS 367), you have most likely implemented a hash table for a single thread. Inside the OS, hash tables are used heavily and are often accessed by more than one thread. Hence, it is good practice to learn how to make a hash table thread-safe.

Your hash table will store integers. Below are the interfaces that you should implement. As a rough benchmark, a good implementation should be only around 300 lines of code. The code skeleton for this library has been provided in this directory:

~cs537-1/public/p4/hash/hash.c

Always read the README file first.

  • void Hash_Init(int numOfBuckets): Initialize your hash table with the specified number of buckets. In addition, you also need to set up one lock for each bucket. You should not use one lock for the whole hash table, because there will be lots of contention and the performance of your library will be very poor. Try it! You should see the performance difference.
  • int Hash_Insert(int x): Insert a number x into the hash table. Return -1 if x already exists in the hash table (and do not re-insert x). Return 0 if x has been successfully added to the hash table. The bucket that should be selected for a given x is: bucketNum = x % numOfBuckets. Each bucket should maintain a linked list of integers. It is up to you how you manage the list (e.g., it can be sorted or unsorted).
  • int Hash_Remove(int x): Remove x from the hash table. Return -1 if x does not exist. Return 0 if x has been successfully removed.
  • int Hash_CountElements(): Count and return the number of elements in the hash table.
  • int Hash_CountBucketElements(int bucketNumber): Count and return the number of elements in the specified bucket.
  • void Hash_Dump(): Print the content of your hash table. The format of the printout is up to you. We will not grade this function.
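The sketch below shows one way to structure the hash table with one mutex per bucket, assuming unsorted per-bucket lists. The type and helper names (node_t, bucket_t, bucket_for, die) are hypothetical, Hash_Dump() is omitted because its format is up to you, and folding negative remainders into range is an assumption the spec does not cover.

```c
/* Sketch of libhash with one mutex per bucket. Internal names (node_t,
 * bucket_t, buckets, die, bucket_for) are hypothetical; Hash_Dump is omitted. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node {               /* one element in a bucket's list */
    int key;
    struct node *next;
} node_t;

typedef struct {                    /* each bucket has its own list and lock */
    node_t *head;
    pthread_mutex_t lock;
} bucket_t;

static bucket_t *buckets;
static int num_buckets;

static void die(const char *msg) { fprintf(stderr, "%s\n", msg); exit(1); }

void Hash_Init(int numOfBuckets) {
    num_buckets = numOfBuckets;
    buckets = malloc(sizeof(bucket_t) * numOfBuckets);
    if (buckets == NULL) die("malloc failed");
    for (int i = 0; i < numOfBuckets; i++) {
        buckets[i].head = NULL;
        if (pthread_mutex_init(&buckets[i].lock, NULL) != 0)
            die("pthread_mutex_init failed");
    }
}

/* bucketNum = x % numOfBuckets; negative remainders are folded into range
 * (an assumption: the spec does not say how negative x should be handled). */
static bucket_t *bucket_for(int x) {
    int idx = x % num_buckets;
    if (idx < 0) idx += num_buckets;
    return &buckets[idx];
}

int Hash_Insert(int x) {
    bucket_t *b = bucket_for(x);
    node_t *n = malloc(sizeof(node_t));     /* allocate outside the lock */
    if (n == NULL) die("malloc failed");
    n->key = x;
    pthread_mutex_lock(&b->lock);           /* only this bucket is locked */
    for (node_t *cur = b->head; cur != NULL; cur = cur->next) {
        if (cur->key == x) {                /* already present: do not re-insert */
            pthread_mutex_unlock(&b->lock);
            free(n);
            return -1;
        }
    }
    n->next = b->head;                      /* unsorted list: push at front */
    b->head = n;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

int Hash_Remove(int x) {
    bucket_t *b = bucket_for(x);
    pthread_mutex_lock(&b->lock);
    /* A pointer-to-pointer walk removes the head without a special case. */
    for (node_t **pp = &b->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->key == x) {
            node_t *victim = *pp;
            *pp = victim->next;
            pthread_mutex_unlock(&b->lock);
            free(victim);
            return 0;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return -1;
}

/* Readers take no locks: they are only called after all threads finish. */
int Hash_CountBucketElements(int bucketNumber) {
    int count = 0;
    for (node_t *cur = buckets[bucketNumber].head; cur != NULL; cur = cur->next)
        count++;
    return count;
}

int Hash_CountElements(void) {
    int total = 0;
    for (int i = 0; i < num_buckets; i++)
        total += Hash_CountBucketElements(i);
    return total;
}
```

Because each call locks only the bucket that x maps to, threads working on different buckets can proceed in parallel; with a single table-wide lock, every Hash_Insert() and Hash_Remove() would be serialized.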

liblru.so: The final library you should build implements a data structure to track a group of positive integers (values greater than zero) in LRU fashion. The interface is as follows:

  • void LRU_Init(int size): This should create a structure (or set of structures) that you will use to track the recency of access of up to size integers.
  • int LRU_Insert(int element): This inserts an integer into the LRU structure. If the structure is full (i.e., it already has size elements in it), you should first remove the least-recently-used (LRU) element; this is the value LRU_Insert() should return. The new element should be added at the most-recently-used (MRU) side of your data structure. If there are not yet size elements in the structure, LRU_Insert() should return 0. If the user tries to insert 0 or a negative value, or tries to insert a value that is already present in the structure, the routine should return -1 as an indication of failure.
  • int LRU_Access(int element): This routine simply looks up element in your data structure and moves it to the MRU side of your structure. If the element exists, this routine should return 0 when finished. If the element does not exist, it should return -1.
  • int LRU_Remove(int element): This routine removes the integer element from the structure. Returns 0 upon success and -1 upon failure.
  • int LRU_Size(): This routine simply returns the number of elements in your data structure.

The header for the code above can be found at:

~cs537-1/public/p4/lru/lru.h
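One possible design, sketched here under stated assumptions, pairs a doubly linked list ordered by recency with a hash index from value to list node, so LRU_Access() can find an element without scanning the list. It assumes LRU_Init() is called once, with size of at least 1, before any threads start; all internal names are hypothetical, and a single mutex guards both structures for simplicity.

```c
/* Sketch of liblru: each value's node sits in a recency-ordered doubly linked
 * list and in a hash-index chain. All internal names are hypothetical.
 * Assumes LRU_Init() is called once, with size >= 1, before threads start. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct lru_node {
    int value;
    struct lru_node *prev, *next;   /* recency list: mru = head, lru = tail */
    struct lru_node *hnext;         /* next node in the same index bucket */
} lru_node_t;

static lru_node_t *mru, *lru;       /* the two ends of the recency list */
static lru_node_t **index_tbl;      /* hash index: value -> node */
static int capacity, count;
static pthread_mutex_t lru_lock;    /* one lock guards list and index together */

static void die(const char *msg) { fprintf(stderr, "%s\n", msg); exit(1); }

void LRU_Init(int size) {
    capacity = size; count = 0; mru = lru = NULL;
    index_tbl = calloc(size, sizeof(lru_node_t *));
    if (index_tbl == NULL) die("calloc failed");
    if (pthread_mutex_init(&lru_lock, NULL) != 0) die("pthread_mutex_init failed");
}

/* Address of the link that points at value's node (or at NULL if absent). */
static lru_node_t **find_slot(int value) {
    lru_node_t **pp = &index_tbl[value % capacity];   /* callers ensure value > 0 */
    while (*pp != NULL && (*pp)->value != value) pp = &(*pp)->hnext;
    return pp;
}

static void list_unlink(lru_node_t *n) {
    if (n->prev) n->prev->next = n->next; else mru = n->next;
    if (n->next) n->next->prev = n->prev; else lru = n->prev;
}

static void list_push_mru(lru_node_t *n) {
    n->prev = NULL;
    n->next = mru;
    if (mru) mru->prev = n; else lru = n;
    mru = n;
}

static int remove_locked(int value) {   /* caller holds lru_lock; 0 or -1 */
    lru_node_t **pp = find_slot(value);
    lru_node_t *n = *pp;
    if (n == NULL) return -1;
    *pp = n->hnext;     /* drop from the hash chain */
    list_unlink(n);     /* drop from the recency list */
    free(n);
    count--;
    return 0;
}

int LRU_Insert(int element) {
    if (element <= 0) return -1;
    lru_node_t *n = malloc(sizeof(lru_node_t));   /* allocate outside the lock */
    if (n == NULL) die("malloc failed");
    n->value = element;
    pthread_mutex_lock(&lru_lock);
    if (*find_slot(element) != NULL) {            /* already present */
        pthread_mutex_unlock(&lru_lock);
        free(n);
        return -1;
    }
    int evicted = 0;
    if (count == capacity) {                      /* full: evict the LRU end */
        evicted = lru->value;
        remove_locked(evicted);
    }
    lru_node_t **head = &index_tbl[element % capacity];
    n->hnext = *head;                             /* link into the hash chain */
    *head = n;
    list_push_mru(n);                             /* new element becomes MRU */
    count++;
    pthread_mutex_unlock(&lru_lock);
    return evicted;
}

int LRU_Access(int element) {
    if (element <= 0) return -1;
    pthread_mutex_lock(&lru_lock);
    lru_node_t *n = *find_slot(element);
    if (n != NULL) { list_unlink(n); list_push_mru(n); }   /* move to MRU end */
    pthread_mutex_unlock(&lru_lock);
    return n != NULL ? 0 : -1;
}

int LRU_Remove(int element) {
    if (element <= 0) return -1;
    pthread_mutex_lock(&lru_lock);
    int rc = remove_locked(element);
    pthread_mutex_unlock(&lru_lock);
    return rc;
}

int LRU_Size(void) { return count; }   /* unlocked: only called at the end */
```

Finer-grained locking is possible, but nearly every operation touches the MRU end of the list, so a single lock is the simpler starting point.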

Compiling with Threads

Compiling a multi-threaded program with POSIX threads on Linux requires two things. First, you need to include the pthread header file pthread.h in your code. Second, when compiling, you need to link with the pthread library (-lpthread). That's about it.
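A small illustration of both steps: the file includes pthread.h, starts a few threads with pthread_create(), and waits for them with pthread_join(). The function name run_demo, the NTHREADS constant, and the file names in the build comment are hypothetical; call run_demo() from a main() of your own.

```c
/* Illustrative only: run_demo() and NTHREADS are hypothetical names.
 * Link any program using this file with -lpthread, e.g.
 *   gcc -Wall -o demo demo.c main.c -lpthread   (file names hypothetical) */
#include <pthread.h>   /* step 1: the pthread header */
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4

/* Each worker updates only its own slot, so no lock is needed here. */
static void *worker(void *arg) {
    int *slot = arg;
    *slot *= 10;
    return NULL;
}

/* Starts NTHREADS threads, waits for all of them, and sums their results. */
int run_demo(void) {
    pthread_t tids[NTHREADS];
    int results[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        results[i] = i;
        int rc = pthread_create(&tids[i], NULL, worker, &results[i]);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed: %d\n", rc);
            exit(1);
        }
    }
    int sum = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tids[i], NULL);   /* wait for thread i to finish */
        sum += results[i];             /* safe: thread i is done */
    }
    return sum;   /* (0 + 1 + 2 + 3) * 10 = 60 */
}
```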

For more information, look at a POSIX threads tutorial; many are available on the web.

Testing your libraries

The main focus of this project is whether your library is thread-safe or not. You should not worry too much about malicious users. In other words, you should expect us to generally use your interface appropriately and provide reasonable parameters.

However, you should still check the error codes returned by any system or library calls you make (e.g., malloc() or pthread_mutex_init()). If you catch an unexpected error, simply print an error message and exit (thus killing the process).
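A pair of small wrappers can keep this checking from cluttering your library code. The names xmalloc() and check_pthread() are hypothetical; the detail the sketch relies on is that malloc() reports failure by returning NULL, while the pthread_* functions return an error number instead of setting errno.

```c
/* Hypothetical error-checking helpers: print a message and exit on failure. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* malloc() signals failure by returning NULL (and setting errno). */
void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        perror("malloc");
        exit(1);
    }
    return p;
}

/* pthread_* functions return 0 on success or an error number on failure;
 * they do not set errno, so the return value is passed to strerror(). */
void check_pthread(int rc, const char *what) {
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", what, strerror(rc));
        exit(1);
    }
}
```

For example, check_pthread(pthread_mutex_lock(&m), "pthread_mutex_lock"); checks a lock call in a single line.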

So far we have created two fairly complex test programs. They can be found in ~cs537-1/public/p4/counter/ and ~cs537-1/public/p4/hash/. Please read the corresponding README files in these directories to find out how to run the test code. To exercise threads, you must run the test code on a machine with more than one processor. On Linux, you can check this by opening the cpuinfo file (e.g., "less /proc/cpuinfo") to confirm that the machine you use is a multiprocessor machine.
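If you would rather check from a program, glibc on Linux (and many other systems) supports sysconf(_SC_NPROCESSORS_ONLN), which returns the number of processors currently online. The helper check_multiprocessor() below is hypothetical.

```c
/* Hypothetical helper: report how many processors are online. */
#include <stdio.h>
#include <unistd.h>

long check_multiprocessor(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);   /* -1 if the value is unavailable */
    if (n < 2)
        fprintf(stderr, "warning: %ld processor(s) online; "
                        "use a multiprocessor machine for testing\n", n);
    return n;
}
```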

The test code basically runs lots of threads and operations and keeps track of the final expected value (e.g., the expected number of elements in the hash table). The expected value is compared against the value from your library (e.g., the return value of Hash_CountElements()). If they are the same, it is a reasonable (though not conclusive) indication that your library is thread-safe.

Here are some more suggestions:

  • Make your own simple test programs. Include the headers for each of the libraries and stress them yourself. What are the corner cases? Are you stressing them enough?
  • One successful test is not enough to guarantee that your library is thread-safe. Synchronization bugs are typically non-deterministic. Hence, play around with lots of parameters (e.g., number of threads, number of buckets, number of operations). Test your library many, many times (more than 50 times if you have to).
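One way to follow both suggestions is a small harness that computes the expected value itself, compares it with what the library reports, and repeats the run with varying parameters. In the sketch below, stress_once(), stress_many(), and work_t are hypothetical names, and the four Counter_ functions at the top are only a stand-in so the file compiles on its own.

```c
/* Hypothetical stress harness. The four Counter_ functions below are a
 * stand-in so this file compiles alone; to test your own library, delete
 * them, declare your library's functions instead, and link libcounter.so. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static int value;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static void Counter_Init(void) { value = 0; }
static int Counter_GetValue(void) { return value; }
static void Counter_Increment(void) { pthread_mutex_lock(&lock); value++; pthread_mutex_unlock(&lock); }
static void Counter_Decrement(void) { pthread_mutex_lock(&lock); value--; pthread_mutex_unlock(&lock); }

typedef struct { int incs, decs; } work_t;   /* per-thread workload */

static void *worker(void *arg) {
    const work_t *w = arg;
    for (int i = 0; i < w->incs; i++) Counter_Increment();
    for (int i = 0; i < w->decs; i++) Counter_Decrement();
    return NULL;
}

/* One run: returns 1 if the library's value matches the expected value. */
static int stress_once(int nthreads, int incs, int decs) {
    pthread_t *tids = malloc(sizeof(pthread_t) * nthreads);
    if (tids == NULL) { perror("malloc"); exit(1); }
    work_t w = { incs, decs };               /* shared read-only by all threads */
    Counter_Init();
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[t], NULL, worker, &w) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(1);
        }
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    free(tids);
    int expected = nthreads * (incs - decs);
    int actual = Counter_GetValue();
    if (actual != expected)
        fprintf(stderr, "MISMATCH: %d threads, expected %d, got %d\n",
                nthreads, expected, actual);
    return actual == expected;
}

/* Many runs with varying parameters; returns the number of failed runs. */
int stress_many(int runs) {
    int failures = 0;
    for (int r = 0; r < runs; r++) {
        int nthreads = 2 + r % 7;            /* cycles through 2..8 threads */
        int incs = 10000 * (1 + r % 3);      /* 10000, 20000, or 30000 */
        failures += !stress_once(nthreads, incs, incs / 2);
    }
    return failures;
}
```

Calling stress_many(50) from main() and checking for a zero return follows the "more than 50 times" advice above. Temporarily removing the lock from the stand-in will usually make the harness report mismatches on a multiprocessor machine, which is a quick way to confirm the harness can actually catch a race.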

Clarifications

No locks for some routines. You do not have to make these reader functions thread-safe: Counter_GetValue(), Hash_CountElements(), Hash_CountBucketElements(), and LRU_Size(). We only call these reader functions after all threads have finished. Hence, you do not need a readers-writer lock, because the readers will only be called at the end. Simply use pthread_mutex_lock() and pthread_mutex_unlock() in the other routines.

Grading

We will run a bunch of tests. For each test, the expected value should be the same as the value returned by your library. Details about each test will come out soon; however, they should be similar to the two test programs we have provided.

We will also measure the performance of your library. In particular, we want to make sure that you do not use one big lock for the whole hash table, and that you do not use just one big list to implement the LRU structure (think about how slow LRU_Access() would be if you had to search the whole list to update the recency ordering of elements).

More Reading

We should cover most of the stuff you need in the discussion sections. However, it is always nice to read more about the functions you will be using.

Check the manual for these calls: pthread_create, pthread_self, pthread_join, pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock.

Read the Advanced Programming in the UNIX Environment book: 11.1 intro, 11.2 thread concepts, 11.3 thread identification, 11.4 thread creation, 11.5 thread termination, 11.6 thread synchronization (up to Mutexes).

Handing in your Code

Hand in your source code and a README file. We will create a directory ~cs537-1/handin/NAME/p4/, where NAME is your login name.

You should copy all of your source files (*.c and *.h) and a Makefile to your p4 handin directory. Do not submit any .o files. When we run your Makefile, it should, at a minimum, build all three libraries.

If your program does not work perfectly, your README file should explain what does not work and the reason (if you know).

After the deadline for this project, you will be prevented from making any changes in this directory. Remember: No late projects will be accepted!