Project 4: Intro to Threads

Important Dates

Questions about the project? Send them to 537-help@cs.wisc.edu (and not just to remzi; this way, remzi OR the TAs can see it).

Due: Thursday, 3/25, by 9pm.

Clarifications

03/21: Don't worry about handling non-unique keys -- that is the user's problem
03/21: Scale the number of threads, not lists (fixed below)
03/13: Added proper header file to xchg code below
03/13: Changed counter definition to be more standard
03/13: Added header files in ~cs537-1/public/p4

Notes

This project can be done with a partner. So work with one! But, you don't have to. That said, if you can't find a partner, but want one, send me mail by the end of the weekend.

Overview

In this project, you will be getting a feel for threads, locks, and performance. The first entity you will build is called a spin lock. A spin lock uses some kind of powerful hardware instruction in order to provide mutual exclusion among threads. You may have even heard of spin locks, say in class or something.

Part 1: Spin Locks

To build a spin lock, you will use the x86 exchange primitive. As this is an assembly instruction, you will need to be able to call it from C. Fortunately, gcc conveniently lets you do this without too much trouble: xchg.c

For those interested in learning more about calling assembly from C with gcc, see here.

To learn more about this instruction, you should read about it in the Intel assembly instruction manual found here. However, I bet you can figure it out without looking.

The lock you build should define a spinlock_t data structure, which contains any values needed to build your lock, and two routines:

spinlock_acquire(spinlock_t *lock)
spinlock_release(spinlock_t *lock)

These routines should use the xchg code above as needed to build your spin lock.
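As a rough illustration of how the pieces fit together, here is a minimal sketch of a spin lock built on an exchange primitive. The xchg() wrapper shown is an assumption modeled on the common gcc inline-assembly idiom and may differ from the provided xchg.c; the non-x86 fallback to a gcc builtin, the field name flag, and the void return types are also assumptions, not part of the spec.

```c
/* Atomically store newval into *addr and return the value that was there.
 * Assumed shape of the exchange helper; the course's xchg.c may differ. */
static inline unsigned int xchg(volatile unsigned int *addr, unsigned int newval)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int result;
    __asm__ __volatile__("lock; xchgl %0, %1"
                         : "+m" (*addr), "=a" (result)
                         : "1" (newval)
                         : "cc", "memory");
    return result;
#else
    /* Fallback so the sketch compiles on non-x86 machines (assumption). */
    return __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);
#endif
}

typedef struct {
    volatile unsigned int flag;   /* 0 = free, 1 = held (hypothetical field) */
} spinlock_t;

void spinlock_acquire(spinlock_t *lock)
{
    /* Keep swapping in 1 until the old value was 0, meaning we took the lock. */
    while (xchg(&lock->flag, 1) == 1)
        ;   /* spin */
}

void spinlock_release(spinlock_t *lock)
{
    /* Swap in 0; the exchange also orders earlier writes before the release. */
    xchg(&lock->flag, 0);
}
```

A lock declared as spinlock_t l = { 0 }; starts out free. Spinning like this burns CPU while waiting, which is part of what the performance comparison in Part 3 is meant to expose.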

Part 2: Using Your Lock

Next, you will use your spinlock to build three concurrent data structures. The three data structures you will build are a thread-safe counter, list, and hash table.

To build the counter, you should implement the following code:

void Counter_Init(counter_t *c, int value);
int Counter_GetValue(counter_t *c);
void Counter_Increment(counter_t *c);
void Counter_Decrement(counter_t *c);

You will make these routines available as a shared library, so that multithreaded programs can easily update a shared counter. The library will be called libcounter.so.
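The counter interface above might be filled in along these lines. This is only a sketch: the layout of counter_t is an assumption, and a pthread mutex is used as a stand-in lock so that the example compiles on its own. The version you hand in would hold a spinlock_t and call spinlock_acquire() and spinlock_release() instead.

```c
#include <stddef.h>
#include <pthread.h>

typedef struct {
    int value;
    pthread_mutex_t lock;   /* stand-in; swap for spinlock_t in your version */
} counter_t;

void Counter_Init(counter_t *c, int value)
{
    c->value = value;
    pthread_mutex_init(&c->lock, NULL);
}

int Counter_GetValue(counter_t *c)
{
    pthread_mutex_lock(&c->lock);
    int v = c->value;           /* read under the lock for a consistent value */
    pthread_mutex_unlock(&c->lock);
    return v;
}

void Counter_Increment(counter_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->value++;
    pthread_mutex_unlock(&c->lock);
}

void Counter_Decrement(counter_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->value--;
    pthread_mutex_unlock(&c->lock);
}
```

Assuming the code lives in a file named counter.c (a hypothetical name), a command such as gcc -Wall -fPIC -shared -o libcounter.so counter.c would typically produce the shared library.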

To build the list, you should implement the following routines:

void List_Init(list_t *list);
void List_Insert(list_t *list, void *element, unsigned int key);
void List_Delete(list_t *list, unsigned int key);
void *List_Lookup(list_t *list, unsigned int key);

The routines do the obvious things. The structure list_t should contain whatever is needed to manage the list (including a lock). Don't do anything fancy; just a simple insert-at-head list would be fine. This library will be called liblist.so.
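A simple insert-at-head list with a single lock could look roughly like the sketch below. The node_t type and the contents of list_t are assumptions, and a pthread mutex again stands in for your own spin lock so the example is self-contained. Lookup returning NULL for a missing key is also an assumption, since the spec does not say.

```c
#include <stdlib.h>
#include <pthread.h>

typedef struct node {            /* hypothetical node layout */
    unsigned int key;
    void *element;
    struct node *next;
} node_t;

typedef struct {
    node_t *head;
    pthread_mutex_t lock;        /* stand-in; swap for spinlock_t */
} list_t;

void List_Init(list_t *list)
{
    list->head = NULL;
    pthread_mutex_init(&list->lock, NULL);
}

void List_Insert(list_t *list, void *element, unsigned int key)
{
    node_t *n = malloc(sizeof(*n));      /* allocate before taking the lock */
    if (n == NULL)
        return;                          /* sketch: drop the insert on failure */
    n->key = key;
    n->element = element;
    pthread_mutex_lock(&list->lock);
    n->next = list->head;                /* insert at head */
    list->head = n;
    pthread_mutex_unlock(&list->lock);
}

void List_Delete(list_t *list, unsigned int key)
{
    pthread_mutex_lock(&list->lock);
    node_t **pp = &list->head;
    while (*pp != NULL && (*pp)->key != key)
        pp = &(*pp)->next;
    node_t *victim = *pp;
    if (victim != NULL)
        *pp = victim->next;              /* unlink the first match */
    pthread_mutex_unlock(&list->lock);
    free(victim);                        /* free(NULL) is a no-op */
}

void *List_Lookup(list_t *list, unsigned int key)
{
    void *found = NULL;
    pthread_mutex_lock(&list->lock);
    for (node_t *n = list->head; n != NULL; n = n->next) {
        if (n->key == key) {
            found = n->element;
            break;
        }
    }
    pthread_mutex_unlock(&list->lock);
    return found;
}
```

Doing the malloc() and free() calls outside the critical section keeps the time spent holding the lock short, which matters more as the number of threads grows.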

To build the hash table, you should implement the following code:

void Hash_Init(hash_t *hash, int buckets);
void Hash_Insert(hash_t *hash, void *element, unsigned int key);
void Hash_Delete(hash_t *hash, unsigned int key);
void *Hash_Lookup(hash_t *hash, unsigned int key);

The only difference from the list interface is that the user can specify the number of buckets in the hash table. Each bucket should basically contain a list upon which to store elements. This library will be called libhash.so.

The hash table should simply use one list per bucket. How can you make sure to allow as much concurrency as possible during accesses to the hash table?
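One possible way to approach the concurrency question is to give each bucket its own lock, so that threads working on different buckets never wait for each other. The sketch below shows that idea under several assumptions: the bucket_t and node_t types, the key % buckets hash function, and the use of a pthread mutex as a stand-in lock are all illustrative choices, not requirements.

```c
#include <stdlib.h>
#include <pthread.h>

typedef struct node {
    unsigned int key;
    void *element;
    struct node *next;
} node_t;

typedef struct {                 /* hypothetical: one list and one lock per bucket */
    node_t *head;
    pthread_mutex_t lock;
} bucket_t;

typedef struct {
    int buckets;
    bucket_t *table;
} hash_t;

static bucket_t *bucket_for(hash_t *hash, unsigned int key)
{
    return &hash->table[key % (unsigned int)hash->buckets];
}

void Hash_Init(hash_t *hash, int buckets)
{
    hash->buckets = buckets;
    hash->table = malloc((size_t)buckets * sizeof(bucket_t));
    if (hash->table == NULL)
        abort();                 /* sketch: give up if the table cannot be allocated */
    for (int i = 0; i < buckets; i++) {
        hash->table[i].head = NULL;
        pthread_mutex_init(&hash->table[i].lock, NULL);
    }
}

void Hash_Insert(hash_t *hash, void *element, unsigned int key)
{
    bucket_t *b = bucket_for(hash, key);
    node_t *n = malloc(sizeof(*n));
    if (n == NULL)
        return;
    n->key = key;
    n->element = element;
    pthread_mutex_lock(&b->lock);        /* only this bucket is locked */
    n->next = b->head;
    b->head = n;
    pthread_mutex_unlock(&b->lock);
}

void Hash_Delete(hash_t *hash, unsigned int key)
{
    bucket_t *b = bucket_for(hash, key);
    pthread_mutex_lock(&b->lock);
    node_t **pp = &b->head;
    while (*pp != NULL && (*pp)->key != key)
        pp = &(*pp)->next;
    node_t *victim = *pp;
    if (victim != NULL)
        *pp = victim->next;
    pthread_mutex_unlock(&b->lock);
    free(victim);
}

void *Hash_Lookup(hash_t *hash, unsigned int key)
{
    bucket_t *b = bucket_for(hash, key);
    void *found = NULL;
    pthread_mutex_lock(&b->lock);
    for (node_t *n = b->head; n != NULL; n = n->next) {
        if (n->key == key) {
            found = n->element;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return found;
}
```

In a real submission the per-bucket list would presumably be your list_t from liblist, which already carries its own lock; the list is inlined here only to keep the example self-contained.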

Part 3: Comparing Performance

Finally, you will do some performance comparisons. Specifically, you will compare the performance of your spin lock versus the performance of pthread locks. You will do this for each of your data structures (counter, list, hash table).

Along the x-axis of each graph, you will vary the number of threads contending for the data structure. The y-axis will plot how long it took all of the threads to finish running.

The output of the comparison will be a graph, which might look like this:

This graph plots the average performance of many threads updating a shared counter: each thread calls Counter_Increment(&counter) in a loop max times.

For this experiment, max was set to 1000000, and each line shows the performance of either using your own lock or a pthreads lock inside the counter library.
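The counter experiment described here could be driven by something like the following sketch. The names worker and run_counter_test are hypothetical, and the small counter defined at the top is only a stand-in so the example compiles by itself; a real test program would link against libcounter.so instead.

```c
#include <stdlib.h>
#include <pthread.h>

/* Stand-in counter so this sketch is self-contained (assumption). */
typedef struct {
    int value;
    pthread_mutex_t lock;
} counter_t;

static void Counter_Init(counter_t *c, int value)
{
    c->value = value;
    pthread_mutex_init(&c->lock, NULL);
}

static void Counter_Increment(counter_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->value++;
    pthread_mutex_unlock(&c->lock);
}

static int Counter_GetValue(counter_t *c)
{
    pthread_mutex_lock(&c->lock);
    int v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}

static counter_t counter;
static int max = 1000000;        /* increments per thread */

/* Each thread calls Counter_Increment(&counter) in a loop max times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < max; i++)
        Counter_Increment(&counter);
    return NULL;
}

/* Start nthreads workers, wait for them all, and return the final count
 * (or -1 if the thread-id array could not be allocated). */
int run_counter_test(int nthreads)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof(*tids));
    if (tids == NULL)
        return -1;

    Counter_Init(&counter, 0);
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;               /* stop launching if a thread cannot be created */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    free(tids);
    return Counter_GetValue(&counter);
}
```

If the lock works, the final value equals nthreads times max; a smaller value means increments were lost. Wrapping the create and join loops with a timer gives one data point for the graph.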

Similar plots should be made for:

A list-insertion test - where you have threads each insert say 1e6 or 10e6 items into a list (and scale the number of threads, not lists)
A hash-table insertion test - same as above but with a hash table with a reasonable bucket size
A hash-table scaling test - this test should fix the number of threads (say at 20) and vary the number of buckets

Timing should be done with gettimeofday(). Read the man page for details. One thing that is good to do: write a wrapper which returns the time in seconds as a floating-point value. This makes the timing routine really easy to use.
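The wrapper mentioned above might look like this small sketch. The function name get_time_seconds is hypothetical; gettimeofday() and struct timeval are the standard interfaces from sys/time.h.

```c
#include <stddef.h>
#include <sys/time.h>

/* Return the current wall-clock time in seconds as a double. */
double get_time_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}
```

To time an experiment, record double start = get_time_seconds(); before launching the threads and subtract it from a second call made after the last thread has been joined.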

To make your graph less noisy, you will have to run multiple iterations and let each experiment run long enough to be meaningful. You might then plot the average (as done above) and even a standard deviation or 95% confidence interval. A little statistics can go a long way...

Bonus

You might not be happy with the performance of your simple spin lock. If so, you should look into using the Linux futex() primitive to put threads to sleep when there is contention. How much does this improve performance? Make an awesome graph and show it to me.

Handing It In

This is p4. I bet you know how to turn stuff in by now, no? Do make sure your Makefile builds each of the libraries (counter, hash, and list) and uses your own lock.

The one thing that is different: you should turn in the code to both partners' handin directories (which makes grading easier).