Homework 3: From Crawling To Walking

Questions?

If you have a question, just send email to 354-help@cs.wisc.edu and we'll try to get back to you quickly. Don't worry, it's just to the TAs and professor.

How to send a good email: Put a lot of information in it! For example, cut and paste what you typed on the screen, and what was printed on the screen as a result. Don't just say that something didn't work! We know it didn't work already; no one sends mail saying that it's all going great.

Clarifications

If your program is taking too long on some of the measurements, it is OK to use a smaller data set size (i.e., one that takes tens of seconds or a few minutes).

If malloc() returns NULL, it is OK to just halt the program. Usually, this is achieved by using assert(), e.g., if the return value from malloc() is placed into variable p, just put assert(p != NULL); in your code. A more general list implementation might return -1 on error and 0 when everything is fine, but we're not doing that here.
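As a minimal sketch of that pattern, the allocation check can be wrapped in a small helper. The name checked_malloc is hypothetical and is not part of list.h; it only illustrates the assert(p != NULL) idea described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of list.h): allocate n bytes or halt. */
void *checked_malloc(size_t n) {
    void *p = malloc(n);
    assert(p != NULL);  /* aborts the program if the allocation failed */
    return p;
}
```

Keep in mind that assert() is compiled out when NDEBUG is defined, so this check only fires in builds that leave NDEBUG undefined.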

Example makefile to build liblist.a here.

Relevant Book Chapters

Relevant reading is all from K+R, particularly Chapters 5 and 6 (Pointers and Arrays, Structures).

Overview

The main purpose of this project is for you to write more C, gaining familiarity with basic libraries and simple data structures. You should also gain some experience with caring about the performance of the code you write via timing.

Due Date: Sunday Feb 26th at some time

Part 1: An Alternative Linked List Library

As you know, a linked list is a basic structure for storing data within programs. In class, we saw one way to construct such a list: every time a new node is inserted, call malloc() to allocate space for a new struct, and then link said struct into the list (either at the beginning, or the end, or in order, for example).

In this part of the project, you will implement the linked list in an alternative manner, using arrays. Specifically, your list will initially allocate some space for contents of the list as an array of node_t structures; then, subsequent inserts, deletes, etc., will simply move elements around the array as need be to function as a list. The header file you should use is here and shows what basic data structures you should use as well as which functions you need to implement.

The way this would work: assume there is a chunk size. Your array can grow and shrink by chunk_size * sizeof(node_t). When the list is initialized, allocate this much space and use it to hold contents of your list. A list insert, then, would just fill the 0th entry of the array. Subsequent inserts (assuming an insert at end) would fill up slots 1 ... chunk_size-1. At this point, however, the array is full, so what happens when the next insert takes place?

In this case, what you should do is make more space in your array. There are two ways to do this. The simplest would be to call realloc() with the next bigger size, 2 * chunk_size * sizeof(node_t); realloc() takes your existing chunk of heap space and either grows your allocation in place or finds a new space, copies the data from the original array there, and frees the old space. Read the man page (e.g., type man realloc) to learn more.

The other option would be to simply call malloc() to get a bigger space, copy the existing array into that space, and then free() the original array. Either way is fine, but realloc() is likely to be more efficient. In this manner, your array will grow in chunks as more data is inserted.
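The growth step described above might look roughly like the following sketch. The field layout of node_t and list_t and the sketch_ function names are assumptions made for illustration; the real definitions and prototypes come from list.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layouts for illustration only; use the ones from list.h. */
typedef struct { int key; void *value; } node_t;
typedef struct {
    node_t *nodes;     /* the array holding the list contents */
    int count;         /* slots currently in use */
    int capacity;      /* slots allocated, a multiple of chunk_size */
    int chunk_size;    /* growth increment, fixed at init time */
} list_t;

/* Allocate the first chunk. */
void sketch_init(list_t *l, int chunk_size) {
    l->nodes = malloc((size_t)chunk_size * sizeof(node_t));
    assert(l->nodes != NULL);
    l->count = 0;
    l->capacity = chunk_size;
    l->chunk_size = chunk_size;
}

/* Append at the end, growing by one chunk when the array is full. */
void sketch_insert_end(list_t *l, int key, void *value) {
    if (l->count == l->capacity) {
        size_t new_cap = (size_t)l->capacity + (size_t)l->chunk_size;
        node_t *bigger = realloc(l->nodes, new_cap * sizeof(node_t));
        assert(bigger != NULL);
        l->nodes = bigger;          /* realloc may have moved the data */
        l->capacity = (int)new_cap;
    }
    l->nodes[l->count].key = key;
    l->nodes[l->count].value = value;
    l->count++;
}
```

Assigning the result of realloc() to a temporary first, rather than straight back to l->nodes, avoids losing the original pointer if the call fails.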

Note that the chunk_size parameter is set when you call list_init().

Of course, you have to think about how to implement other operations. For example, how do you insert at the front of the list in an array? Or, how do you keep the array ordered? These operations require a bunch of shuffling around of the contents of the array.

You may also notice that every list operation in list.h that puts something into the list takes not only a key (an integer) but also a value (of type void*). The void* value just allows the user of the list to store an arbitrary pointer along with the key; this is C's way of making a generic data type. You don't have to worry about what is in there; you just have to store and retrieve it as the interface demands. For example, the insert functions will put the key and value into a node_t and store it in the array; a subsequent lookup of that key will return the value (void*) associated with it.
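To make the void* idea concrete, here is a small sketch of a lookup that hands the stored pointer back to the caller. The node_t fields and the sketch_lookup signature (value returned through an out-parameter) are assumptions; follow the prototypes in list.h for the real thing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for illustration only; use the one from list.h. */
typedef struct { int key; void *value; } node_t;

/* Linear search over the first count entries.
   Returns 0 and stores the matching value in *value, or -1 if not found. */
int sketch_lookup(const node_t *nodes, int count, int key, void **value) {
    assert(value != NULL);
    for (int i = 0; i < count; i++) {
        if (nodes[i].key == key) {
            *value = nodes[i].value;  /* the list never inspects the pointee */
            return 0;
        }
    }
    return -1;
}
```

The caller is the only one who knows what the pointer refers to, so it casts the result back, e.g. to int * or char *, after a successful lookup.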

Don't forget to also shrink your array as elements are deleted. Specifically, if you grow the array chunk_size at a time, you should also free space chunk_size at a time. Thus, if chunk_size were set to 100, your initial array would be size 100. When the 101st element is inserted, your array is grown to size 200. And so forth. However, when an element is deleted, you may have to shrink the array. For example, if the 101st element is deleted and the array now contains 100 elements, the array size should now be 100 (not 200). Note that the array should always hold at least one chunk (and thus is never zero-sized).

To implement this, you should use realloc() again (or malloc(), copy, and free(), as desired).
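One way to express the shrinking rule is a check that runs after each delete has already decremented the element count. This is only a sketch: the struct fields and the name sketch_maybe_shrink are assumptions, not part of list.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layouts for illustration only; use the ones from list.h. */
typedef struct { int key; void *value; } node_t;
typedef struct {
    node_t *nodes;
    int count;         /* slots currently in use */
    int capacity;      /* slots allocated, a multiple of chunk_size */
    int chunk_size;
} list_t;

/* Give back one chunk if the last chunk is now completely empty,
   but never drop below a single chunk. */
void sketch_maybe_shrink(list_t *l) {
    if (l->capacity > l->chunk_size &&
        l->count <= l->capacity - l->chunk_size) {
        size_t new_cap = (size_t)(l->capacity - l->chunk_size);
        node_t *smaller = realloc(l->nodes, new_cap * sizeof(node_t));
        assert(smaller != NULL);
        l->nodes = smaller;
        l->capacity = (int)new_cap;
    }
}
```

With chunk_size 100, a list holding 101 elements keeps capacity 200; once the count drops to 100, the check above brings the capacity back to 100, and it stays at 100 even when the list becomes empty.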

In building this list, you should keep the function prototypes in list.h unchanged (although it is OK to add more information to the list_t structure). You should put your C code into a list.c file, and then compile your list into a statically-linked library called liblist.a as we did in class. You should also provide a makefile called makefile that has the rules in it needed to build your library. Finally, you should build your code with optimization turned on, i.e., use the -O (dash capital O) flag.

Some Other Details

Here are some other details relevant to your list:

The array should generally be kept densely packed. That is, when inserting something at the front of the list, you will have to move the current entries up a spot and then insert the new entry into the 0th slot. Similarly, when you delete something in the middle, you should move all the things beyond that point back one spot.
List insertion at the end of the list should be very fast. This means you should know where the next currently available slot is at the end of the array.
Ordered insert should be ordered in an ascending fashion. Specifically, 0 should come before 1 should come before 2, etc.
You should not sort the list or do any extra work to support ordered insert. Specifically, the list may not be ordered (e.g., you may have inserted something with one of the other methods), but someone may still call ordered insert upon it. In this case, it is OK to just search for the first entry that is greater than the to-be-inserted key and insert something in front of it. It is not the job of the list to sort the data in some way other than this.
Lookup/Delete returns -1 if key not found, otherwise 0. -1 is a common indication of failure, and 0 of success, so we follow this convention here.
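The shuffling that keeps the array densely packed can be sketched with memmove(). The functions below work on a bare array and assume the caller has already made sure there is room for one more entry; the node_t fields and all sketch_ names are hypothetical, and the real interface is the one in list.h.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout for illustration only; use the one from list.h. */
typedef struct { int key; void *value; } node_t;

/* Insert at index pos (0 = front), shifting later entries up one slot.
   Returns the new element count. */
int sketch_insert_at(node_t *nodes, int count, int pos, int key, void *value) {
    assert(pos >= 0 && pos <= count);
    memmove(&nodes[pos + 1], &nodes[pos],
            (size_t)(count - pos) * sizeof(node_t));
    nodes[pos].key = key;
    nodes[pos].value = value;
    return count + 1;
}

/* Ordered insert: place the new entry in front of the first entry
   whose key is greater, or at the end if there is none. */
int sketch_insert_ordered(node_t *nodes, int count, int key, void *value) {
    int pos = 0;
    while (pos < count && nodes[pos].key <= key) {
        pos++;
    }
    return sketch_insert_at(nodes, count, pos, key, value);
}

/* Delete the first entry with the given key, shifting later entries
   back one slot. Returns 0 on success, -1 if the key is not found. */
int sketch_delete(node_t *nodes, int *count, int key) {
    for (int i = 0; i < *count; i++) {
        if (nodes[i].key == key) {
            memmove(&nodes[i], &nodes[i + 1],
                    (size_t)(*count - i - 1) * sizeof(node_t));
            (*count)--;
            return 0;
        }
    }
    return -1;
}
```

memmove() is used rather than memcpy() because the source and destination ranges overlap. Insert at the end is the pos == count case, which moves nothing and is therefore fast.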

Part 2: Timing Your List

The last part of this project is to use timing to measure the performance of your list, as compared to the standard malloc-based linked list we saw in class. Thus, you'll have to get that working as well to perform these experiments.

In this part of the project, you should create the following graphs, with the file names shown:

front.pdf - Time for insert at front: Compare the standard list and your list under insert at front of list. Vary the number of insertions you perform from 100,000 to 10,000,000 along the x-axis, and show the time it takes to perform those insertions along the y-axis (in milliseconds). You should take measurements at every multiple of 100,000. Chunk size should be set to 1000.
back.pdf - Time for insert at back: Compare the standard list and your list under insert at back of list, following the same guidelines.
delete.pdf - Time for insert/delete: First insert N elements, then delete all N of them, and time just the deletes. Vary N as above from 100,000 to 10,000,000 and compare your array-based implementation vs. the standard malloc-based linked list.
lookup.pdf - Time random lookups: Insert 10,000,000 unique elements into a list, and then time how long it takes to perform random lookups of N elements, varying N from 1,000 to 10,000 at increments of 1,000. Do this for both types of list and plot the results.
chunk.pdf - Time insertions with different chunk sizes. Your array-based list grows (and shrinks) in chunks; vary the chunk size while repeating the insert-at-back test from above. This time, pick a fixed number of insertions (10,000,000) and vary the chunk size along the x-axis from 1000 up to 10,000 by 1000s.

Each of these graphs should be a single PDF that shows clearly labeled X and Y axes.

For this part of the project, you will have to use gettimeofday() in a main program to time the thing you're trying to time, and random() to pick random keys to search for in the lookup experiment.
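A minimal sketch of the timing side, assuming a hypothetical helper named now_ms that converts the gettimeofday() result to milliseconds (the unit the graphs ask for):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: current wall-clock time in milliseconds,
   as reported by gettimeofday(). */
double now_ms(void) {
    struct timeval tv;
    int rc = gettimeofday(&tv, NULL);
    assert(rc == 0);
    (void)rc;  /* keeps -Wall quiet if assert() is compiled out */
    return (double)tv.tv_sec * 1000.0 + (double)tv.tv_usec / 1000.0;
}

/* Typical use inside main():
 *     double start = now_ms();
 *     ... perform the N insertions being measured ...
 *     double elapsed = now_ms() - start;
 */
```

Take the two readings immediately around the operations being measured, so that setup work such as building the list for the delete or lookup experiments is not counted.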

Knowing how to write a little shell script may be very useful for this part of the project; spend some time learning how to write a csh or bash script, or even Python, to launch a bunch of experiments and gather the results.

You should also think about this: what did you expect from the results? What surprised you? Do you have any explanation for the relative performance of these two different approaches? Put your thoughts into a README file for this project. Did the timing help you decide on what chunk size your list should grow and shrink by?

Testing

An important part of programming is testing your code to make sure it works. This means writing more code usually!

Write a main.c (much like we did in class) that can be used to test your library. Make it easy to insert and delete and push through all the corner cases you can think of.

For this project, you must use flags -Wall and -Werror when compiling; failure to do so will be a problem, so don't forget!

Handing It In

To hand in these programs, you just have to put the C source code files and makefiles and graphs into your handin directory, under the subdirectory hw3/ (naturally).

At the end of putting everything into your directory, you should check that all the files are there:

prompt> ls -l ~cs354-3/handin/remzi/hw3/

The program ls lists files in a directory and thus should show all the above source files therein.