CS537 Student Wiki


Project3

Page: PmWiki.Project3 - Last Modified : Tue, 23 Apr 13

Threads and Synchronization

Due date

The project is due Tuesday, November 24 at 9 pm

Update

  • Tuesday 11/24: A student pointed out that the function "strtok()" is not thread safe, meaning that if you use it in multiple threads at the same time, bad things can happen. There is a version that is safe to use with threads, "strtok_r()", but you have to pass an extra parameter.
    Here is a link to some documentation and an example
  • Monday 11/23: Line numbers. Ideally your code will number lines starting with 1 for the first line. If you start with line number zero, please document this in your readme.
  • Friday 11/20: clarifications from office hours:
    • The assignment asks you to make sure that threads shut down cleanly when the process exits. This is a low-priority portion of the assignment, so I would try to get other parts working first. While you may be tempted to use pthread_cancel, there are better ways of getting threads to terminate. There are two basic things you should do: (1) have a global variable called something like "shutdown" that is set to false (or zero) most of the time, but is set to true (or 1) when you want to shut down, and (2) make every thread that is waiting to be signaled, either with semaphores or condition variables, check to see if this flag is set when it returns from pthread_cond_wait() or sem_wait(). If the flag is set, the thread should exit rather than performing any work.
  • Friday, 11/20: Clarifications:
    • You should use the find command line utility to build a list of files that you pass into the program.
    • You should have at least three threads, one for each major component
    • The indexers will only produce useful results on text files, but that is o.k. They should still run correctly (without crashing) on other files.
    • If a search is entered before indexing completes, your program should just return results from the files it has indexed so far.
    • As a quick note, when you create threads, pass just the name of the function and not the function with parentheses:
      pthread_create(&mytid, NULL, my_func(), NULL) is wrong, as it will call my_func before creating the thread. pthread_create(&mytid, NULL, my_func, NULL) is the correct form.
  • You will need to compile with the flags "-pthread" to link to the threading functions and "-lm" for the index code. Add these to the command line for gcc in your Makefile.
  • Sunday, 11/15: A student pointed out a problem in the provided index code. Please download a new copy of index.c
  • Thursday, 11/12: For extra credit, you can implement the project in Google's new language Go. You will have to implement your own index structure. As Go currently does not have a rich runtime library, I would encourage you to do the simple, but inefficient, thing of using a doubly linked list. The list of available containers is here
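
The shutdown approach from the 11/20 office-hours note above can be sketched roughly as follows. This is only one possible shape, not the required design: the names lock, work_ready, shutting_down, pending, processed, worker, add_work, and request_shutdown are all hypothetical, and pending is a stand-in counter for whatever work queue you actually use. The flag is deliberately not named "shutdown", since that name is already used by a POSIX socket function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* All names below are hypothetical; none come from the provided code. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static bool shutting_down = false;  /* the shutdown flag from the note */
static int pending = 0;             /* stand-in for queued work items */
static int processed = 0;           /* how many items the worker handled */

/* Worker thread: sleeps until there is work or a shutdown is requested. */
void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (pending == 0 && !shutting_down)
            pthread_cond_wait(&work_ready, &lock);
        if (shutting_down)
            break;                  /* exit rather than performing any work */
        assert(pending > 0);        /* guaranteed by the wait loop above */
        pending--;
        processed++;
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Called by a producer to hand one item to the worker. */
void add_work(void) {
    pthread_mutex_lock(&lock);
    pending++;
    pthread_cond_signal(&work_ready);
    pthread_mutex_unlock(&lock);
}

/* Sets the flag and wakes every waiting thread so each can see it. */
void request_shutdown(void) {
    pthread_mutex_lock(&lock);
    shutting_down = true;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&lock);
}
```

The thread that wants to exit would call request_shutdown() and then pthread_join() on each worker. The flag is read and written only while holding the mutex, so a wakeup cannot be missed between the check and the wait.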

Threading resources

Here are links to a few useful pages

Project goals

The goals for this project are to:

  1. Get experience programming with threads.
  2. Get experience with thread synchronization.

This project will be done INDIVIDUALLY.

Desktop search

In this project, you will implement a very simple desktop search engine. Your code will scan files and add each word in each file to an index. The index can be searched by word and contains the file name and line number where each word shows up. The search interface allows a user to type in a single word, and prints back a list of files containing that word.

Your code will consist of three pieces:

  1. A file system scanner that reads in file names
  2. An indexer that finds all the words in a file and adds them to a hash table
  3. A search interface that allows you to type in a word and get back a list of files containing that word

Each of these components should run concurrently as a separate thread, so you can scan files and index them as you query the index.

File system scanner

The file system scanner can be quite simple: it just reads in a list of files to scan. A list of files in a directory can be generated with this command:

  find . -type f > list-of-files.txt

This fills in the file list-of-files.txt with all the files within and beneath the current directory. You can assume that file names will be less than 511 characters (defined as MAXPATH in the provided code).

Indexer

An indexer takes as input a file name, opens that file, and then reads all the words in the file. You can use strtok, or preferably its thread-safe variant strtok_r, separating on whitespace (spaces, tabs, newlines) to find words. For each word it finds, the indexer adds an entry to a hash table with the word and the file name/line number where it appears. The code should look something like this:

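  FILE * file;
  char buffer[1024];       /* assumed maximum line length */
  char * saveptr;          /* parsing state for the thread-safe strtok_r */
  int line_number = 1;     /* declared outside the loop; lines start at 1 */
  file = fopen(file_name, "r");
  if (file == NULL)
    return;                /* could not open the file; skip it */
  /* Test the result of fgets rather than feof, which only becomes
     true after a read has already failed. */
  while (fgets(buffer, sizeof(buffer), file) != NULL) {
    char * word = strtok_r(buffer, " \n\t", &saveptr);
    while (word != NULL) {
     insert_into_index(word, file_name, line_number);
     word = strtok_r(NULL, " \n\t", &saveptr);
    }
    line_number = line_number+1;
  }
  fclose(file);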

You need to build a data structure to pass file names from the file system scanner to the indexer. This is a producer-consumer type of problem. There can be more than one indexer running at the same time, so you may want a bounded buffer of file names for the indexers to read from.
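
As an illustration of the bounded buffer mentioned above, here is a minimal sketch using one pthread mutex and two condition variables. Everything in it is an assumption made for the example: the type file_queue_t, the functions queue_init, queue_put, queue_close, and queue_get, the capacity QUEUE_CAP, and the name length QUEUE_NAME_LEN (the provided code defines MAXPATH, which you would normally use instead).

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define QUEUE_CAP 16         /* hypothetical number of slots */
#define QUEUE_NAME_LEN 512   /* assumed; use MAXPATH from index.h in practice */

typedef struct {
    char names[QUEUE_CAP][QUEUE_NAME_LEN];
    int head, tail, count;
    int closed;              /* set once the scanner has no more files */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} file_queue_t;

void queue_init(file_queue_t *q) {
    q->head = q->tail = q->count = q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Producer (scanner) side: blocks while the buffer is full. */
void queue_put(file_queue_t *q, const char *name) {
    assert(q != NULL && name != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    strncpy(q->names[q->tail], name, QUEUE_NAME_LEN - 1);
    q->names[q->tail][QUEUE_NAME_LEN - 1] = '\0';
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Scanner calls this after the last file so waiting indexers can finish. */
void queue_close(file_queue_t *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer (indexer) side: copies the next name into out and returns 1,
   or returns 0 once the queue is closed and fully drained. */
int queue_get(file_queue_t *q, char *out) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    strcpy(out, q->names[q->head]);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}
```

Each indexer thread would loop on queue_get() until it returns 0. Copying the name into the slot, rather than storing a pointer, lets the scanner reuse its own read buffer right away.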

Code for insert_into_index and other functions for the hash table is provided.

Search interface

The search interface reads strings from standard input and looks the words up in the index. You can assume that the user enters single words, so you do not need to do error checking on the input. You can also assume that the input will be less than 128 characters (so you need 130 characters to hold a newline and null terminator).

The search interface searches the index and prints out what it finds in the following format. If the word is found, it prints:

  FOUND: file-name line-number

If not found, it prints:

  Word not found.

The program should exit when the user enters ctrl-D at the search interface (indicating end of file).
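
Reading queries until end of file might look like the sketch below. The helper name read_query and the constant QUERY_BUF are hypothetical; the buffer size of 130 comes from the text above. fgets() returns NULL when the user presses ctrl-D on an empty line, which is how the loop learns that it should stop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define QUERY_BUF 130   /* room for the word, a newline, and the terminator */

/* Reads one query line from in. Returns 1 with the word in buf (trailing
   newline removed), or 0 at end of file or on a read error. */
int read_query(FILE *in, char *buf, size_t len) {
    assert(in != NULL && buf != NULL && len > 1);
    if (fgets(buf, (int)len, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if any */
    return 1;
}
```

A search thread could then run a loop such as while (read_query(stdin, buf, sizeof buf)) { ... } and begin shutting the program down when the loop ends.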

Code for searching the hashtable and returning a list of locations is provided.

Threading

You should write this code so that each component runs as a separate thread, and so that there can be multiple indexer threads. You should use pthreads for threading.

The main task you have for this problem is the synchronization: you must make sure that with threads adding to the hash table at the same time as searching it, there are no data races. Furthermore, you need to synchronize the scanning thread and the indexer threads.

You should use pthread locks to prevent data races, and pthread condition variables with locks to synchronize between threads.

The main thread of your program can be one of your threads. However, you should wait for all other threads to terminate before exiting your main thread.
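
Starting several indexer threads and waiting for all of them before the main thread exits could be sketched like this. The names run_indexers and indexer_stub are hypothetical, and indexer_stub only increments a counter so the example stays self-contained; a real indexer would pull file names from your shared buffer instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder thread body: bumps a shared counter under a lock. */
void *indexer_stub(void *arg) {
    int *counter = arg;
    pthread_mutex_lock(&count_lock);
    (*counter)++;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Creates n threads running fn(arg), joins every thread that was
   created, and returns how many were created (or -1 if out of memory). */
int run_indexers(int n, void *(*fn)(void *), void *arg) {
    assert(n > 0 && fn != NULL);
    pthread_t *tids = malloc((size_t)n * sizeof *tids);
    if (tids == NULL)
        return -1;
    int created = 0;
    while (created < n) {
        /* Pass the function name itself, without parentheses. */
        if (pthread_create(&tids[created], NULL, fn, arg) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);   /* wait before returning */
    free(tids);
    return created;
}
```

In the real program the joins would more likely happen at shutdown, after the search interface sees end of file, rather than immediately after creation as in this compact sketch.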

Specification

The program should be started with the following command line:

  search-engine num-indexer-threads file-list

where num-indexer-threads is the number of threads running the indexer and file-list is the list of files for the file system scanner to use.
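
Validating the num-indexer-threads argument could be done along these lines. The function name parse_thread_count and the upper limit of 1024 are assumptions made for this sketch; the assignment itself does not specify a maximum.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Converts s to a thread count. Returns the count (at least 1), or -1 if
   s is not a whole decimal number in the range 1..1024. The 1024 cap is
   an arbitrary choice for this sketch. */
int parse_thread_count(const char *s) {
    assert(s != NULL);
    char *end;
    errno = 0;
    long n = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || n < 1 || n > 1024)
        return -1;
    return (int)n;
}
```

main() would call this on argv[1] after checking that argc is 3, and print a usage message if it returns -1.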

Provided code

Provided code for this project is in ~cs537-2/public/Projects/P3/index.c. This code consists of the data structure for the index. It is not thread safe, so you must handle concurrency issues yourself, for example by acquiring locks before invoking this code and releasing locks afterwards. Note that the sample code above does not include this synchronization.
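
One simple way to acquire a lock around the index calls is a thin wrapper, sketched below. The wrapper name locked_insert and the lock index_lock are hypothetical. The insert_into_index defined here is only a stand-in so the example compiles on its own; in the project you would include index.h and link against index.c instead. Its return value of 0 is an assumption, since the handout does not say what the real function returns.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the provided function so this sketch is self-contained.
   It only counts calls; the real one lives in index.c. */
static int fake_entries = 0;
int insert_into_index(char *word, char *file_name, int line_number) {
    (void)word; (void)file_name; (void)line_number;
    fake_entries++;
    return 0;   /* assumed success value */
}

/* One lock protecting every access to the index. */
static pthread_mutex_t index_lock = PTHREAD_MUTEX_INITIALIZER;

/* Inserts while holding the lock, so concurrent indexers and the
   search thread never touch the index at the same time. */
int locked_insert(char *word, char *file_name, int line_number) {
    assert(word != NULL && file_name != NULL);
    pthread_mutex_lock(&index_lock);
    int rc = insert_into_index(word, file_name, line_number);
    pthread_mutex_unlock(&index_lock);
    return rc;
}
```

A matching wrapper around find_in_index() would take the same lock. A single mutex is the simplest correct choice; a reader-writer lock is a possible refinement if searches turn out to contend with each other.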

The code is contained in index.c and the header file is index.h. Sample code is available in test.c

The code provides three functions:

  1. int init_index(): call this when your program initializes.
  2. int insert_into_index(char * word, char * file_name, int line_number): adds an entry to the index for word word in file file_name at line number line_number. The function does not retain pointers to the strings you pass in, so you can free or reuse their memory afterwards.
  3. index_search_results_t * find_in_index(char * word): searches for word in the index and returns the results.

Results are returned as a data structure:

  typedef struct index_search_elem_s {
    char file_name[MAXPATH];
    int line_number;
  } index_search_elem_t;

  typedef struct index_search_results_s {
    int num_results;
    index_search_elem_t results[1];
  } index_search_results_t;

The sample code shows how to use this structure; make sure to call free on the pointer returned from find_in_index().
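
Walking the results structure and printing it in the required output format might look like the following sketch. The function name print_and_free_results is hypothetical, MAXPATH is given an assumed value so the example compiles without index.h, and treating both a NULL pointer and a zero num_results as "not found" is an assumption, since the handout does not say how find_in_index() reports a miss.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef MAXPATH
#define MAXPATH 512   /* assumed value; index.h defines the real one */
#endif

/* Copied from the structure definitions shown above. */
typedef struct index_search_elem_s {
    char file_name[MAXPATH];
    int line_number;
} index_search_elem_t;

typedef struct index_search_results_s {
    int num_results;
    index_search_elem_t results[1];   /* really num_results entries long */
} index_search_results_t;

/* Prints one FOUND line per hit, or "Word not found." if there are none.
   Frees r and returns the number of hits printed. */
int print_and_free_results(index_search_results_t *r) {
    int hits = 0;
    if (r != NULL) {
        assert(r->num_results >= 0);
        const index_search_elem_t *e = r->results;
        for (int i = 0; i < r->num_results; i++) {
            printf("FOUND: %s %d\n", e[i].file_name, e[i].line_number);
            hits++;
        }
        free(r);   /* the caller owns the memory returned by the search */
    }
    if (hits == 0)
        printf("Word not found.\n");
    return hits;
}
```

The search thread would pass the return value of find_in_index() straight to this function, holding whatever lock protects the index only around the find_in_index() call itself.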

Note: none of these functions use locks, so you must provide any locking or synchronization needed.

What to turn in

Please turn in your source code, a Makefile, and a readme with any details about the code we should know to ~cs537-2/handin/<your-user-name>/p3
