A4 Sorting!

Announcements and Clarifications

7/13 - Due date changed to Fri 7/14 @ 5pm

Brief Description

Your task in this assignment is to implement several different types of sorts and to experiment with sorting several different data sets.

Goals and Requirements

Description

Your first goal is to implement a list ADT. You may implement this data structure however you choose, but consider the running time of the operations that sorting relies on, such as reading, writing and swapping elements by position.
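Here is a minimal sketch of one possible design. It assumes the sorts mostly need positional reads and writes. The list is array-backed, so get and set run in constant time and add runs in amortized constant time. The class name StringList and its methods are hypothetical and not part of any required interface.

```java
import java.util.Arrays;

/** Hypothetical array-backed list of Strings: O(1) get/set, amortized O(1) add. */
class StringList {
    private String[] items = new String[16];
    private int size = 0;

    /** Appends s, doubling the backing array when it is full. */
    void add(String s) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = s;
    }

    /** Returns the element at position i. */
    String get(int i) {
        checkIndex(i);
        return items[i];
    }

    /** Replaces the element at position i with s. */
    void set(int i, String s) {
        checkIndex(i);
        items[i] = s;
    }

    int size() {
        return size;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
    }
}
```

A node-based (linked) design is also allowed. With that design, however, positional access costs time proportional to the index. Index-heavy sorts such as bubble sort then become much slower.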

You should implement two versions of bubble sort: one that uses the list ADT you implemented and one that uses Java's ArrayList class. The remainder of your sorts should all use the same data structure to store the list of items to sort. You must also implement two versions of quick sort: the first uses the heuristic of choosing the first element as the pivot, and the second chooses the median value as the pivot using the median-finding algorithm. Finally, you need to implement a mode that reads in the data to sort but performs no sorting operations on it.
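The following sketch shows bubble sort on an ArrayList<String>. The class name BubbleSortSketch is hypothetical. The sort compares adjacent elements with compareTo and swaps them when they are out of order. It stops early once a full pass makes no swaps. The same loop should carry over to your own list class if that class offers positional get and set.

```java
import java.util.ArrayList;
import java.util.Collections;

class BubbleSortSketch {
    /** Sorts list in increasing lexicographic order; stops once a pass makes no swaps. */
    static void bubbleSort(ArrayList<String> list) {
        int n = list.size();
        boolean swapped = true;
        for (int pass = 0; pass < n - 1 && swapped; pass++) {
            swapped = false;
            // After each pass the largest remaining element has settled at index n - 1 - pass.
            for (int i = 0; i < n - 1 - pass; i++) {
                if (list.get(i).compareTo(list.get(i + 1)) > 0) {
                    Collections.swap(list, i, i + 1);
                    swapped = true;
                }
            }
        }
    }
}
```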

Your sorts should always order the elements in increasing lexicographic order.
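Java's String.compareTo implements this ordering by comparing UTF-16 character codes, which can be surprising on real data:

- Digits sort before uppercase letters, and uppercase letters sort before lowercase ones.
- Numbers compare character by character, so "10" sorts before "9".
- A prefix sorts before any longer string that extends it.

The small demo below illustrates these rules. Its class name is hypothetical. Collections.sort gives a reference order that you can check your own sorts against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class LexOrderDemo {
    public static void main(String[] args) {
        System.out.println("Zebra".compareTo("apple") < 0); // true: 'Z' (90) < 'a' (97)
        System.out.println("10".compareTo("9") < 0);        // true: '1' < '9'
        System.out.println("app".compareTo("apple") < 0);   // true: a prefix sorts first

        // Reference order for checking your own sorts.
        List<String> words = new ArrayList<>(Arrays.asList("apple", "Zebra", "9", "10"));
        Collections.sort(words);
        System.out.println(words); // [10, 9, Zebra, apple]
    }
}
```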

Implementation

All user input is given as command-line arguments:

java Sorts mode_number input_file output_file amount_of_data ...

The mode_number indicates which operation will be performed:

  1. Bubble sort using ArrayList
  2. Bubble sort using your list class
  3. Insertion sort
  4. Merge sort
  5. Quick sort picking first element as pivot
  6. Quick sort using the median method to choose pivot
  7. Sort using the Arrays.sort() method
  8. No-op, reads in data but does no sorting
  9. Binary search
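The following sketch covers modes 5 and 6 over a java.util.List<String>. All names in it are hypothetical. Both modes share one partition step, which uses the element at the front of the sublist as the pivot. Mode 6 first moves the median to that position. The assignment's "median-finding algorithm" may refer to a specific deterministic method such as median of medians. This sketch substitutes randomized quickselect on a copy of the sublist, which runs in expected linear time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class QuickSortSketch {
    private static final Random RNG = new Random();

    /** Mode 5: the first element of each sublist is the pivot. */
    static void quickSortFirst(List<String> a) {
        quickSort(a, 0, a.size() - 1, false);
    }

    /** Mode 6: the median of each sublist is the pivot. */
    static void quickSortMedian(List<String> a) {
        quickSort(a, 0, a.size() - 1, true);
    }

    private static void quickSort(List<String> a, int lo, int hi, boolean medianPivot) {
        while (lo < hi) {
            if (medianPivot) {
                // Move the median to position lo so partition() uses it as the pivot.
                Collections.swap(a, lo, indexOfMedian(a, lo, hi));
            }
            int p = partition(a, lo, hi);
            // Recurse on the smaller side and loop on the larger one, so the recursion
            // depth stays O(log n) even when the first-element pivot splits badly.
            if (p - lo < hi - p) {
                quickSort(a, lo, p - 1, medianPivot);
                lo = p + 1;
            } else {
                quickSort(a, p + 1, hi, medianPivot);
                hi = p - 1;
            }
        }
    }

    /** Lomuto-style partition of a[lo..hi] around a[lo]; returns the pivot's final index. */
    private static int partition(List<String> a, int lo, int hi) {
        String pivot = a.get(lo);
        int i = lo; // invariant: every element in a[lo+1..i] is < pivot
        for (int j = lo + 1; j <= hi; j++) {
            if (a.get(j).compareTo(pivot) < 0) {
                Collections.swap(a, ++i, j);
            }
        }
        Collections.swap(a, lo, i);
        return i;
    }

    /** Returns an index in [lo, hi] holding the (lower) median of a[lo..hi]. */
    private static int indexOfMedian(List<String> a, int lo, int hi) {
        List<String> copy = new ArrayList<>(a.subList(lo, hi + 1));
        String median = select(copy, (copy.size() - 1) / 2);
        for (int i = lo; i <= hi; i++) {
            if (a.get(i).equals(median)) return i;
        }
        throw new IllegalStateException("median not found");
    }

    /** Randomized quickselect: returns the k-th smallest (0-based) element of c. */
    private static String select(List<String> c, int k) {
        int lo = 0, hi = c.size() - 1;
        while (lo < hi) {
            Collections.swap(c, lo, lo + RNG.nextInt(hi - lo + 1)); // random pivot to front
            int p = partition(c, lo, hi);
            if (p == k) return c.get(p);
            if (p < k) lo = p + 1; else hi = p - 1;
        }
        return c.get(lo);
    }
}
```

Recursing on the smaller partition keeps the stack shallow. This matters because a first-element pivot on already sorted input produces maximally lopsided splits. Inputs with many equal tokens still push this partition scheme toward quadratic time, even with a median pivot.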

For all modes input_file indicates the file to read the data from. For the binary search mode, assume that the input file contains data already in sorted order.

For all modes, output_file indicates the name of the file to write the results to. For the sorts, the only output to the file will be the data in sorted order, one token per line. For the binary search mode, the only output should be the pivot element from each step of the algorithm, one per line. If the item is found, the last line should be that item; otherwise the last line should be "NOT FOUND".

For all modes, amount_of_data indicates the number of tokens to read from the data file. Use Scanner.next() to read in the tokens.

For the binary search mode there will be an additional command-line argument that specifies the token to search for. Your implementation of binary search should be recursive.
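The following sketch shows a recursive binary search that records the middle element it examines at each step. The class and method names are hypothetical. The assignment does not say whether a found item should appear twice in the output. This sketch assumes the final pivot, which equals the item, already serves as the last line. It appends "NOT FOUND" only on a miss.

```java
import java.util.ArrayList;
import java.util.List;

class BinarySearchSketch {
    /**
     * Recursively searches sorted[lo..hi] for target, appending each examined
     * middle element to trace. Returns true if target was found.
     */
    static boolean search(List<String> sorted, String target, int lo, int hi, List<String> trace) {
        if (lo > hi) return false;             // empty range: not present
        int mid = lo + (hi - lo) / 2;          // avoids int overflow of (lo + hi)
        String pivot = sorted.get(mid);
        trace.add(pivot);
        int c = target.compareTo(pivot);
        if (c == 0) return true;
        return c < 0 ? search(sorted, target, lo, mid - 1, trace)
                     : search(sorted, target, mid + 1, hi, trace);
    }

    /** Builds the output lines: every pivot, plus "NOT FOUND" if the search missed. */
    static List<String> outputLines(List<String> sorted, String target) {
        List<String> lines = new ArrayList<>();
        boolean found = search(sorted, target, 0, sorted.size() - 1, lines);
        if (!found) lines.add("NOT FOUND");
        return lines;
    }
}
```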

No-op mode will read in the data from input_file, place it in the ArrayList and then output the data to output_file in the one token per line format.
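The following sketch shows the output step shared by the no-op mode and the sorts, which writes one token per line. The class and method names are hypothetical. PrintWriter does not throw on write failures, so the sketch calls checkError() explicitly to surface I/O problems.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

class TokenWriter {
    /** Writes one token per line to file; throws if the file cannot be created or written. */
    static void writeTokens(String file, List<String> tokens) throws IOException {
        try (PrintWriter out = new PrintWriter(file)) { // FileNotFoundException if not creatable
            for (String t : tokens) {
                out.println(t);
            }
            // PrintWriter swallows write errors; checkError() flushes and reports them.
            if (out.checkError()) {
                throw new IOException("error writing " + file);
            }
        }
    }
}
```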

If there are any I/O problems with the input file or the output file, report the error on standard error and exit the program gracefully. If the input file contains fewer tokens than requested, report an error and exit the program.
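The following sketch reads the input with Scanner.next() and the default whitespace delimiter. It treats both a missing file and a file with too few tokens as errors, and reports them on standard error. The names are hypothetical. The argument positions follow the `java Sorts mode_number input_file output_file amount_of_data` usage. Exiting with status 1 is an assumption about what "gracefully" means here.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

class TokenReader {
    /**
     * Reads exactly amount tokens from file using Scanner.next(). Throws
     * FileNotFoundException if the file cannot be opened, and
     * IllegalArgumentException if it holds fewer than amount tokens.
     */
    static ArrayList<String> readTokens(String file, int amount) throws FileNotFoundException {
        ArrayList<String> tokens = new ArrayList<>(amount);
        try (Scanner in = new Scanner(new File(file))) {
            for (int i = 0; i < amount; i++) {
                if (!in.hasNext()) {
                    throw new IllegalArgumentException(
                        file + " has only " + i + " tokens; " + amount + " requested");
                }
                tokens.add(in.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("usage: java Sorts mode_number input_file output_file amount_of_data ...");
            System.exit(1);
        }
        try {
            ArrayList<String> data = readTokens(args[1], Integer.parseInt(args[3]));
            // ... sorting and writing to args[2] would follow here ...
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open input file: " + e.getMessage());
            System.exit(1);
        } catch (IllegalArgumentException e) { // also covers a non-numeric amount_of_data
            System.err.println("Error: " + e.getMessage());
            System.exit(1);
        }
    }
}
```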

Data sets

We have several data sets for you to use. Consider all of the data tokens as Strings when sorting. Use the standard Scanner.next() to tokenize the data with the default separators.

The data sets are human readable, look through them before you start coding.

Questions to Answer

At the end of your README.txt file include answers to the following questions:

  1. For each of the data sets, run all modes except binary search with amount_of_data in increments of 1500 up to 15000, measuring the running time using the method described in assignment 3; the hints section in this assignment describes a way to automate this process. You may choose to measure larger data sets, but be aware that some of the algorithms will take a long time on them. For each data set, plot the results and submit an electronic copy of the plot (.pdf, .ps or .xls). Analyze and comment on the plots, and include estimates of what the experiment tells you about the asymptotic running time of each sort on each data set.

  2. Why did we ask you to implement a mode that reads in the file and outputs it but does not sort the data? (update your plots to account for this)

  3. Often the known structure of data can help you improve sort performance. For the first two data sets name two distinct ways to improve sort performance based on what you know about the data.

  4. Comment on the difference between the results in the third and fourth data sets.

  5. Compare the performance of bubble sort on your list and on ArrayList. Were there differences? What do you think accounts for them?

  6. Compare the two versions of quick sort on all test sets.

  7. Which algorithm do you think Arrays.sort() uses to sort data? Why?

  8. If you had used LinkedList instead of ArrayList how would your results have changed?

Commenting and Style

Handin

Please hand in all necessary files to your handin directory, in a subdirectory named Sorts. Your application class should be called Sorts (in Sorts.java). Each .java file should declare exactly one class. If your program does anything strange (bugs) or awesome (extra features), or has a non-intuitive interface, please include a file called README.txt that explains these. If there are bugs in your program that you do not describe in your README, you will lose more credit than if you had described them.

Hints