Project 1a: Sorting Records by Column

You will write a simple sorting program. This program should be invoked as follows:

shell% ./varsort -i inputfile -o outputfile [-c column]

The above line means the users typed in the name of the sorting program
Input files are generated by a program we give you called genvar.c. After running

The file begins with a header. This header is a single integer, R, containing the number of records in this file. After the header, the file contains
iiii<3>DDDDDDDDDDDD

Again, the index is four bytes and this record contains three integers, or 3 * sizeof(integer) = 3 * 4 = 12 bytes. Note that different records can have different amounts of associated data.

Your goal: to build a sorting program called varsort. For instance, given records as follows
index 0: data_ints: 3 rec: 55 33 77
index 1: data_ints: 4 rec: 35 73 75 53
index 2: data_ints: 2 rec: 73 37

Sorting the records by column 0, we should have
index 1: data_ints: 4 rec: 35 73 75 53
index 0: data_ints: 3 rec: 55 33 77
index 2: data_ints: 2 rec: 73 37

Furthermore, if we sort the records by column 2, we should get
index 2: data_ints: 2 rec: 73 37
index 1: data_ints: 4 rec: 35 73 75 53
index 0: data_ints: 3 rec: 55 33 77

Because the record at index 2 does not have a column 2, we use the last data value of that record as its sort key.

Some Details

Using

shell% gcc -o genvar genvar.c -Wall -Werror

Note: you will also need the header file sort.h to compile this program. Then you run it:

shell% ./genvar -s 0 -n 100 -m 32 -o /tmp/outfile

There are four flags to

The format of the file generated by the

Another useful tool is dumpvar.c. This program can be used to
dump the contents of a file generated by genvar. For example, if run as follows:

shell% ./dumpvar -i /tmp/file

then dumpvar will take the binary data in /tmp/file and display the indices and records in a human-readable ASCII format.
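The column rule from the example above (a record that lacks the requested column is sorted by its last value) can be sketched in C as follows. This is only an illustration under stated assumptions: the record_t struct and the function names are hypothetical and are not taken from sort.h, the order is ascending as in the example, and using the standard qsort routine is one possible choice rather than a requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory record; names are illustrative, not from sort.h. */
typedef struct {
    uint32_t index;      /* record index as stored in the file */
    uint32_t data_ints;  /* number of data values in this record (at least 1) */
    uint32_t *data;      /* array holding data_ints values */
} record_t;

/* qsort passes no user argument to the comparator, so the requested
 * column is kept in a file-scope variable (an assumption of this sketch). */
static uint32_t sort_column = 0;

/* Key = value in the requested column, or the record's last value
 * when the record has fewer columns than requested. */
uint32_t sort_key(const record_t *r, uint32_t column) {
    assert(r->data_ints > 0);
    uint32_t last = r->data_ints - 1;
    return r->data[column < last ? column : last];
}

/* Ascending comparison of unsigned keys; comparing instead of
 * subtracting avoids wraparound on unsigned values. */
static int compare_records(const void *a, const void *b) {
    uint32_t ka = sort_key((const record_t *)a, sort_column);
    uint32_t kb = sort_key((const record_t *)b, sort_column);
    return (ka > kb) - (ka < kb);
}

/* Sort n records in place by the given column. */
void sort_records(record_t *recs, size_t n, uint32_t column) {
    sort_column = column;
    qsort(recs, n, sizeof(record_t), compare_records);
}
```

For the three example records, sort_records(recs, 3, 0) leaves them in index order 1, 0, 2, and sort_records(recs, 3, 2) leaves them in index order 2, 1, 0, matching the listings above. Only the small record_t structs move during the sort; the data arrays they point to stay where they are.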
When you create files to sort, we strongly recommend that you place
those files in

You will probably want to look at the source code for both of these utilities to see how to read and write files; in particular, they are useful for understanding the variable-sized record format. A common header file, sort.h, has the detailed description. There are three different versions of the record that you might find useful in different circumstances:
Note that you may NOT simply allocate an array of

Hints

In your sorting program, you should just use

To sort the data, use any sort that you'd like to use. An easy way
to go is to use the library routine

The routine

Remember to write out the header (i.e., the number of records) for
your sorted output.

For efficiency, you might not want to call

To exit, call

If you don't know how to use these functions, use the man pages!

To do some preliminary testing of your implementation
of

In addition to testing for correct program behavior, we will also be giving points for good programming style and for careful memory management. As programmers, we often won't be writing all our own code from scratch, and instead will be contributing to an existing project that other people also contribute to. In these situations, other programmers will often need to read and understand the code you've written. Because of this, many companies require their programmers to adhere to a style guideline, so that reading and understanding another person's code is easier.
For grading on style, we will mostly stick to the style guidelines
which Google uses for C++ (more info can be found
here)
with a few differences specific to C. These differences are in the
config file in
~cs537-2/ta/lint/cpplint.py

We will be running it with the following options:
cpplint.py --extensions=c,h varsort.c

Another important skill to develop in C programming is good memory management. This means freeing any heap space you've allocated when you're done using it! Since memory is a limited resource, we want to free memory when we're done with it so that it can be reused. To check that your code doesn't contain memory leaks, we will be using a tool known as valgrind. It's a simple tool which monitors every call to malloc (and other memory allocation functions) and free to make sure that all memory allocated to our program is subsequently released when we are done with it. Our tests will be running valgrind on your code in the following way:

valgrind --show-reachable=yes ./varsort -i infile -o outfile [-c column]

Assumptions and Errors
32-bit integer data: You may assume that the data of the records are unsigned 32-bit integers.

Ties in sort: We will not test how you handle ties in sorting, i.e., it's fine if your sort is unstable.

Data size: You may assume that there is at least one item and no more than USHRT_MAX data items in each record. However, most records may have many fewer data items than the max, so don't allocate this much memory for every record!

Record size: You should be able to handle a file with 0 records, i.e., only a 0 in the file.

File length: May be pretty long! However, there is no need to implement a fancy two-pass sort or anything like that. Moreover, the file will NOT be empty and will always have a header.

Invalid files: If the user specifies an input or output file that you
cannot open (for whatever reason), the sort should EXACTLY print:

Non-negative integer column: You may assume the column in the command argument is guaranteed to be a valid integer and you can use

Default column: If the column argument is not provided, you should use the default value of 0.

Sorting column may exceed the number of data in some records: If the specified sorting column exceeds the number of data items in some record, you should just use the last column of that record to sort.

Too few or many arguments passed to program: If the user runs varsort
without enough arguments, or in some other way passes incorrect flags and such to
varsort, print

Important: On any error code, you should print the error to the screen
using
Your grade will depend primarily on the correctness of your program. However, programs that run significantly slower than others (i.e., an order of magnitude slower) will be penalized.

Testing

Testing is critical. Testing your code to make sure it works
is crucial. Write tests to see if your code handles all the cases you
think it should. Be as comprehensive as you can be.

After you think you have covered all the edge cases, feel free to test your code with our grading script. To run the script, go to the directory where your
General Advice

Start small, and get things working incrementally. For example, first get a program that simply reads in the input file, one line at a time, and prints out what it reads in. Then, slowly add features and test them as you go. Don't worry about performance until you have all of the functionality working correctly.

Keep old versions around. Keep copies of older versions of your program
around, as you may introduce bugs and not be able to easily undo them. A
simple way to do this is to keep copies around, by explicitly making copies of
the file at various points during development. For example, let's say you get
a simple version of