Assignment 1: Data-Oriented C++ Programming
(Due Date: Thursday 9/15/94)
Introduction
The purpose of this assignment is to get you acquainted with C++ and Unix
in order to prepare you for the course project.
Specifically, it will illustrate classes, inheritance, abstract base classes,
(pure) virtual methods, pass-by-reference, (operator) overloading, the iostream
library, new/delete and other C++ features.
If you are not experienced in C++
coding, this assignment should get you acquainted with some useful
techniques, and more importantly, give you some indication of how
difficult (or easy!) it is going to be for you to learn C++.
Note that I will not teach C++; you are expected to know it or to
pick it up on your own.
What You Have to Do
The goal of this assignment is to write code for reading in sorted
files of fixed-length records and merging them.
To keep this assignment simple, we will assume that all records
consist of one string (with a defined maximum length) and one
integer. Input data files will be in binary format, and the string
will occur first, followed by the integer. It should be noted that
strings will have to fit in the given maximum length inclusive of the
null terminating character.
Step 1: Define a class "RecSpec" which represents the specifications
of a type of record. Note that RecSpec does not represent an actual
record.
The constructor of RecSpec accepts two arguments:
the maximum length for strings, and an indication of which field
(integer or string) will be used for sorting criterias.
A (partial) class definition of RecSpec follows:
enum BOOL { FALSE, TRUE};
class RecSpec {
public:
enum field_type { INT, STR };
RecSpec(unsigned int maxstrlen, field_type sortfield);
BOOL operator==(const RecSpec &rs) const;
int compare(const void *rec1, const void *rec2) const;
BOOL print_rec(const void *rec) const;
BOOL get_from_file(ifstream &is, void *buf) const;
unsigned int rec_size() const;
private:
const unsigned int max_str_len;
const field_type sort_field;
};
When an object of type RecSpec is declared, the private fields are
initialized by the constructor in the obvious way.
Again, note that the RecSpec object has no field to
actually store a record; it simply keeps some descriptive information about
(a collection of) records, and provides some useful methods to manipulate
such records.
field_type is used to specify (to the constructor) which field
the records will be sorted on.
operator== is used to verify that two RecSpec objects are the same,
allowing their associated records to be merged (see step 2).
Two RecSpec objects are defined to be equal if they both use the same
field for sorting, and if they have the same maximum string length.
compare compares the sort fields of the two given records.
It returns a negative value is rec1 < rec2, 0 if rec1 == rec2, and
a positive value if rec1 > rec2.
print_rec "nice-prints" a record to the standard output
(see cout).
get_from_file reads a record from an input file into a buffer. It does
not manipulate the data in any shape or form. You should use the
fstream classes to do this (see section 3 for more details).
rec_size returns the size of the records associated with this RecSpec
(you might want to think about making this an inline method).
Step 2: Define a class "Seq" (sequence) and two subclasses "FileSeq"
and "MergeSeq".
The parent class (Seq) is only used to provide an abstract interface
to both its derived classes (FileSeq and MergeSeq). It cannot be
instanciated (see below).
Sequences are basically pipes through which records are passed up, one at
a time. FileSeq reads records from a file and passes them up, sorted, to its
user (You can assume that the records in a data file are already sorted).
MergeSeq merge-sorts records from its two Seq inputs. Note that each
input for one MergeSeq object can be either FileSeq or MergeSeq, and
that a tree of Seqs can be built to merge an unlimited number of input files.
A (partial) class definition for Seq is shown below:
class Seq {
public:
Seq(const RecSpec *recinfo);
~Seq();
virtual BOOL get_next_rec(void *&recptr) = 0;
inline const RecSpec *get_rec_spec() const { return rec_info; }
protected:
const RecSpec *rec_info;
};
The constructor for Seq should initialize rec_info.
get_rec_spec should return a const pointer to the RecSpec.
get_next_rec attaches the pointer argument to a buffer containing
the next record. It returns TRUE on success and FALSE on error. If there
are no more records to be passed, get_next_rec returns TRUE, but assigns
the pointer argument to NULL.
You should check recursively down a Seq tree that all RecSpecs
are in fact equal before get_next_rec can be executed (see MergeSeq
for more information on this).
Since Seq contains ``pure'' virtual functions, note that it is illegal
to declare objects of class Seq. You can only declare objects of some
subclass of Seq.
A (partial) class definition for FileSeq is shown below:
class FileSeq : public Seq {
public:
FileSeq(const char *name, const RecSpec *recinfo);
~FileSeq();
virtual BOOL get_next_rec(void *&recptr);
private:
ifstream in_file;
char *buffer;
};
The constructor for FileSeq should open a file for reading and allocate
a buffer large enough for the record sizes it will have to handle.
The destructor will close the file and free the buffer.
get_next_rec should read the next record into the buffer
and return it as was discussed for Seq.
A (partial) class definition for MergeSeq is shown below:
class MergeSeq : public Seq {
public:
MergeSeq(Seq *s1, Seq *s2);
~MergeSeq();
virtual BOOL get_next_rec(void *&recptr);
private:
Seq *seq1;
Seq *seq2;
};
The MergeSeq class constructor accepts two Seq objects
as arguments. It should first check that the sequences are mergeable
by comparing their two RecSpecs (using the == operator).
If so, a MergeSeq object should be created with the RecSpec object returned
from the get_rec_spec method on one of the input sequences passed as the
argument to the Seq constructor.
The destructor should recursively delete its children. This way,
a Seq tree can be deleted simply by calling delete on its root.
get_next_rec should compare the two current records, using
the RecSpec::compare method,
and return the lesser. Some sort of a flag will be needed to remember
which input's current record has not yet been used (the greater record
from the last call).
Empty sequences and end-of-sequence should be handled along the lines
indicated in FileSeq.
Points to Note
It can be assumed that at any time, at most one FileSeq object refers to a given
file, and that at most one MergeSeq object (directly) refers to a given FileSeq object.
Use the fstream classes for I/O operations. (To do so, you must #include
both iostream.h and fstream.h in your header file.)
Both FileSeq and MergeSeq objects should operate in ``lazy manner'',
e.g., the input file should be read one record at a time in response to successive
calls to get_next_rec on a FileSeq object.
If you want to modify any of the class definitions shown above by
adding/removing/modifying private and/or protected members, feel free to do so.
But you must keep the public interfaces as they have been described above,
You should use C++'s memory management (new and delete).
Do not use malloc and free! Be sure to study how to use new and
delete for arrays.
A Word on Documentation, Error Checking and Style
As will be the case all semester in CS 564, good documentation is expected.
We will expect to see both external and internal (i.e., inline) documentation.
A good rule of thumb is to comment your programs as if you were going to hand
them to another programmer to finish, someone who has absolutely no idea what
you were doing when you wrote it (and cannot talk to you to find out).
You are expected to check for error conditions, and ensure that your
code including all the public methods above
terminates gracefully on all inputs, with meaningful error msgs.
While some error conditions have been explicitly discussed, e.g.,
get_next_rec on an empty file, others have not, e.g., what happens
when RecSpec::get_from_file is called on an empty file. So think carefully
about potential errors, and write defensive code.
Program Grading Criteria
For this and future programming assignments, working code alone will not
guarantee a good grade. Documentation, error checking, and programming style
will also be taken into consideration when programs are graded. The relative
importance of these categories is as follows:
- Works as specified: 65%
- Good documentation: 15%
- Good coding style: 10%
- Good error checking: 10%
What to Turn In
The TAs will put information on-line about what to do for the final
testing and running of your code. You should turn in copies of your code
together with copies of the output produced by running what they ask you
to run.
Finding the Files
Files for CS 564 will be placed in the directory /u/course/cs564-1.
Testing information, constant definitions, routine templates, and test
data files will all be placed in a subdirectory of this directory called
assign1. You should simply make yourself a copy of each of the
files that you find there. In addition, remember to send mail to the
TAs (to "cs564-1") so that they can add you to their CS 564 student mailing
list. Otherwise, you are likely to miss out on useful information about
this and future CS 564 programming assignments. Be sure to check your mail
regularly.
Due Date
Assignment #1 is due on Thursday, September 15, at 5:00 PM. Note that the
late policy for CS 564 implies that (i) lateness will lead to a loss of 5
percentage points per day and (ii) no late assignments will be accepted
at all after 5:00 PM on Sunday, September 18. The TAs will make an on-line
solution available the following Monday for those who wish to look at it.
|