In this project, we are giving you more modules of the Minirel DBMS. You will have the following modules already implemented:
Since this version of Minirel that we are releasing for you is more complete and complicated than previous versions, some of the interfaces that you dealt with in previous two assignemnts have changed. We suggest that you take a look and study the following interfaces before implementing the project:
In Minibase, a relation is implemented as a heap file, which is a collection of records. Records can be inserted or deleted from a heap file, and each record is uniquely identified by a record id. A scan is the interface used to access records in a heap file, one by one.
An index provides fast access to records in the heap file, and currently Minibase only supports B+-indices. Each entry in the index is a (key, record id) pair. Entries can be inserted or deleted from an index. An index search scan provides interface for accessing the records in the index. You will not need the index access method, unless you are implementing index nested loop join.
You should begin by reading the chapter on Implementation of Relational Operations, in particular, the section on Block Nested Loop Join and Sort-Merge Join
In SMJoin and BlockJoin classes, the constructor joins two relations R and S, represented by the heapfiles filename1 and filename2, respectively, using the sort-merge/block nested loop join algorithm. Note that the columns for relation R(S) are numbered from 0 to len_in1 - 1 (len_in2 - 1). You are to concatenate each matching pair of records and write it into the heapfile filename3. The error layer for these classes is JOINS as defined in new_error.h
In order to implement the Sort-Merge join, you will need to use the following classes which are given to you: Sort, HeapFile, and Scan. You will call the Sort constructor to sort the input heapfiles (which means your primary responsibility will be to implement the merging phase of the algorithm). To compare the join columns of two tuples, you will call the function tupleCmp (declared in sort.h). Once a scan is opened on a heapfile, the scan cursor can be positioned to any record within the heapfile calling the Scan method position with an RID argument. The next call to the Scan method getNext will proceed from the new cursor position.
To implement the block nested loop join method, you need to study the algorithm for this method in your textbook. Since Minibase does not provide page-by-page access into a relation stored in a heap file, you will simulate a "block" with an array storing the records. The join function will take as a parameter, an integer, block_size, of the outer relation. To simulate a block of pages, you need to copy block_size number of tuples each time and save them in an array, then use that array as a block of tuples. You are not required to implement the double buffering techniques or use hash tables.
Augment the buffer manager class given to you to collect statistics. Specifically, you should be able to tell how many PinPage requests have been made and how many of those miss. It is suggested that you write two functions to reset and report the statistics.
You can use the clock/time functions to determine the running time of your join algorithm. Refer to the C++ help for more details on time. A simple example can be found in main.C on how to use clock function.
You should let your algorithm run for a few times (the longer the better) and report the average running time. Avoid running other CPU or I/O intensive processes (Email client, browser, etc.) while collecting statistics since you are reports actual time rather than CPU time. Print out average running times and the number of misses for every join algorithm you implement. You may need to change the test files and create larger relations with many tuples to be able to analyze the difference between the various join algorithms.
In your report, you need to determine whether it is beneficial to sort the file for the purpose of performing a single join operation by comparing the cost of this join with that of other join method(s).
Important: It is strongly recommended that you finish the implementation of this homework at least 3-4 days before the due date. It will take you a few days to be able to run good experiments and generate a good report. Obviously this is only for people who want to get extra credits by generating a good analysis report document.