class Engine {
public:
Engine();
submit(string request);
Schema* getSchema();
Tuple* getNext();
};
Every request results in a stream of tuples (which may be empty).
Tuple is a structure that is implemented as an array of ADT pointers:
class Tuple {
ADT* fields[0];
public:
void* operator[](int i){return fields[i];}
};
ADT pointers are implemented as void pointers:
typedef void ADT;
Some basic ADT are implemented as follows: integer implemented as native
C type int, real numbers are implemented as native C type double, and strings
are implemented as native C char arrays terminated by NULL character.
The application that receives the answers have to interpret them according
to the Schema given by getSchema method.
As the result of calling the getSchema method, the parser module is
invoked.
Parsing step is largely trivial and is accomplished by using standard
tools for lexical analysis (flex) and parsing (bison).
Input to the parsing module is the request string, output is the parse
tree object.
class ParseTree {
public:
virtual const Schema* getSchema();
virtual Iterator* createExec() = 0;
};
So, most abstractly, every request (not necessarily a query) results
in an Iterator that is described by a Schema. Schema is a
list of type identifier pairs. Iterator is an object that supports getNext
interface.
We have several subclasses of the parse tree, each corresponding to
the different type of request.
The main one is the QueryParseTree, that results from a SQL
query. The others are the data definition commands, such as: CreateIndexParseTree,
DropIndexParseTree, InsertParseTree, DeleteParseTree, CreateTableParseTree
and similar.
Method createExec() returns an empty Iterrator for all data
definition parse trees while it returns the query answer for the QueryParseTree.
class Iterator {
public:
virtual void initialize();
virtual const Tuple* getNext();
virtual const TypeIDList& getTypes();
};
If the query fails to parse, a parse exception is returned by the system,
otherwise, the createExec() method is invoked on the parse tree
object, which in turn calls the type checker module.
Type checking is similar for data definition request and for queries
but not identical.
Here we will only discuss the query typechecking (QueryParseTree)
Important part of the QueryParseTree class is the Query
object.
Query is internally represented by the following class:
class Query {
Expression[] selectClause;
TableAlias[] fromClause;
Expression whereClause;
};
Example of a TableAlias is .a.b.c as t1. You may notice an unconventional
specification of the FROM clause. Tables in DTE can be located within other
collections. In this example, table c is located in a collection
of tables called b, which is in turn located in the collection a.
Collections may be of different types, in DTE we have special tables
that act as collections and we call them catalogs or directories.
Typical catalogs may look like this:
Relation 1.1
a | collection | 1.2 |
x | singleton | 1.3 |
b | collection | 2.1 |
y | singleton | 1.4 |
Another important type of collection is a remote DTE server.Conceptually, servers just store a collection of relations and in that sense they are similar to directories. However, implementation wise, they are quite different. While the directories are actively traversed by the local server, all the operations that relate to a remote server are just handed over to be done autonomously by the remote server.
Consider now what happens when the system has to find table .a.b.c. It first looks at the root catalog (1.1) for the entry a. If finds that this entry exists and is a name for relation 1.2. Then it finds the entry b in 1.2 and discovers that this entry is an object on another server (number 2). At this point system forwards the request to resolve table c to the server 2. Assume that the response from the server 2 was that the relation c has an id of 4.5. Mapping of names to relation ids is not implemented yet.
At this time server 1 knows that server 4 has table t1 and that
this table is registered under relation id number 5. From this information
(server and relation ids) we construct a class that implements DataSource
interface. This interface provides methods to retrieve all the relevant
information about a particular table, including its schema.
In general, when resolving a table name, we might have to cross more
than one server boundary. If the system was naively chasing names like
this every time, the name resolution will be very inefficient. This is
an additional reason to maintain only destination server id when referencing
other tables.
Example of some classes that implement DataSource interface can be found in DataSources package .
Once the schema is known, server 1 can type check the query. Type checking is done in two steps:
Once the optimization is completed, parts of the best plan are shipped to the servers involved as directives. Nodes in the best plan are used to create the tree of iterators that are using pull mechanism to retrieve all answers. To learn more about iterators, check out the Iterator package.
Presently, DTE does not have a buffer manager, it is relying on the underlying operating system to do the buffering. We plan to incorporate the Range Buffer manager for accessing local and remote files. Range buffer manager will not affect the optimization of the queries, but will speed up the execution.
The only join method implemented is the nested loops join. Inner table
is loaded in virtual memory (not in buffer pool) and we do not keep track
of the amount of memory allocated.
We have an implementation of Gestalts. The only queries possible on
a gestalt at the present time are single table queries (selections and
aggregates).
Selection is being pushed down to the Gestalt members, aggregates are
not (they should be too).