I was a graduate student from August 1994 to May 2000 working towards
my Ph.D in databases with my advisor Jeffrey Naughton. I completed in July 2001.
My research interests spans across various areas of computer science including

I worked with Jeffrey Naughton in the BORG data warehouse project and
with David Dewitt in Paradise project.



Efficient Storage and Query Processing of Set-Valued Attributes


In order to better support complex applications, object relational systems provide features
that are absent in relational systems. The main distinguishing features are abstract data types
and type constructors. The set constructor creates a new type by providing collections of
existing types. It is well known that sets are useful in modeling a great deal of real world data.
However, such powerful modelling comes at a price; without an efficient implementation, using
sets can yield a performance much worse than that obtained using only traditional relational
constructs. This dissertation explores novel ways of implementing set-valued attributes in an object
relational system. Specifically, it considers various options for efficiently storing set-valued
attributes, and ways of computing the challenging set containment join operation.

We first address the problem of storing set-valued attributes. Using the orthogonal attributes
of nesting and location we identify four options for representing sets: nested internal and external,
and unnested external and internal. These representations can be combined with the creation of
various indices to create various classes of indexed representations. We evaluate each of these
representations with respect to conjunctive and disjunctive queries. Our results show that overall
the nested implementations perform better than the unnested implementations because Next we consider the problem of efficiently evaluating set containment joins. For unnested external
representation, the set containment join can be expressed directly in SQL. By contrast, the most
obvious algorithm for computing set containment joins on nested representations is the signature
nested loops algorithm, which computes set signatures and compares each signature in a relation with
all the signatures in the other relation. To improve on the performance of this algorithm we propose
a new partitioned set join algorithm (PSJ), which uses a multi-level scheme of partitioning by
replicating the inner relation. Our performance study shows that for extremely small relation and small
set cardinalities, the SQL query approach and signature nested loops perform comparably to PSJ.
However, as the size of the data sets increase (in both relation and set cardinality), PSJ clearly dominates.



Data Warehouse

Object Relational Database Systems

Selected Professional Activities

Technical Program Committee


Source Code

Contact Information

Electronic mail address