DAWN'10: Workshop on Database Aspects Explored by Wisconsin's New DB Researchers

December 7, 2010. Time: noon-2:30PM
Madison, WI


Arboretum. Photo by: Jeff Miller, © UW-Madison University Communications

Program

Lunch will be served before the talks, from 11:30 to noon, in CS 3331. Sorry, lunch is only for students in CS 764, but the general public is welcome to attend the talks.

December 7, 2010. Talks in room CS 2310.
noon-12:20

Architecture-Conscious Relational Joins on Multicore Processors
Venkatesh Karthik Srinivasan and Venkatanathan Varadarajan

The equi-join is a frequently used relational operator in database queries, and commercial database servers increasingly run on multicore processors. Recent research has produced parallel implementations of the simple two-way hash join and sort-merge join algorithms. In this research project, we critically analyze how effectively state-of-the-art parallel two-way join algorithms exploit the architectural features of multicore processors. Our study focuses mainly on a recently proposed Parallel Sort-Merge join algorithm, rethinking and validating the design choices made in it. With the lessons learnt, we aim to refine the parallel two-way sort-merge join. We also propose efficient parallel implementations of multi-way joins for multicore processors.
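
For readers unfamiliar with the operator, here is a minimal Python sketch of a partitioned sort-merge equi-join, assuming in-memory rows of the form (key, payload). Both inputs are hash-partitioned on the join key so that each partition pair can be sorted and merged independently, one per core. This is a generic textbook scheme, not the Parallel Sort-Merge algorithm analyzed in the talk; the function names and row format are hypothetical, and because CPython threads share the GIL, the thread pool only shows the structure of the parallelism, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, str]          # hypothetical row format: (join key, payload)
Joined = Tuple[int, str, str]  # (key, left payload, right payload)


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Equi-join two row lists by sorting both on the key, then merging."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


def parallel_sort_merge_join(left: List[Row], right: List[Row], workers: int = 4) -> List[Joined]:
    """Hash-partition both inputs on the key, then join each partition pair independently."""
    parts_l: List[List[Row]] = [[] for _ in range(workers)]
    parts_r: List[List[Row]] = [[] for _ in range(workers)]
    for row in left:
        parts_l[hash(row[0]) % workers].append(row)
    for row in right:
        parts_r[hash(row[0]) % workers].append(row)
    # Equal keys always land in the same partition, so no cross-partition matches are lost.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(sort_merge_join, parts_l, parts_r))
    return [row for part in results for row in part]


if __name__ == "__main__":
    orders = [(1, "o1"), (2, "o2"), (2, "o3"), (4, "o4")]
    customers = [(1, "alice"), (2, "bob"), (3, "carol")]
    print(sorted(parallel_sort_merge_join(orders, customers)))
    # [(1, 'o1', 'alice'), (2, 'o2', 'bob'), (2, 'o3', 'bob')]
```

Partitioning before sorting keeps each worker's sort and merge independent; the cost is an extra pass over both inputs, which is one of the design choices a multicore-aware algorithm has to weigh.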

12:20-12:40

Performance Evaluation and Improvement for HTML5 Web SQL Databases
Craig Chasseur and Emily Kawaler

The emerging HTML5 standard allows rich web applications to access a full SQL database embedded in the web browser. We present WWWisconsin, an in-browser database benchmark based on the classic Wisconsin benchmark, and use it to evaluate the performance of browser-embedded databases for a variety of common operations. We compare the performance of browsers that implement HTML5 SQL storage, such as Chrome, Safari, and Opera (all of which use the SQLite embedded database engine), and also compare them with the standalone SQLite binary to show how much overhead the browser adds. We uncover severe performance deficiencies for non-indexed joins in SQLite and show that join performance can be vastly improved by migrating to a new, tuned version of SQLite.
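
The gap between non-indexed and indexed joins can be reproduced outside the browser with Python's built-in sqlite3 module. This sketch is not the WWWisconsin benchmark: the tables tenk1 and tenk2 and the columns unique1, unique2, and stringu1 borrow loosely from the Wisconsin benchmark's naming, and the schema, data, and sizes are assumptions. It prints the query plan and elapsed time for the same equi-join before and after adding an index on the join column; absolute timings will vary by machine and SQLite version.

```python
import sqlite3
import time
from typing import List, Tuple

# Hypothetical, simplified schema; only unique2 is used as the join column.
JOIN_SQL = (
    "SELECT COUNT(*) FROM tenk1 JOIN tenk2 "
    "ON tenk1.unique2 = tenk2.unique2"
)


def build_db(n: int) -> sqlite3.Connection:
    """Create two n-row in-memory tables with no index on the join column."""
    conn = sqlite3.connect(":memory:")
    for table in ("tenk1", "tenk2"):
        conn.execute(f"CREATE TABLE {table} (unique1 INTEGER, unique2 INTEGER, stringu1 TEXT)")
        # unique2 is a permutation of 0..n-1 as long as n is not a multiple of 7919 (a prime).
        rows = [(i, (i * 7919) % n, f"s{i}") for i in range(n)]
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return conn


def query_plan(conn: sqlite3.Connection) -> List[str]:
    """Return the plan detail text (the last column of each EXPLAIN QUERY PLAN row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL)]


def timed_join(conn: sqlite3.Connection) -> Tuple[int, float]:
    """Run the join and return (matching row count, elapsed seconds)."""
    start = time.perf_counter()
    (count,) = conn.execute(JOIN_SQL).fetchone()
    return count, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_db(1000)
    # SQLite can build transient "automatic" indexes for joins like this one;
    # turning that off exposes the plain nested-loop plan.
    conn.execute("PRAGMA automatic_index = OFF")
    print("no index:", query_plan(conn), timed_join(conn))
    conn.execute("CREATE INDEX idx_tenk2_unique2 ON tenk2(unique2)")
    print("indexed: ", query_plan(conn), timed_join(conn))
```

Without an index, each outer row triggers a full scan of the inner table, so the work grows with the product of the table sizes; with the index, each outer row becomes a B-tree lookup.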

12:40-1:00

A Study of In-memory and Persistent Storage in HTML5
Ishani Ahuja and Subhadip Ghosh

“WebStorage” is a new feature of the HTML5 specification that provides client-side storage. Web applications can use it to store information locally and later retrieve it with JavaScript. Compared with the traditional cookie-based approach, WebStorage is much more powerful, more complete, and more secure, and it can store larger amounts of data on the client. HTML5 WebStorage thus gives the browser considerably more power and makes it easier to develop applications in HTML5 instead of shipping a native binary. WebStorage comes in three flavors: i) in-memory, persistent LocalStorage; ii) SessionStorage; and iii) DatabaseStorage. In our project, we study several aspects of the LocalStorage architecture with the goal of making it faster and better: the hashing algorithms used, the hash table implementation, and memory limitations. We also compare the performance of LocalStorage and DatabaseStorage across various parameters.

1:00-1:20

HTML5 Web Storage: Evaluation and Application
Ce Zhang, Xiaoming Shi and Qian Wan

The W3C has recently proposed HTML5 as a universal standard, and it defines three kinds of storage: 1) local storage, 2) session storage, and 3) database storage. The web browser creates, maintains, and destroys these storage objects during browsing. The objects store key-value pairs in main memory and are accessed through APIs defined in the HTML5 standard. Each kind of object has its own API and lifecycle: session storage is lost when the window is closed, whereas local storage and database storage are flushed to disk and remain accessible the next time the same domain is visited.

In this presentation, we show results from a performance evaluation of web storage implementations. In addition, because the standard is not yet widely used, we propose an application that uses local storage as a client-side cache and compare its performance with that of the widely used Memcache.

1:20-1:40

Benchmarking Collaborative Filtering Based Recommender Systems
Sanjib Das

There is a growing trend among database researchers to push statistical techniques into relational database management systems. Researchers are now integrating machine learning methods with database technology to make knowledge discovery from large datasets efficient. The goal of this project is to build a scalable recommender system based on collaborative filtering. Prior work has focused mostly on the quality of the recommendations while largely neglecting performance. This project aims to identify the bottlenecks in such a design and to apply database and main-memory processing techniques to improve performance.

1:40-2:00

Page Layout and the Impact on Cache Misses and Database Performance
Andrew Bender, Greig Hazell and Emma Turetsky

The way data is organized within a page in memory can have a noticeable effect on query performance. Prior work has examined the relationship between page layout and performance for both main-memory databases and read-mostly databases. For read-mostly databases, the slotted record layout can be inefficient: a query that needs only a few attributes still pulls whole records into the cache, incurring many cache misses on superfluous data. We explored multiple page layout schemes to determine the costs and benefits of each.
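
A back-of-the-envelope model makes the cache-miss argument concrete: count how many distinct cache lines a scan of a single attribute touches. The sketch below assumes fixed-width records, a 64-byte cache line, and no slot directory or hardware prefetching. It compares a row-major layout, standing in for the slotted page, with a PAX-style layout that stores each attribute in its own minipage; PAX is one well-known alternative from the literature and is not necessarily among the schemes evaluated in this talk. All names are hypothetical.

```python
from typing import List

CACHE_LINE = 64  # bytes; a common cache-line size, assumed here


def nsm_offsets(num_records: int, num_attrs: int, attr_size: int, attr: int) -> List[int]:
    """Byte offsets of one attribute when the page stores records one after another."""
    record_size = num_attrs * attr_size
    return [r * record_size + attr * attr_size for r in range(num_records)]


def pax_offsets(num_records: int, num_attrs: int, attr_size: int, attr: int) -> List[int]:
    """Byte offsets of one attribute when each attribute has its own contiguous minipage.

    num_attrs is unused here; it is kept so both layouts share one signature.
    """
    minipage_size = num_records * attr_size
    return [attr * minipage_size + r * attr_size for r in range(num_records)]


def lines_touched(offsets: List[int], attr_size: int) -> int:
    """Count distinct cache lines read when fetching a value at every offset."""
    lines = set()
    for off in offsets:
        # A value can straddle two lines, so record the lines of its first and last byte.
        lines.add(off // CACHE_LINE)
        lines.add((off + attr_size - 1) // CACHE_LINE)
    return len(lines)


if __name__ == "__main__":
    # 100 fixed-width records of 8 attributes x 8 bytes; scan attribute 3 only.
    for name, layout in (("row-major", nsm_offsets), ("PAX-style", pax_offsets)):
        print(name, lines_touched(layout(100, 8, 8, 3), 8))
    # row-major 100
    # PAX-style 13
```

With 64-byte records, every record fills a whole cache line, so the row-major scan reads one line per record even though it needs only 8 bytes of each; the minipage packs the needed values densely into 13 lines.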

2:00-2:20

Re-evaluating the Trade-offs between ARIES and Shadowing with Flash Storage
Heemoon Chae and Tyler Harter

Modern databases use write-ahead logging (WAL) protocols such as ARIES to ensure the atomicity and durability of transactions. Although pure WAL systems have high I/O overheads, they are typically preferred over the alternative, shadowing, because pure shadowing systems scatter data and have poor concurrency properties. On flash storage, scattered data is no longer a problem because random reads are cheap, so concurrency may become the only disadvantage of shadowing. We therefore propose a hybrid model that mixes shadowing and logging, allowing full concurrency while reducing the I/O load for many workload patterns. We have built a prototype to evaluate our model on both platter disks and solid-state disks.
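
The trade-off can be illustrated with two toy, in-memory stores written in Python. They are deliberately simplified: the WAL store has no LSNs, checkpoints, commit records, or redo/undo recovery passes, so it is not ARIES, and neither class is the hybrid model proposed in the talk. The writes counter treats every page image and log record as one write, a crude stand-in for real I/O cost (a real system writes small log records sequentially and can defer page writes). All class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class ShadowStore:
    """Toy shadow paging: writes go to fresh physical pages; commit swaps the page table."""

    def __init__(self, num_pages: int) -> None:
        self.disk: List[bytes] = [b""] * num_pages                      # physical pages
        self.table: Dict[int, int] = {p: p for p in range(num_pages)}  # logical -> physical
        self.writes = 0                                                 # writes issued

    def begin(self) -> Dict[int, int]:
        # Each transaction works on a private copy of the committed page table.
        return dict(self.table)

    def write(self, shadow: Dict[int, int], logical: int, data: bytes) -> None:
        # Never overwrite the committed page: append a new physical page instead.
        # Old pages are never reclaimed here; a real system would garbage-collect them.
        self.disk.append(data)
        shadow[logical] = len(self.disk) - 1
        self.writes += 1

    def read(self, table: Dict[int, int], logical: int) -> bytes:
        return self.disk[table[logical]]

    def commit(self, shadow: Dict[int, int]) -> None:
        # Stands in for one atomic root-pointer write. Two transactions that began from
        # the same snapshot would clobber each other here, a hint at why pure shadowing
        # handles concurrency poorly.
        self.table = shadow
        self.writes += 1


class WalStore:
    """Toy write-ahead logging: append a before/after log record, then update in place."""

    def __init__(self, num_pages: int) -> None:
        self.disk: List[bytes] = [b""] * num_pages
        self.log: List[Tuple[int, int, bytes, bytes]] = []  # (txn, page, before, after)
        self.writes = 0

    def write(self, txn: int, page: int, data: bytes) -> None:
        self.log.append((txn, page, self.disk[page], data))  # log record goes first...
        self.writes += 1
        self.disk[page] = data                                # ...then the in-place update
        self.writes += 1

    def abort(self, txn: int) -> None:
        # Undo this transaction's updates in reverse log order using the before-images.
        for t, page, before, _ in reversed(self.log):
            if t == txn:
                self.disk[page] = before


if __name__ == "__main__":
    shadow, wal = ShadowStore(8), WalStore(8)
    t = shadow.begin()
    for p in range(4):
        shadow.write(t, p, b"x")
        wal.write(1, p, b"x")
    shadow.commit(t)
    print(shadow.writes, wal.writes)            # 5 8
    print([shadow.table[p] for p in range(4)])  # [8, 9, 10, 11]
```

For four page updates in one transaction, the toy shadow store issues five writes (four new pages plus the page-table swap) against eight for the toy WAL store, but logical pages 0 to 3 now live at physical pages 8 to 11, away from their untouched neighbors: the kind of scattering that matters much less on flash.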
