DAWN'16
Workshop on Database Aspects Explored by Wisconsin's New DB Researchers 

December 12, 2016 from 8:00AM-9:30AM, and

December 14, 2016 from 8:00AM-9:30AM


in 1325 CS
Madison, WI


Dawn at the Memorial Union Terrace. Photo by: Jeff Miller, UW-Madison University Communications

 

December 12, 2016           8:00-9:30AM, 1325 CS

8:00-8:12 AM

Evaluation of Relational JSON document store (ARGO) on Quickstep and comparison with Postgres and MongoDB

Subasree Venkatsubhramaniyen and Rashi Jalan

Argo is a mapping layer between JSON objects and relational database systems. This project extends that idea: we study Argo on top of Quickstep and compare the performance of NoBench queries on Postgres, MongoDB, and Quickstep. We also study the effect of several Quickstep parameters on the NoBench queries, namely data storage format (row or column), indices, block size, and compression.
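As a rough illustration of what a JSON-to-relational mapping layer does, the sketch below flattens a nested JSON object into (object id, key path, value) rows that could be stored in a single relational table. The function names, the dotted/bracketed key-path convention, and the three-column layout are assumptions made for this example; they are not taken from the talk and are not Argo's actual schema.

```python
from typing import Any, List, Tuple


def flatten(value: Any, prefix: str = "") -> List[Tuple[str, Any]]:
    """Flatten nested dicts/lists into (key_path, scalar) pairs.

    Hypothetical convention: nested keys are joined with '.', and
    array positions are written as '[i]'.
    """
    pairs: List[Tuple[str, Any]] = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            pairs.extend(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            pairs.extend(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalar leaf: emit one pair for this key path.
        pairs.append((prefix, value))
    return pairs


def to_rows(objid: int, document: dict) -> List[Tuple[int, str, Any]]:
    """Turn one JSON object into rows of a (objid, keystr, value) table."""
    return [(objid, key, val) for key, val in flatten(document)]


doc = {"name": "Ann", "tags": ["a", "b"], "addr": {"zip": 53706}}
for row in to_rows(1, doc):
    print(row)
```

Under this layout, a query on a JSON attribute becomes a selection on the key-path column, which is the kind of workload where storage format, indices, and compression in the underlying engine can matter.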

8:12-8:24 AM

Evaluation of Main Memory Sort Merge Join Algorithms for Multi-core CPUs

Deepanker Aggarwal and Siddharth Suresh

Due to recent hardware trends of increasing main memory capacities and massive multi-core processing, new algorithms such as MPSM have been developed for the Sort-Merge Join operator. This project compares two such recent sort-based join algorithms for multi-socket, multi-core machines by implementing them and evaluating them on several data sets.

8:24-8:36 AM

Energy Efficient Query Processing in SQLite3

Atasi Panda and Praneeth Naramsetti

Rising energy costs, an avalanche of mobile applications, and the ever-increasing need for longer battery life have made energy-aware query processing the need of the hour. As an open-source database and the most popular relational database on mobile devices, SQLite3 is the perfect candidate for our experiments. Our goal is to make SQLite3 more energy-efficient, rather than just performance-efficient, by analyzing the energy profiles of different queries, which can then be used to construct energy-aware plans.

8:36-8:48 AM

LSM vs traditional index based approach to reduce write amplification on SSDs

Pallavi Maheshwara Kakunje and Sowrabha Horatti Gopal

We investigate whether a traditional approach to organizing data in a database, such as the B-tree with a few enhancements, can provide lower write amplification than modern LSM-based systems like RocksDB. We study the performance of enhanced B-tree implementations (such as Bw-trees and FD-trees) and LSM implementations (such as RocksDB) under various workloads and analyze the factors contributing to write amplification.

8:48-9:00 AM

Spark v/s Heron: Comparison of Stream Processing Systems

Rohit Damkondwar and Pradeep Ramaswamy

We compare the performance of streaming applications in Spark and Heron on use cases such as word frequency count, AI algorithms, and other generic computations that use stream processing. Our aim is to build a benchmark test suite for comparing these two systems, as there are no existing comparison results or test suites. Our initial results show that Heron outperforms Spark by about 2X to 5X in throughput.

9:00-9:12 AM

Interactive Exploration of Window Sizes on Sorted Neighborhood Blocking

Varun Naik

When merging information from various sources, a data scientist might use sorted neighborhood blocking with a specified window size to eliminate obvious mismatches between two input tables. However, a user may want to interactively explore how the candidate matches change as she chooses different window sizes. This project implements “what-if” analysis by storing intermediate data, given a maximum window size. This project also explores tradeoffs between writing to a filesystem and writing to a database.
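One way to picture the stored intermediate data is sketched below: a single pass with the maximum window size records, for every cross-table candidate pair, the smallest window that would produce it, so any smaller window can then be answered by filtering. This is only an illustrative sketch; the function names, the use of plain strings as blocking keys, and the in-memory dictionary are assumptions of this example, not the talk's implementation.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (row index in table A, row index in table B)


def pairs_with_min_window(keys_a: List[str], keys_b: List[str],
                          max_window: int) -> Dict[Pair, int]:
    """Map each cross-table candidate pair to the smallest window size
    (up to max_window) at which sorted neighborhood blocking emits it."""
    # Merge both tables and sort by blocking key.
    merged = [("A", i, k) for i, k in enumerate(keys_a)]
    merged += [("B", i, k) for i, k in enumerate(keys_b)]
    merged.sort(key=lambda record: record[2])

    min_window: Dict[Pair, int] = {}
    for pos, (src, rid, _) in enumerate(merged):
        # A window of size w covers records at most w - 1 positions apart.
        for offset in range(1, max_window):
            other = pos + offset
            if other >= len(merged):
                break
            other_src, other_rid, _ = merged[other]
            if other_src == src:
                continue  # only pairs spanning both tables are candidates
            pair = (rid, other_rid) if src == "A" else (other_rid, rid)
            min_window[pair] = offset + 1
    return min_window


def candidates(min_window: Dict[Pair, int], window: int) -> List[Pair]:
    """Answer a what-if question for any window <= max_window by filtering."""
    return sorted(p for p, w in min_window.items() if w <= window)


table_a = ["adams", "baker", "clark"]
table_b = ["adamz", "bakker", "young"]
stored = pairs_with_min_window(table_a, table_b, max_window=4)
for w in (2, 3, 4):
    print(w, candidates(stored, w))
```

The dictionary here stands in for whatever is persisted; the same (pair, minimum window) records could equally be written to a file or to a database table, which is where the filesystem-versus-database tradeoff would arise.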

9:12-9:24 AM

Cost based model for View Materialization

Shaleen Deep

Data warehousing systems frequently evaluate a fixed query workload against a relational database. A well-known method to enhance query performance in such environments is to use materialized views for faster execution. However, with a variety of cloud providers selling a heterogeneous set of machines at different prices, it is unclear how to choose which parts to materialize and whether doing so leads to faster execution. In this work, we explore how to systematically decide which views to materialize, trading off the cost of query execution against the cost of view materialization.


 

December 14, 2016           8:00-9:30AM, 1325 CS

8:00-8:12 AM

Natural Language Interface to Databases

Nivetha Singara Vadivelu and Neha Godwal

This project implements a Natural Language Interface (NLI) for database queries, in order to make querying user-friendly and accessible to more people. We combine two approaches, natural language processing and machine learning, to achieve our goals.

8:12-8:24 AM

Effect of Storage Format on Stochastic Gradient Descent and Stochastic Coordinate Descent

Yusong Yang and Qun Zou

Stochastic Gradient Descent (SGD) and Stochastic Coordinate Descent (SCD) are widely used in machine learning. In this project, a simple storage manager is built and the above two algorithms are implemented on top of it to examine the effect of storage format on the performance of SGD and SCD.

8:24-8:36 AM

Effects of Density and Dimensionality on HogWild Style Gradient Descent

Marc Spehlmann and Matt Christie

HogWild! can give near-linear speedup on some types of problems, but falls short in many instances. In our effort to better understand the limits of HogWild!, we systematically evaluate a space of problems and establish a model for approximating the speedup over this space.

8:36-8:48 AM

Performance Evaluation of Lock-based and Latch-free Hash table

Zhen Zhang and Wei Li

This project implements two latch-free hash table data structures and compares their CPU scalability with that of a lock-based hash table, a bucket-chain hash table, and a cuckoo hash table. Three basic operations are implemented for each of these hash tables. We evaluate the scalability of the hash tables under different usage and access patterns.

8:48-9:00 AM

Graph Visualization Recommendation System

Priyanka Nayek and Ainura Ainabekova

Tabular data produced by a SQL query is hard to analyze: tables often do not provide intuitive ways to understand trends in the data. SQL analysts therefore use data visualization techniques to analyze the output of SQL queries. In this project, our goal is to implement a recommendation system that produces a ranking of possible visualizations of the query output. Several domains can be considered; we work on the graph and text domains. This is a complicated research problem that can be investigated in several ways. We approach it by using heuristics and analyzing the results of SQL queries (i.e., tables) based on the number of attributes, the types of the attributes, and the data distribution of the attributes.

9:00-9:12 AM

Visualization of Machine Learned Model Performance and Explanations

Weijing Tang and Erika Lee

With the proliferation of machine learning libraries and services making predictive analytics accessible in large enterprise and “grassroots” settings, a well-designed machine learning pipeline is necessary to help the humans in the loop keep track of and digest different models. We have implemented, in the form of a QlikView dashboard, two components of such a pipeline: one targeted at data scientists, to help them compare different models, and one aimed at data consumers, to help them understand what the machine has learned.

9:12-9:24 AM

Graph Query Interface in Relational Database

Tianrun Li

Graph data and algorithms are widely used in applications such as social networks and citation networks. This project builds a graph query interface on a relational database to support various kinds of queries, and runs experiments to evaluate its performance and scalability on large datasets, compared to specialized graph computation engines such as Neo4j and GraphLab.

If you are looking for the CS 764 course home page, click here.