December 12, 2016 8:00-9:30AM, 1325 CS
8:00-8:12 AM |
Evaluation of Relational JSON document store (ARGO) on Quickstep and comparison with Postgres and MongoDB
Subasree Venkatsubhramaniyen and Rashi Jalan
Argo is a mapping layer between JSON objects and relational database systems.
This project extends that idea: we run Argo on top of Quickstep and compare the
performance of NoBench queries on Postgres, MongoDB and Quickstep. We also study
the effect of several Quickstep parameters on the NoBench queries, namely data
storage format (row/column), indices, block size and compression. |
8:12-8:24 AM |
Evaluation of Main Memory Sort Merge Join Algorithms for Multi-core CPUs
Deepanker Aggarwal and Siddharth Suresh
Due to the recent hardware trends of increasing main memory capacities and
massive multi-core processing, new algorithms such as MPSM have been developed
for the Sort-Merge Join operator. This project compares two recent sort-based
join algorithms for multi-socket multi-core machines by implementing them and
evaluating them on several data sets. |
8:24-8:36 AM |
Energy Efficient Query Processing in SQLite3
Atasi Panda and Praneeth Naramsetti
Rising energy costs, an avalanche of mobile applications and the
ever-increasing need for longer battery life have made energy-aware query
processing the need of the hour. As an open source database and the most
popular relational database on mobile devices, SQLite3 is the perfect candidate
for our experiments. Our goal is to make SQLite3 energy-efficient rather than
just performance-efficient by analyzing the energy profiles of different
queries, which can then be used to build energy-aware plans. |
8:36-8:48 AM |
LSM vs traditional index based approach to reduce write amplification on SSDs
Pallavi Maheshwara Kakunje and Sowrabha Horatti Gopal
We investigate whether traditional approaches to organizing data in a database,
such as the B-tree with a few enhancements, can provide lower write
amplification than modern LSM-based systems like RocksDB. We study the
performance of enhanced B-tree implementations (such as Bw-trees and FD-trees)
and LSM implementations (such as RocksDB) under various workloads and analyze
the factors contributing to write amplification. |
8:48-9:00 AM |
Spark v/s Heron: Comparison of Stream Processing Systems
Rohit Damkondwar and Pradeep Ramaswamy
We compare the performance of streaming applications in Spark and Heron on
use cases such as word frequency count, AI algorithms and other generic
computations that use stream processing. Our aim is to build a benchmark test
suite to compare these two systems, as there are no existing comparison results
or test suites. Our initial results show that Heron outperforms Spark by about
2X to 5X in throughput. |
9:00-9:12 AM |
Interactive Exploration of Window Sizes on Sorted Neighborhood Blocking
Varun Naik
When merging information from various sources, a data scientist might use
sorted neighborhood blocking with a specified window size to eliminate obvious
mismatches between two input tables. However, a user may want to interactively
explore how the candidate matches change as she chooses different window sizes.
This project implements “what-if” analysis by storing intermediate data, given
a maximum window size. It also explores tradeoffs between writing to a
filesystem and writing to a database. |
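As a rough illustration of the idea in the abstract above (not the presenter's implementation), the sketch below runs sorted neighborhood blocking once at a maximum window size and stores, for each cross-table pair, how far apart the two records sit in the sorted order. Any smaller window can then be answered from that stored data without re-sorting. The function names, the in-memory dictionary and the sample keys are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (index in left table, index in right table)


def candidate_gaps(left_keys: Sequence[str],
                   right_keys: Sequence[str],
                   max_window: int) -> Dict[Pair, int]:
    """Sort both tables' keys together and record, for every cross-table
    pair that falls inside a window of size max_window, the distance
    between the two records in the sorted order."""
    # Tag each key with its source table (0 = left, 1 = right) and row index.
    tagged = [(k, 0, i) for i, k in enumerate(left_keys)]
    tagged += [(k, 1, j) for j, k in enumerate(right_keys)]
    tagged.sort()

    gaps: Dict[Pair, int] = {}
    for pos, (_, side, idx) in enumerate(tagged):
        # A window of size w covers positions pos .. pos + w - 1.
        for off in range(1, max_window):
            if pos + off >= len(tagged):
                break
            _, other_side, other_idx = tagged[pos + off]
            if side != other_side:  # only pairs that span both tables
                pair = (idx, other_idx) if side == 0 else (other_idx, idx)
                gaps[pair] = off
    return gaps


def candidates_for_window(gaps: Dict[Pair, int], window: int) -> List[Pair]:
    """Answer a "what-if" question for any window <= max_window using
    only the stored intermediate data."""
    return sorted(pair for pair, gap in gaps.items() if gap < window)


if __name__ == "__main__":
    left = ["adams", "baker", "clark"]    # hypothetical blocking keys
    right = ["adamz", "bakr", "young"]
    stored = candidate_gaps(left, right, max_window=4)
    for w in (2, 3, 4):
        print(w, candidates_for_window(stored, w))
```

Keeping the gap per pair is only one possible form of intermediate data; it trades extra storage at the maximum window size for cheap lookups at every smaller size.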
9:12-9:24 AM |
Cost based model for View Materialization
Shaleen Deep
Data warehousing systems frequently evaluate a fixed query workload against a
relational database. A well-known method to enhance query performance in such
environments is to use materialized views. However, with a variety of cloud
providers selling a heterogeneous set of machines at different prices, it is
unclear which parts to materialize and whether materialization actually speeds
up execution. In this work, we explore how to systematically decide which views
to materialize, weighing the cost of query execution against the cost of view
materialization. |
December 14, 2016 8:00-9:30AM, 1325 CS
8:00-8:12 AM |
Natural Language Interface to Databases
Nivetha Singara Vadivelu and Neha Godwal
This project implements a Natural Language Interface (NLI) for database queries
in order to make databases user-friendly and accessible to many people. We
combine two approaches, natural language processing and machine learning, to
achieve our goals. |
8:12-8:24 AM |
Effect of Storage Format on Stochastic Gradient Descent and Stochastic Coordinate Descent
Yusong Yang and Qun Zou
Stochastic Gradient Descent (SGD) and Stochastic Coordinate Descent (SCD) are
widely used in machine learning. In this project, we build a simple storage
manager and implement both algorithms on top of it to examine the effect of
storage format on the performance of SGD and SCD. |
8:24-8:36 AM |
Effects of Density and Dimensionality on HogWild Style Gradient Descent
Marc Spehlmann and Matt Christie
HogWild! can give near-linear speedup on some types of problems, but falls
short in many instances. In our effort to better understand the limits of
HogWild!, we systematically evaluate a space of problems and establish a model
for approximating the speedup over this space. |
8:36-8:48 AM |
Performance Evaluation of Lock-based and Latch-free Hash table
Zhen Zhang and Wei Li
This project implements two latch-free hash table data structures and compares
their CPU scalability with that of a lock-based hash table, a bucket-chain hash
table and a cuckoo hash table. Three basic operations are implemented for each
of these hash tables. The project evaluates how well the hash tables scale
under different usage and access patterns. |
8:48-9:00 AM |
Graph Visualization Recommendation System
Priyanka Nayek and Ainura Ainabekova
Tabular data produced by a SQL query is hard to analyze, because tables often
do not provide intuitive ways to understand the trends in the data. SQL
analysts therefore use data visualization techniques to analyze query output.
Our goal in this project is to implement a recommendation system that produces
a ranking of possible visualizations of the query output. Several domains could
be considered; we work on the graph and text domains. This is a complicated
research problem that can be investigated in several ways. We approach it with
heuristics, analyzing the results of SQL queries (i.e. tables) based on the
number of attributes, the types of the attributes, and the data distribution of
the attributes. |
9:00-9:12 AM |
Visualization of Machine Learned Model Performance and Explanations
Weijing Tang and Erika Lee
With the proliferation of machine learning libraries and services making
predictive analytics accessible in large enterprise and “grassroots” settings,
a well-designed machine learning pipeline is necessary to help the humans in
the loop keep track of and digest different models. We have implemented, in the
form of a QlikView dashboard, two components of such a pipeline: one targeted
at data scientists to help them compare different models, and one aimed at data
consumers to help them understand what the machine has learned. |
9:12-9:24 AM |
Graph Query Interface in Relational Database
Tianrun Li
Graph data and algorithms are widely used in applications such as social
networks and citation networks. This project builds a graph query interface on
a relational database to support various kinds of queries, and runs experiments
to evaluate its performance and scalability on large datasets, compared to
specialized graph computation engines like Neo4j and GraphLab. |
If you are looking for the CS 764 course home page, click here.