
Picnic Point at Dawn. Photo by: Jeff Miller, UW-Madison University Communications
December 12, 2012. Talks are in room CS 2310.
11:00-11:15
Genome-Health Risk Prediction of Residual Feed Intake in Dairy Cattle
Considering health history information improved the predictive accuracy of genomic evaluation for residual feed intake (RFI) in dairy cattle. The random forests (RF) algorithm was better able to incorporate this extra information than the GBLUP model, possibly because of its ability to capture complex interactions. Among the 11 health traits considered, birth body weight, scours during the calf period, and having twin calves showed significant effects on RFI. Further study of the interactions between these health traits and RFI is needed.
11:15-11:30
Complex Event Processing
Complex Event Processing (CEP) is an emerging area where streams of incoming data are examined to fi…
11:30-11:45
Mitigating Skew in MapReduce: Black Boxes are Evil
The user-defined partitioning and grouping functions in MapReduce close the door to effective reduce-side skew mitigation. Existing skew-handling algorithms fail to guarantee correctness in the presence of these black boxes. The opacity of user-defined reduce functions makes it impossible to distribute a large or computationally expensive group over more than one reducer, resulting in skewed execution. We propose exposing the grouping to enable workload-distribution optimizations, and we develop a skew mitigation approach with theoretical performance guarantees that is particularly optimized for join-style reduce tasks. Preliminary results show order-of-magnitude speedups for skewed joins in our Hadoop prototype.
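Spreading one oversized join group over several reducers can be illustrated with a fragment-replicate split. This is a standard technique, not necessarily the approach proposed in the talk. In this hypothetical sketch, `assign_reducers` sends the left-side tuples of each heavy key round-robin to `k` sub-reducers and replicates the matching right-side tuples to all of them, so every join result is still produced exactly once.

```python
from collections import defaultdict
from itertools import product


def assign_reducers(left, right, heavy_keys, k):
    """Map (key, value) tuples from two join inputs to reducer buckets.

    Hypothetical sketch (fragment-replicate): left tuples of a heavy key are
    split round-robin over k sub-reducers, right tuples of that key are
    replicated to all k. Other keys go to one reducer, as in a hash join.
    """
    buckets = defaultdict(lambda: ([], []))
    counters = defaultdict(int)
    for key, val in left:
        if key in heavy_keys:
            sub = counters[key] % k  # round-robin fragment
            counters[key] += 1
        else:
            sub = 0
        buckets[(key, sub)][0].append(val)
    for key, val in right:
        subs = range(k) if key in heavy_keys else [0]
        for sub in subs:  # replicate to every fragment of a heavy key
            buckets[(key, sub)][1].append(val)
    return buckets


def reduce_join(buckets):
    """Each bucket joins its own left and right values independently."""
    out = []
    for (key, _sub), (lvals, rvals) in buckets.items():
        out.extend((key, lv, rv) for lv, rv in product(lvals, rvals))
    return out
```

With six left tuples for a heavy key `"a"` and `k=3`, each sub-reducer receives two of them plus both right-side `"a"` tuples, and the union of the bucket outputs equals the ordinary join.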
11:45-noon
An Evaluation of Lucene and PostgreSQL Full Text Search Engine
In the context of “big data”, full-text search is a critical tool for identifying natural-language documents that satisfy a user query. Although most traditional DBMSs have a built-in full-text search engine, Apache Lucene, an independent information retrieval library, has gained popularity for indexing and searching large datasets. In this project, we propose a benchmark that combines index build time, index storage, and query speed to compare the performance of Apache Lucene and the PostgreSQL full-text search engine. We then investigate which system performs better under various indexing scenarios, such as indexing with the generalized inverted index (GIN) and the generalized search tree (GiST) in PostgreSQL, and under various search scenarios, such as varying the dataset size.
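Lucene's index and PostgreSQL's GIN index both rest on the idea of an inverted index that maps each term to the documents containing it. The minimal sketch below shows only that idea. The lowercase tokenizer and the function names are hypothetical simplifications, with none of the stemming, ranking, or compression either system performs.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs, no stemming."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect posting lists
    return sorted(result)
```

The benchmark's three metrics map loosely onto this structure: build time is the cost of `build_index`, storage is the size of the posting lists, and query speed is the cost of intersecting them.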
Noon-12:15
The Impact of Multi-Tenancy in Desktop and Mobile Systems using LevelDB and SQLite3
Interactive web applications have grown in sophistication and now need browsers to offer efficient client-side storage. The push to standardize the web via HTML5 has led to the IndexedDB API, which fulfills web applications' local storage needs. Because users may have a large number of webpages open at any given time, browsers must provide an IndexedDB implementation that performs well in a multi-tenant environment. My goal is to benchmark and compare the performance trends of SQLite3 and LevelDB on a desktop and a mobile device as the level of multi-tenancy increases to Internet scale.
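A multi-tenancy benchmark of this kind can be sketched with Python's built-in `sqlite3` module by treating each open page as a tenant with its own database. This hypothetical harness covers only the SQLite3 side, since LevelDB has no standard-library binding. It uses in-memory databases instead of on-disk files and interleaves writes from all tenants round-robin. The table schema and parameter names are assumptions.

```python
import sqlite3
import time


def run_tenants(num_tenants, writes_per_tenant):
    """Interleave writes across `num_tenants` separate SQLite databases.

    Returns (elapsed_seconds, rows_per_tenant). Each tenant stands in for
    one page's IndexedDB store (a hypothetical mapping).
    """
    conns = [sqlite3.connect(":memory:") for _ in range(num_tenants)]
    for conn in conns:
        conn.execute("CREATE TABLE kv (key INTEGER PRIMARY KEY, value TEXT)")
    start = time.perf_counter()
    for i in range(writes_per_tenant):
        for conn in conns:  # round-robin across tenants
            with conn:  # one small transaction per write, committed on exit
                conn.execute("INSERT INTO kv VALUES (?, ?)", (i, f"value-{i}"))
    elapsed = time.perf_counter() - start
    rows = [conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0] for conn in conns]
    for conn in conns:
        conn.close()
    return elapsed, rows
```

Sweeping `num_tenants` upward while holding the total number of writes fixed would expose a performance trend. Realistic numbers would require on-disk databases, because `:memory:` skips file I/O entirely.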
December 14, 2012. Talks are in room CS 2310.
11:00-11:15
SQL and NoSQL Comparison on Interactive Data-Serving Environments
Junyan Chen
In this new era of “big data”, traditional RDBMSs are no longer the only viable option for data-driven applications. NoSQL systems are strong competitors to traditional RDBMSs both in interactive data-serving environments and in analytical decision-support workloads. A recent study compared the performance of NoSQL and SQL databases for OLTP and data warehouse (DW) workloads. In this project, we aim to extend that study to systems it did not cover, focusing only on interactive data-serving environments. In particular, we compare SQL Server and Cassandra, a popular NoSQL system originally developed at Facebook, using the YCSB benchmark to characterize how these systems compare in interactive data-serving environments.
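YCSB drives a data store with a configurable mix of operations over keys drawn from a skewed popularity distribution. Its core Workload A, for example, is a 50/50 read/update mix with Zipfian key popularity. The sketch below is a simplified, hypothetical generator for such a workload, not YCSB's own code. It samples keys with a plain Zipf weighting through `random.choices` instead of YCSB's scrambled Zipfian generator, and the `user{n}` key names are illustrative.

```python
import random


def zipf_weights(num_keys, theta=0.99):
    """Weight of key i is proportional to 1 / (i + 1) ** theta."""
    return [1.0 / (i + 1) ** theta for i in range(num_keys)]


def generate_workload(num_ops, num_keys, read_fraction=0.5, seed=42):
    """Return a list of ("read" | "update", key) operations."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    weights = zipf_weights(num_keys)
    keys = rng.choices(range(num_keys), weights=weights, k=num_ops)
    ops = []
    for key in keys:
        op = "read" if rng.random() < read_fraction else "update"
        ops.append((op, f"user{key}"))  # illustrative key naming
    return ops
```

Feeding the same generated operation list to both SQL Server and Cassandra clients keeps the comparison fair, because both systems see an identical access pattern.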
11:15-11:30
Optimizing Image Storage
Joy Arulraj
Efficient image storage and retrieval mechanisms are crucial for improving the end-user experience at minimal cost in modern storage systems. We focus primarily on image objects, in particular those encoded in JPEG format. We intend to tile the JPEG images, identify similar image tiles, and then reduce storage cost by exploiting these data patterns. We also evaluate the trade-off between storage savings and the associated image retrieval latency. We have observed 5-10% savings in overall storage cost with our current image storage mechanism.
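The tiling idea can be sketched on a decoded image represented as a 2-D list of pixel values. The sketch cuts the image into fixed-size tiles, stores each distinct tile once, and keeps a per-position reference into that tile store. It detects only exact duplicates by hashing tile contents. The talk targets similar tiles in JPEG data, which would require a similarity measure in place of the exact match used here. The function names are hypothetical.

```python
def tile_image(pixels, tile):
    """Split a 2-D pixel grid into tile x tile blocks, row-major order."""
    rows, cols = len(pixels), len(pixels[0])
    assert rows % tile == 0 and cols % tile == 0, "sketch assumes exact tiling"
    tiles = []
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            block = tuple(tuple(pixels[r + i][c:c + tile]) for i in range(tile))
            tiles.append(block)
    return tiles


def dedup_tiles(tiles):
    """Store each distinct tile once; return (unique_tiles, references)."""
    store, refs, index = [], [], {}
    for t in tiles:
        if t not in index:  # tuples are hashable, so this is an exact match
            index[t] = len(store)
            store.append(t)
        refs.append(index[t])
    return store, refs
```

The storage saving is the fraction of tiles eliminated, `1 - len(store) / len(tiles)`. Rebuilding an image requires resolving every reference, which is the retrieval-latency side of the trade-off.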
11:30-11:45
Comparison of Clustering Algorithms for Large Data Sets
Evan Samanas and Halit Erdogan
We present an empirical comparison of large-scale clustering algorithms. We run experiments on both real and synthetic data sets and investigate the performance of the algorithms in terms of speed-up and scale-up. We also examine the quality of the clusterings produced by the algorithms using available quality metrics. Finally, we present applications that show how clustering can be important and useful at large scale.
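For reference, the standard parallel-systems definitions of speed-up and scale-up are given below. The talk does not spell out its exact formulas, so these are an assumption. Let $T(p, D)$ be the running time on $p$ processors (or machines) for a data set of size $D$:

```latex
\text{speed-up}(p) = \frac{T(1, D)}{T(p, D)}, \qquad
\text{scale-up}(p) = \frac{T(1, D)}{T(p,\, p \cdot D)}
```

Speed-up holds the data size fixed and adds resources; linear (ideal) speed-up means $\text{speed-up}(p) = p$. Scale-up grows the data in proportion to the resources; linear scale-up means $\text{scale-up}(p) = 1$.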
11:45-noon
A Rule-Based Stand-Alone Query Optimizer for Main Memory Systems
Brian Sullivan
Database systems with differing storage methods and performance properties pose a challenge for query optimization. A customizable query optimizer that makes no assumptions about how the database is configured would have an advantage in this changing landscape. Additionally, systems with larger amounts of main memory stand to benefit from query optimizers that can use this extra memory intelligently. This project implements a top-down query optimizer that uses a rules engine to optimize queries and outputs the query plan in a standardized format. It also provides a caching strategy that speeds up optimization across multiple queries.
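Top-down search combined with cross-query caching can be sketched as a memoized recursion over sets of relations to be joined. In this hypothetical sketch, a single `split_rule` stands in for the join-reordering rules a real rules engine would apply, and `functools.lru_cache` serves as a memo table that persists between queries. The cardinalities, the uniform selectivity, and the C_out cost model (sum of intermediate result sizes) are illustrative assumptions, not the project's design.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics: base-table cardinalities and a uniform selectivity.
CARD = {"A": 1000, "B": 10, "C": 100}
SELECTIVITY = 0.01


def cardinality(rels):
    """Estimated output size of joining every relation in `rels`."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    return size * SELECTIVITY ** (len(rels) - 1)


def split_rule(rels):
    """Rule: yield each way to split `rels` into two non-empty join inputs."""
    items = sorted(rels)
    for k in range(1, len(items)):
        for left in combinations(items, k):
            left = frozenset(left)
            yield left, rels - left


@lru_cache(maxsize=None)  # memo table, shared by all later queries
def best_plan(rels):
    """Top-down search: return (cost, plan) minimizing the C_out cost."""
    if len(rels) == 1:
        (r,) = rels
        return 0.0, r
    best = None
    for left, right in split_rule(rels):
        lcost, lplan = best_plan(left)
        rcost, rplan = best_plan(right)
        cost = lcost + rcost + cardinality(rels)
        if best is None or cost < best[0]:
            best = (cost, (lplan, rplan))
    return best
```

Calling `best_plan(frozenset("ABC"))` picks `("A", ("B", "C"))`, joining the two small tables first. Because the memo outlives a single query, a later query over `frozenset("BC")` is answered directly from the cache. A real optimizer would also need to invalidate cached entries when statistics change.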
Noon-12:15
Evaluating Database Sort on Modern GPUs
Jason Power
In the past few years, graphics processing units (GPUs) have become much more programmable, with improved languages and faster drivers. Previous work evaluated GPUs for sorting within database management systems and found them competitive with CPUs. However, CPUs have dominated the most recent entries to the Sort Benchmark. By applying modern GPU algorithms and programming environments, we show that GPUs are once again competitive with, and can in fact outperform, CPU sorting algorithms.
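Many high-performance GPU sorts are radix sorts, because each pass (per-digit histogram, prefix sum, scatter) maps onto data-parallel kernels. The sequential sketch below shows only the structure of a least-significant-digit radix sort on non-negative 32-bit integer keys. It is not the talk's implementation, and the default 8-bit digit width is an assumption.

```python
def radix_sort(keys, bits=8):
    """LSD radix sort for non-negative integers below 2**32."""
    radix = 1 << bits
    mask = radix - 1
    for shift in range(0, 32, bits):
        # 1. Histogram: count keys per digit value.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: starting output offset for each digit.
        offsets, total = [0] * radix, 0
        for d in range(radix):
            offsets[d], total = total, total + counts[d]
        # 3. Stable scatter into a fresh output buffer.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

On a GPU, each of the three steps becomes a parallel kernel: per-block histograms, a device-wide scan, and a scatter. Stability of the scatter is what lets later passes preserve the order established by earlier ones.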
If you are looking for the CS 764 course home page, click here.