Managing Large-Scale Probabilistic Databases

Overview

Modern applications are driven by data, and increasingly the data driving these applications are imprecise. The set of applications that generate imprecise data is diverse. In sensor database applications, the goal is to measure some aspect of the physical world, such as the temperature in a region or a person's location; such an application has no choice but to deal with imprecision, because measuring the physical world is inherently imprecise. In data integration, consider two databases that refer to the same set of real-world entities but refer to those entities in slightly different ways: one database may contain the entity 'J. Smith' while the second refers to 'John Smith'. In such a scenario, the sheer size of the data makes it too costly to manually reconcile all references across the two databases, so state-of-the-art approaches lower the cost of integration by allowing the data to remain imprecise. In addition to applications that are forced to cope with imprecision, emerging data-driven applications, such as large-scale information extraction, natively produce and manipulate similarity scores. In all of these domains, the current state-of-the-art approach is to allow the data to be imprecise and to shift the burden of coping with imprecision to applications.

The thesis of this work is that it is possible to effectively manage large, imprecise databases using a generic approach based on probability theory. The key technical challenge in building such a general-purpose approach is performance, and the technical contributions of this dissertation are techniques for efficient query evaluation over probabilistic databases. In particular, we demonstrate that it is possible to run complex SQL queries on tens of gigabytes of probabilistic data with performance comparable to that of a standard relational database engine.
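To make the abstract's idea concrete, the following is a minimal sketch (not taken from the book) of query evaluation under the common tuple-independent model, in which each row carries the marginal probability that it truly belongs to the database. The table contents, names, and probabilities below are illustrative assumptions, reusing the 'J. Smith' / 'John Smith' entity-resolution example from the overview.

    # Minimal sketch of tuple-independent probabilistic query evaluation.
    # Table contents, names, and probabilities are illustrative assumptions,
    # not taken from the book.
    from math import prod

    # Each row carries the marginal probability that it is actually present.
    # Here: entity-resolution output linking name mentions to a canonical person.
    person = [
        # (mention, canonical_name, P(the match is correct))
        ("J. Smith",   "John Smith", 0.7),
        ("John Smith", "John Smith", 0.9),
        ("J. Smyth",   "John Smith", 0.3),
    ]

    def p_exists(rows):
        """Probability that at least one of the given rows is present.

        Under tuple independence, this is the complement of the probability
        that every row is absent: 1 - prod(1 - p_i).
        """
        return 1.0 - prod(1.0 - p for (_, _, p) in rows)

    # Boolean query: "Is John Smith mentioned anywhere in the database?"
    matches = [row for row in person if row[1] == "John Smith"]
    print(f"P(John Smith appears) = {p_exists(matches):.3f}")
    # 1 - (0.3 * 0.1 * 0.7) = 0.979

On real workloads the hard part, and the focus of the dissertation, is performing this kind of probability computation efficiently inside a relational engine for complex SQL queries, rather than row by row in application code as above.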

Product Details

  • ISBN-13: 9781244106246
  • Publisher: BiblioLabsII
  • Publication date: 9/12/2011
  • Pages: 212
  • Product dimensions: 7.44 in (w) x 9.69 in (h) x 0.45 in (d)
