Beautiful Data: The Stories Behind Elegant Data Solutions [NOOK Book]

NOOK Book (eBook)
$19.99
BN.com price
$35.99 List Price (Save 44%)

Overview

In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video.

With Beautiful Data, you will:

  • Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web
  • Learn how to visualize trends in urban crime, using maps and data mashups
  • Discover the challenges of designing a data processing system that works within the constraints of space travel
  • Learn how crowdsourcing and transparency have combined to advance the state of drug research
  • Understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data
  • Learn about the massive infrastructure required to create, capture, and process DNA data

That's only a small sample of what you'll find in Beautiful Data. For anyone who handles data, this is a truly fascinating book. Contributors include:

  • Nathan Yau
  • Jonathan Follett and Matt Holm
  • J.M. Hughes
  • Raghu Ramakrishnan, Brian Cooper, and Utkarsh Srivastava
  • Jeff Hammerbacher
  • Jason Dykes and Jo Wood
  • Jeff Jonas and Lisa Sokol
  • Jud Valeski
  • Alon Halevy and Jayant Madhavan
  • Aaron Koblin with Valdean Klump
  • Michal Migurski
  • Jeff Heer
  • Coco Krumme
  • Peter Norvig
  • Matt Wood and Ben Blackburne
  • Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
  • Lukas Biewald and Brendan O'Connor
  • Hadley Wickham, Deborah Swayne, and David Poole
  • Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
  • Toby Segaran

Product Details

  • ISBN-13: 9781449379292
  • Publisher: O'Reilly Media, Incorporated
  • Publication date: 7/14/2009
  • Sold by: Barnes & Noble
  • Format: eBook
  • Edition number: 1
  • Pages: 386
  • Sales rank: 1201758
  • File size: 17 MB
  • Note: This product may take a few minutes to download.

Meet the Author

Toby Segaran is the author of Programming Collective Intelligence, a very popular O'Reilly title. He was the founder of Incellico, a biotech software company later acquired by Genstruct. He currently holds the title of Data Magnate at Metaweb Technologies and is a frequent speaker at technology conferences.

Jeff Hammerbacher is the Vice President of Products and Chief Scientist at Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to joining Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the statistics and machine learning applications at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced several academic papers and two open source projects: Hive, a system for offline analysis built above Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor's Degree in Mathematics from Harvard University.

Table of Contents

Preface xi

1 Seeing Your Life in Data Nathan Yau 1

Personal Environmental Impact Report (PEIR) 2

your.flowingdata (YFD) 3

Personal Data Collection 3

Data Storage 5

Data Processing 6

Data Visualization 7

The Point 14

How to Participate 15

2 The Beautiful People: Keeping Users in Mind When Designing Data Collection Methods Jonathan Follett, Matthew Holm 17

Introduction: User Empathy Is the New Black 17

The Project: Surveying Customers About a New Luxury Product 19

Specific Challenges to Data Collection 19

Designing Our Solution 21

Results and Reflection 31

3 Embedded Image Data Processing on Mars J. M. Hughes 35

Abstract 35

Introduction 35

Some Background 37

To Pack or Not to Pack 40

The Three Tasks 42

Slotting the Images 43

Passing the Image: Communication Among the Three Tasks 46

Getting the Picture: Image Download and Processing 48

Image Compression 50

Downlink, or, It's All Downhill from Here 52

Conclusion 52

4 Cloud Storage Design in a Pnutshell Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava 55

Introduction 55

Updating Data 57

Complex Queries 64

Comparison with Other Systems 68

Conclusion 71

5 Information Platforms and the Rise of the Data Scientist Jeff Hammerbacher 73

Libraries and Brains 73

Facebook Becomes Self-Aware 74

A Business Intelligence System 75

The Death and Rebirth of a Data Warehouse 77

Beyond the Data Warehouse 78

The Cheetah and the Elephant 79

The Unreasonable Effectiveness of Data 80

New Tools and Applied Research 81

MAD Skills and Cosmos 82

Information Platforms As Dataspaces 83

The Data Scientist 83

Conclusion 84

6 The Geographic Beauty of a Photographic Archive Jason Dykes, Jo Wood 85

Beauty in Data: Geograph 86

Visualization, Beauty, and Treemaps 89

A Geographic Perspective on Geograph Term Use 91

Beauty in Discovery 98

Reflection and Conclusion 101

7 Data Finds Data Jeff Jonas, Lisa Sokol 105

Introduction 105

The Benefits of Just-in-Time Discovery 106

Corruption at the Roulette Wheel 107

Enterprise Discoverability 111

Federated Search Ain't All That 111

Directories: Priceless 113

Relevance: What Matters and to Whom? 115

Components and Special Considerations 115

Privacy Considerations 118

Conclusion 118

8 Portable Data in Real Time Jud Valeski 119

Introduction 119

The State of the Art 120

Social Data Normalization 128

Conclusion: Mediation via Gnip 131

9 Surfacing the Deep Web Alon Halevy, Jayant Madhavan 133

What Is the Deep Web? 133

Alternatives to Offering Deep-Web Access 135

Conclusion and Future Work 147

10 Building Radiohead's House of Cards Aaron Koblin, Valdean Klump 149

How It All Started 149

The Data Capture Equipment 150

The Advantages of Two Data Capture Systems 154

The Data 154

Capturing the Data, aka "The Shoot" 155

Processing the Data 160

Post-Processing the Data 160

Launching the Video 161

Conclusion 164

11 Visualizing Urban Data Michal Migurski 167

Introduction 167

Background 168

Cracking the Nut 169

Making It Public 174

Revisiting 178

Conclusion 181

12 The Design of sense.us Jeffrey Heer 183

Visualization and Social Data Analysis 184

Data 186

Visualization 188

Collaboration 194

Voyagers and Voyeurs 199

Conclusion 203

13 What Data Doesn't Do Coco Krumme 205

When Doesn't Data Drive? 208

Conclusion 217

14 Natural Language Corpus Data Peter Norvig 219

Word Segmentation 221

Secret Codes 228

Spelling Correction 234

Other Tasks 239

Discussion and Conclusion 240

15 Life in Data: The Story of DNA Matt Wood, Ben Blackburne 243

DNA As a Data Store 243

DNA As a Data Source 250

Fighting the Data Deluge 253

The Future of DNA 257

16 Beautifying Data in the Real World Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, Egon Willighagen 259

The Problem with Real Data 259

Providing the Raw Data Back to the Notebook 260

Validating Crowdsourced Data 262

Representing the Data Online 263

Closing the Loop: Visualizations to Suggest New Experiments 271

New Experiments 271

Building a Data Web from Open Data and Free Services 274

17 Superficial Data Analysis: Exploring Millions of Social Stereotypes Brendan O'Connor, Lukas Biewald 279

Introduction 279

Preprocessing the Data 280

Exploring the Data 282

Age, Attractiveness, and Gender 285

Looking at Tags 290

Which Words Are Gendered? 294

Clustering 295

Conclusion 300

18 Bay Area Blues: The Effect of the Housing Crisis Hadley Wickham, Deborah F. Swayne, David Poole 303

Introduction 303

How Did We Get the Data? 304

Geocoding 305

Data Checking 305

Analysis 306

The Influence of Inflation 307

The Rich Get Richer and the Poor Get Poorer 308

Geographic Differences 311

Census Information 314

Exploring San Francisco 318

Conclusion 319

19 Beautiful Political Data Andrew Gelman, Jonathan P. Kastellec, Yair Ghitza 323

Example 1: Redistricting and Partisan Bias 324

Example 2: Time Series of Estimates 326

Example 3: Age and Voting 328

Example 4: Public Opinion and Senate Voting on Supreme Court Nominees 328

Example 5: Localized Partisanship in Pennsylvania 330

Conclusion 332

20 Connecting Data Toby Segaran 335

What Public Data Is There, Really? 336

The Possibilities of Connected Data 337

Within Companies 338

Impediments to Connecting Data 339

Possible Solutions 343

Conclusion 348

Contributors 349

Index 357
