Building and Maintaining a Data Warehouse

Overview

As it is with building a house, most of the work necessary to build a data warehouse is neither visible nor obvious when looking at the completed product. While it may be easy to plan for a data warehouse that incorporates all the right concepts, taking the steps needed to create a warehouse that is as functional and user-friendly as it is theoretically sound, is not especially easy. That’s the challenge that Building and Maintaininga Data Warehouse answers.

Based on a ...

See more details below
Hardcover
$86.91
BN.com price
(Save 12%)$98.95 List Price
Other sellers (Hardcover)
  • All (6) from $56.96   
  • New (3) from $85.69   
  • Used (3) from $56.96   
Sending request ...

Overview

As it is with building a house, most of the work necessary to build a data warehouse is neither visible nor obvious when looking at the completed product. While it may be easy to plan for a data warehouse that incorporates all the right concepts, taking the steps needed to create a warehouse that is as functional and user-friendly as it is theoretically sound, is not especially easy. That’s the challenge that Building and Maintaininga Data Warehouse answers.

Based on a foundation of industry-accepted principles, this work provides an easy-to-follow approach that is cohesive and holistic. By offering the perspective of a successful data warehouse, as well as that of a failed one, this workdetails those factors that must be accomplished and those that are best avoided.

Organized to logically progress from more general to specific information, this valuable guide:

  • Presents areas of a data warehouse individually and in sequence, showing how each piece becomes a working part of the whole
  • Examines the concepts and principles that are at the foundation of every successful data warehouse
  • Explains how to recognize and attend to problematic gaps in an established data warehouse
  • Provides the big picture perspective that planners and executives require

Those considering the planning and creation of a data warehouse, as well as those who’ve already built one will profit greatly from the insights garnered by the author during his years of creating and gathering information on state-of-the-art data warehouses that are accessible, convenient, and reliable.

Read More Show Less

Product Details

  • ISBN-13: 9781420064629
  • Publisher: Taylor & Francis
  • Publication date: 3/7/2008
  • Pages: 328
  • Product dimensions: 6.20 (w) x 9.40 (h) x 0.90 (d)

Table of Contents

The Big Picture: An Introduction to Data Warehousing
Decision Support Systems; Dimensional and Third Normal Form Data Models; Storing the Data; Data Availability; Monitoring Data Quality.
Data Warehouse Philosophy
Enterprise Data; Subject Orientation; Data Integration; Form; Function; Grain; Nonvolatility; Time Variant; One Version of the Truth; Long-Term Investment.
Source System Analysis
Source System Analysis Principles; System of Record; Entity Data Arithmetic Data: Absolute, Relative, Numeric Data That Isn’t Arithmetic; Alphanumeric Data; Granularity; Latency; Transaction Data; Snapshot Data; Source System Analysis Methods; Data Profile; Data Flow Diagram; Data State Diagram; System of Record; Business Rules.
Relational Database Management System (RDBMS)
Relational Set Theory; RDBMS Product Offerings; Residual Costs; Licensing; Support and Maintenance; Extensibility; Connective Capacity.
Database Design
Data Modeling Methodology; Conceptual and Logical Data Models; Logical (Primary) Key; Attribute; Primary Key/Foreign Key Relation; Cardinality; Super Types and Subtypes; Physical Data Model; Dimensional Data Model; Third Normal Form Data Model; Recursive Data Model; Physical Data Model Summary; Data Architecture; Enterprise Data Warehouse; Data Mart; Operational Data Store; Summaries and Aggregates.
Data Acquisition and Integration
Source System and Target System Analysis; Direct and Indirect Requirements; Language; Data Profile; Data State; Data Mapping; Business Rules; Architecture; Extract, Transform, and Load (ETL); Extract, Load, and Transform (ELT); ETL Design and Process Principles.
Eleven Principles; Staging Principles Conclusion; ETL Functions; Extract Data from a Contiguous Dataset and from a Data Flow; Row-Level and Dataset-Level Transformation.
Surrogate Key Generation: Intradataset, Data Warehouse-Level Transformation, Intra-Data Warehouse, Changed Data Capture, ETL Key, Universe to Universe and Candidate to Universe, Load Data from a Stable and Contiguous Dataset, Load Data from a Data Flow.
Transaction Summary; Dimension Aggregation.
Common Problems: Source Data Anomalies, Incomplete, Redundant, and Misstated Source Data; Business Rule Changes, Obsolete, Redefined, and Unrecorded Data.
Business Intelligence (BI) Reporting
Success Factors; Performance; User Interface; Presentation of the Data Architecture; Alignment with the Data Model; Ability to Answer Questions; Mobility; Flexibility; Availability; Customer Success Factors.
Processes: Proactive, Reactive, Predefined, Ad Hoc; Data, Information, and Analytic Needs; BI Reporting Application and Architecture.
BI Reporting Methods: Predefined and Interactive Reports, Online Analytical Process (OLAP) Reports, MOLAP, ROLAP, HOLAP; Drilling; Push versus Pull; Printed on Paper; Report Archives; Web-Based BI Reporting; Operational BI Reporting: From an ODS, From an Operational System (Real-Time), EDI, Partnerships, and Data Sharing.
BI Reporting: Customer Relationship Management (CRM), Business Metrics Measure the Enterprise, Decisions and Decision Making Closer to the Action; Reporting around the Event; BI Search; Sarbanes–Oxley and BI Reporting; Data Mining; Statistics Concepts; Random Error; Statistical Significance.
Variables: Dependent and Independent.
Hypothesis; Data Mining Tools and Activities; Data Cleansing; Data Inspection; Compound, Lag, Numeric, and Categorical Variables; Hypothesis; Data Mining Algorithms; Neural Network; Decision Tree; CHAID; Nearest Neighbor; Rule Induction; Genetic Algorithm; Rule Validation and Testing; Overfitting.
Data Quality
Deming’s Definition of Quality; Data Quality Service Level Agreement (SLA); Deming’s Statistical Process Control; Process Measurement; Methods and Strategies; Data Stewardship; Post-Load Audit and Report Errant Data.
Plug in a Default Value and Report Errant Data; Reject a Record and Report the Errant Record; Reject a Dataset and Report the Errant Dataset.
Recycle the Data: In Place and Report Errant Data, Recycle Wheel and Report Errant Data, Data Quality Repository; Data Quality Fact Table: Dimensional Data Model, Third Normal Form Data Model; Data Quality Reporting.
Metadata
Types of Metadata; Static and Dynamic Metadata; Metadata Service Level Agreement (SLA); Metadata Repository; Central Metadata Repository: Dimensional Data Model; Third Normal Form; Distributed Metadata Repository; Real-Time Metadata; Data Quality as Metadata; Make or Buy a Metadata Repository.
Data Warehouse Customers
Strategic Decision Makers; Tactical Decision Makers; Knowledge Workers; Operational Applications; External Partners; Electronic Data Interchange (EDI) Partners; Data Warehouse Plan.
Future of Data Warehousing: An Epilogue
Scalability and Performance; Real-Time Data Warehousing; Increased Corporate Presence; Back to the Basics; Data Quality.
Short TOC
The Big Picture: An Introduction to Data Warehousing
Data Warehouse Philosophy
Source System Analysis
Relational Database Management System (RDBMS)
Database Design
Data Acquisition and Integration
Business Intelligence Reporting
Data Quality
Metadata
Data Warehouse Customers
Future of Data Warehousing: An Epilogue
Bibliography
Index
Toc to post to abstract
Preface
Acknowledgments
The Author
Introduction
The Big Picture: An Introduction to Data Warehousing
Introduction
Decision Support Systems
Dimensional and Third Normal Form Data Models
Storing the Data
Data Availability
Monitoring Data Quality
Data Warehouse Philosophy
Introduction
Enterprise Data
Subject Orientation
Data Integration
Form
Function
Grain
Nonvolatility
Time Variant
One Version of the Truth
Long-Term Investment
References
Source System Analysis
Introduction
Source System Analysis Principles
System of Record
Entity Data
Arithmetic Data
Absolute Arithmetic Data
Relative Arithmetic Data
Numeric Data That Isn’t Arithmetic
Alphanumeric Data
Granularity
Latency
Transaction Data
Snapshot Data
Source System Analysis Methods
Data Profile
Data Flow Diagram
Data State Diagram
System of Record
Business Rules
Closing Remarks
References
Relational Database Management System (RDBMS)
Introduction
Relational Set Theory
RDBMS Product Offerings
Residual Costs
Licensing
Support and Maintenance
Extensibility
Connective Capacity
Closing Remarks
References
Database Design
Introduction
Data Modeling Methodology
Conceptual Data Model
Logical Data Model
Logical (Primary) Key
Attribute
Primary Key/Foreign Key Relation
Cardinality
Super Types and Subtypes
Putting It All Together
Physical Data Model
Dimensional Data Model
Third Normal Form Data Model
Recursive Data Model
Physical Data Model Summary
Data Architecture
Enterprise Data Warehouse
Data Mart
Operational Data Store
Summaries and Aggregates
Closing Remarks
References
Data Acquisition and Integration
Introduction
Source System Analysis
Target System Analysis
Direct Requirements
Indirect Requirements
Direct and Indirect Requirements
Language
Data Profile
Data State
Data Mapping
Business Rules
Architecture
Extract, Transform, and Load (ETL)
Extract, Load, and Transform (ELT)
ETL Design Principles
ETL Process Principles
Principle 01: One Thing at a Time
Principle 02: Know When to Begin
Principle 03: Know When to End
Principle 04: Large to Medium to Small
Principle 05: Stage Data Integrity
Principle 06: Know What You Have
Process Principles Conclusion
ETL Staging Principles
Principle 07: Name the Data
Principle 08: Own the Data
Principle 09: Build the Data
Principle 10: Type the Data
Principle 11: Land the Data
Staging Principles Conclusion
ETL Functions
Extract Data from a Contiguous Dataset
Extract Data from a Data Flow
Row-Level Transformation
Dataset-Level Transformation
Surrogate Key Generation: Intradataset
Data Warehouse-Level Transformation
Surrogate Key Generation: Intra-Data Warehouse
Look-Up
Changed Data Capture
ETL Key
Universe to Universe and Candidate to Universe
Load Data from a Stable and Contiguous Dataset
Load Data from a Data Flow
Transaction Summary
Dimension Aggregation
Common Problems
Source Data Anomalies
Incomplete Source Data
Redundant Source Data
Misstated Source Data
Business Rule Changes
Obsolete Data
Redefined Data
Unrecorded Data
Closing Remarks
References
Business Intelligence Reporting
Introduction
BI Reporting Success Factors
Performance
User Interface
Presentation of the Data Architecture
Alignment with the Data Model
Ability to Answer Questions
Mobility
Flexibility
Availability
BI Customer Success Factors
Proactive Processes
Reactive Processes
Predefined Processes
Ad Hoc Processes
Data Needs.
Information Needs
Analytic Needs
BI Reporting Application
Architecture
BI Reporting Methods.
Predefined Reports
Interactive Reports
Online Analytical Process (OLAP) Reports
MOLAP
ROLAP
HOLAP
Drilling
Push versus Pull
Push
Pull
Printed on Paper
Report Archives
Web-Based BI Reporting
Operational BI Reporting: From an ODS
Operational BI Reporting: From an Operational System (Real-Time)
Operational BI Reporting: EDI, Partnerships, and Data Sharing.
BI Reporting: Thus Far.
Customer Relationship Management (CRM)
Business Metrics Measure the Enterprise
Decisions and Decision Making Closer to the Action
BI Reporting: Coming Soon
Reporting around the Event
BI Search
Sarbanes–Oxley and BI Reporting
Data Mining
Statistics Concepts
Random Error
Statistical Significance
Variables: Dependent and Independent
Hypothesis
Data Mining Tools
Data Mining Activities
Data Cleansing
Data Inspection
Compound Variables
Lag Variables
Numeric Variables
Categorical Variables
Hypothesis
Data Mining Algorithms
Neural Network
Decision Tree
CHAID
Nearest Neighbor
Rule Induction
Genetic Algorithm
Rule Validation and Testing
Overfitting
Closing Remarks
References
Data Quality
Introduction
Deming’s Definition of Quality
Data Quality Service Level Agreement (SLA)
Deming’s Statistical Process Control
Process Measurement
Methods and Strategies
Data Stewardship
Post-Load Audit and Report Errant Data
Plug in a Default Value and Report Errant Data
Reject a Record and Report the Errant Record
Reject a Dataset and Report the Errant Dataset
Recycle the Data: In Place and Report Errant Data
Recycle the Data: Recycle Wheel and Report Errant Data
Data Quality Repository
Data Quality Fact Table: Dimensional Data Model
Data Quality Fact Table: Third Normal Form Data Model
Data Quality Reporting
Follow Through
Closing Remarks
References
Metadata
Introduction
Types of Metadata
Static Metadata
Dynamic Metadata
Metadata Service Level Agreement (SLA)
Metadata Repository
Central Metadata Repository: Dimensional Data Model
Central Metadata Repository: Third Normal Form
Distributed Metadata Repository
Real-Time Metadata
Data Quality as Metadata
Make or Buy a Metadata Repository
Closing Remarks
References
Data Warehouse Customers
Introduction
Strategic Decision Makers
Tactical Decision Makers
Knowledge Workers
Operational Applications
External Partners
Electronic Data Interchange (EDI) Partners
Data Warehouse Plan
Strategic Decision Makers
Tactical Decision Makers
Knowledge Workers
Operational Applications
External Partners
Electronic Data Interchange (EDI) Partners
Closing Remarks
Future of Data Warehousing: An Epilogue
Introduction
Scalability and Performance
Real-Time Data Warehousing
Increased Corporate Presence
Back to the Basics
Data Quality
Bibliography
Index

Read More Show Less

Customer Reviews

Be the first to write a review
( 0 )
Rating Distribution

5 Star

(0)

4 Star

(0)

3 Star

(0)

2 Star

(0)

1 Star

(0)

    If you find inappropriate content, please report it to Barnes & Noble
    Why is this product inappropriate?
    Comments (optional)