Enterprise Data Warehouse: Planning, Building, and Implementation
by Eric Sperley
Overview
Finally, specifics! This isn't a "theory" book: it's an in-the-trenches, step-by-step guide to deploying data warehouses that align tightly with your business objectives. From Joint Application Development (JAD) techniques that maximize bang for the buck, to choosing the best hardware, software, and end-user access components, Eric Sperley delivers field-tested techniques you can rely on. Sperley delivers a practical, business-focused methodology that's flexible enough for any enterprise - and so detailed it'll never leave you wondering what to do next. If your data warehouse must deliver sustainable competitive advantage, don't settle for anything less.
Product Details
- ISBN-13: 9780139058455
- Publisher: Prentice Hall Professional Technical Reference
- Publication date: 04/16/1999
- Series: Hewlett-Packard Professional Books Series
- Pages: 354
- Product dimensions: 7.28(w) x 9.55(h) x 1.17(d)
Read an Excerpt
Preface
The primary goal of MIS managers and CIOs is bringing their IT organizations into alignment with their businesses. However, most MIS managers and CIOs are technically trained and are not skilled in the art of organizational strategy. This is combined with the fact that legacy systems are not organized in a way that facilitates easy integration of data from different systems to provide new information. Thus it is difficult to align IT with the business by modifying operational systems to provide new information. The IT professional ends up in a situation where the need for change is known, but the person does not know how to select a strategy or implement changes with the current technology. This is analogous to being in the water with a shark--knowing the danger is there but not knowing where it is or how to escape it.
The two volumes of this work have been written to help the reader acquire the knowledge necessary to use data warehousing and open systems to align the IT department with the goals of the business. Although most other data warehousing texts have been high-level explanations of the advantages and goals of data warehousing, this book will lead you through the details of planning, designing, building, and using a data warehouse.
Data warehousing, decision support systems, and executive information systems have been discussed in conference and lunch rooms for many years. Yet many professionals still have no clear understanding of what a data warehouse is. For those who know data warehouse basics, success in building a data warehouse has been elusive. Practitioners of data warehousing who have been successful at a few projects have discovered that the warehouses they have built do not work together. In other cases, the techniques used to build small data warehouses do not work when adapted to building large ones. In the end, they have perpetuated the problems of legacy systems: systems that do not scale and data islands that are difficult to integrate.
It is the goal of the author to convey a methodology that will enable the IT professional to escape the shark. This book will provide a descriptive strategy to assist the IT professional in the planning, design, and construction of an enterprise-wide data warehouse. To understand how the IT community got in this situation, the development and history of business information technology are reviewed in Chapter 1. Once the current challenges and opportunities are understood, the oppositional characteristics of a data warehouse and operational systems can be examined and evaluated. An easily understood methodology for building a data warehouse based on sound business principles and RAD techniques is introduced and expanded in the following chapters.
In Chapter 2, several ways of characterizing the present position and marching direction of a business or IT organization are introduced. Although a single book on data warehousing will not transform the reader into an expert in business strategy, we can look at ways the business strategy can be stated and understood and a matching IT strategy selected. The most important techniques that we as IT professionals can learn are the use of business executive interviews and joint application development sessions to reveal the gap between where the business is and where the executives want it to be. Finally, we look at ways to justify the costs associated with building a data warehouse.
If you got on an airplane for a commercial flight and the pilot told you that he knew how to steer the plane but did not know how to navigate the plane or where he would land, you would probably get off the aircraft. Planning is critical to both aviation and data warehousing. In Chapter 3, we discuss some ways to architect the data warehouse so that we know how much it will cost to build what we have planned and what we are going to deliver for the cost. This will prevent us from being like Columbus, who set off not knowing where he was going, when he got there did not know where he was, and did it all on borrowed money. Chapter 3 is actually an overview of the rest of the information contained in both volumes.
Selecting the data warehousing project that will have the greatest organizational impact and success is the focus of Chapter 4. The JAD techniques discussed in Chapter 2 are used to discover the project with the largest benefit to the business and what the scope of this project should be.
The primary goal of a data warehouse is to deliver information to the business knowledge workers. Data that is organized in a meaningful way and presented in a business context is the key to a successful warehouse. Chapter 5 focuses on principles and guidelines for data architecture and data modeling. By the end of this chapter, a novice data modeler should understand the basic components of enterprise and decision support data models, and an experienced data modeler will have a better understanding of how to expand known skills into new areas.
Understanding the data that is in a data warehouse is a cornerstone of the warehouse's success. A primary cause of failure of a data warehouse project is misunderstanding about the data in the data warehouse. Data about the data in a data warehouse is called metadata. Successful data warehouse projects are connected with successful metadata repositories. In Chapter 6 we present the value of a metadata repository and the reasons and methods for constructing one.
The second primary reason for failure of data warehousing projects is lack of good-quality data in the warehouse. Chapter 7 focuses on a methodology for achieving high-quality data in the warehouse. Without understanding the value of high-quality data, it is difficult for management to invest the resources necessary to achieve such data. Thus the chapter starts with an example calculation of the cost of errors in data in the data warehouse. A method for achieving data quality is then described.
Understanding the principles of a field of study enables the student of that field to apply the principles to solve new problems. In Chapter 8, we study the principles of data warehouse architecture. These are applied to construct a conceptual data architecture. The conceptual data architecture model is then applied to build a logical data warehouse.
Chapter 9 is dedicated to an understanding of the physical data warehouse. The roles, trade-offs, and compromises of the different components of the physical data warehouse are analyzed.
Software that glues the warehouse tiers together and enables its construction is the subject of Chapter 10. Data extraction, transformation, and cleansing software tools are very important to the construction of a data warehouse. The important characteristics of these tools are examined in this chapter.
Once the data warehouse has been built, the data warehouse customer must be given the appropriate tools to access the data in the warehouse. Chapter 11 introduces the different types of access tools and gives the knowledge needed to enable the reader to confidently select the appropriate tools.
Finally, data mining is explained in Chapter 12. There are several different methods of data mining. All the major methods of data mining are explained in this chapter along with the advantages and disadvantages of each method.
As a general statement, this book is for the IT professional who is interested in building or understanding decision support systems. Specifically, CIOs, IT managers, data analysts, database administrators, designers, and developers should find this book interesting and useful. CIOs and IT managers will find Chapters 1 through 4 particularly useful. Managers, data analysts, database administrators, designers, and developers should find Chapters 5 through 12 helpful for actual implementation of the warehouse, while Chapters 1 through 4 will aid them in understanding the path of their management.
Another potential audience for this book consists of management information systems, business, and computer science students. I have spent nearly five years teaching and found it exciting and easy to include in the book information that makes it a great textbook on data warehousing. Many chapters have a section of questions at the end for the use of both formal and informal students. Most chapters propose projects that the reader can pursue either as a thought exercise or as a physical project.
Acknowledgments
Several people and organizations have provided information, ideas, debate, and criticism that have contributed to this book; in many ways, they are all co-authors of the book. First, I would like to thank Stanford University professor John G. Linvill for teaching me that clarity and depth of thought, organization, expression, and presentation are more important than any other cerebral endeavor. I have attempted to live up to his example with this book. Alan Camburn was responsible for connecting me to the HP data warehouse delivery team. Alan is a world-class data architect. The following list credits only a few of those who have contributed to this work directly or indirectly. I apologize to those whom I have omitted. While the fine people listed below have made many contributions, I alone assume responsibility for errors and omissions of content. I have organized the contributors by organization, and they are listed in no particular order.
The HP Open Warehouse Advanced Technology Group: Pam Munsch, Bruce Jenks, Jim Meyerson, and Glen Kalina.
The HP World Wide Open Warehouse Group: Fran Ioppolo, Cecilia Bolomo, Cecilia Campbell, Roger Eberline, Hal McMillan, and Mike Overly.
The HP TIMBU Data Warehouse Team: Bob Meyer, Alan Camburn, Randall Etheridge, Maya Milster, and Sharon Swaney.
The Tandy Information Services Group: Dick Silvers, Bill Koenig, John Hilton, and Steve McWhorter.
The HP Retail Team: Ray Kelly, Jim Woods, Larry Kohutek, and Terrance Daily.
A special thanks goes to the guy who helped initially to format the book: Jeff Sperley.
Many thanks to Barbara Zeiders for the hundreds of suggestions that helped the readability of the book.