Read an Excerpt
Chapter 1: Overview of OLAP and Data Warehousing in SQL Server
If you are new to data warehousing and on-line analytical processing, you must be asking yourself the inevitable question, Just what are OLAP and Data Warehousing, and why the big fuss about them? This chapter introduces you to the OLAP and data warehousing concepts in general and to the Microsoft contribution to this field in particular.If you are a seasoned data warehouse professional, you may want to skip the introductory information in this chapter and move to Appendix B, which discusses the installation process of the Microsoft OLAP services, or to Chapter 2, which discusses the Microsoft OLAP architecture. However, if you are new to the data-warehousing world, or would like to read a refresher about it, then this is the chapter for you.
This chapter contains an overview of:
- What is On-Line Analytical Processing (BEAR), what are its benefits, and who will benefit from it most?
- What is data warehousing, and how does it differ from OAP and operational databases?
- Microsoft SQL Server 7.0 new OLAP and data warehousing capabilities.
- The architecture of the Microsoft OLAP service including the data collection, transformation, analysis and presentation tools.
Let's answer the first question that probably comes to mind about OLAP - what is OLAP, and what are its benefits?
Why use Data Warehouses?
Relational databases have gained great popularity since the mid 1980's. Many corporations adopted these databases for their mission-critical data needs. Many database vendors supported relational databases and produced state-of-the-art productsthat would make using such databases easy, efficient, and practical.
As a result, many corporations migrated their data to relational databases, which were mainly used in areas where transactions are needed, such as operation and control activities. An example would be a bank using a relational database to control the daily operations of customers transferring, withdrawing, or depositing funds in their accounts. The unique properties of relational databases, with referential integrity, good fault recovery, support for a large number of small transactions, etc. contributed to their widespread use.
The concept of data warehouses began to rise as organizations found it necessary to use the data they were collecting through their operational systems for future planning and decision-making. Assuming they use the operational systems, they had to build queries that summarized the data and fed management reports. Such queries, however, would be extremely slow because they usually summarize large amounts of data, sharing the database engine with every day operations, which in turn adversely affected the performance of operational systems. The solution was, therefore, to separate the data used for reporting and decision making from the operational systems. Hence, data warehouses were designed and built to house this kind of data so that it can be used later in the strategic planning of the enterprise.
Relational database vendors marketed their databases as tools for building data warehouses. Therefore, you see companies like Oracle, Sybase, Informix, and IBM among the leading companies in offering data warehouse capabilities in their databases. Data accumulated in a data warehouse is used to produce informational reports that answer questions like "Who?" and "What?" about the original data. For instance, in the bank example above, a data warehouse can be used to answer a question like "Which branch yielded the maximum profits for the third quarter of this fiscal year?" Or it could be used to answer a question like "What was the net profit for the third quarter of this fiscal year per region?"
OLAP and Data Warehouses
While data warehouses are usually based on relational technology, OLAP uses a multidimensional view of aggregate data to provide quick access to important information for further analysis. OLAP enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information. OLAP transforms raw data to useful information so that it reflects the real factors affecting or enhancing the line of business of the enterprise.
The basic advantage of OLAP systems is that they can be used to study different scenarios by asking the question "What if?". If we take the bank example, a sample question would be, "What if the bank charges an extra $1.OO.for every automatic teller machine (ATM) transaction. done by a user who is not a current bank customer? How would that affect the bank revenue?". This unique feature makes OLAP a great decision making tool that could help determine the best courses of action for the company's business. OLAP and data warehouses complement each other. The data warehouse stores and manages the data, while OLAP converts the stored data into useful information. OLAP techniques may range from simple navigation and browsing of the data (often referred to as slicing and dicing), to more serious analyses, such as time-series and complex modeling.
The timeline of business data in a data warehouse/OLAP system is presented in the figure below. The figure shows that raw data is collected, stored and managed into a data warehouse, and OLAP converts the data to information. Advanced OLAP analyses and other tools, such as data mining (explained in detail in Chapter 11), can further convert the information into powerful knowledge.
Dr. Codd, the inventor of relational databases and one of the most prominent database researchers, first coined the term OLAP in a white paper entitled "Providing OLAP to User Analysis: An IT Mandate" which he published in 1993. The white paper defined 12 rules for OLAP applications. Nigel Pendse and Richard Creeth of the OLAP Report (http://www.olapreport.com/DatabaseExplosion.htm) simplified the definition of OLAP applications as applications that should deliver fast analysis of shared multidimensional information (FASMI). This statement means:
- Fast: The user of these applications is an interactive user who expects the delivery of the information they require at a fairly constant rate. Most queries should be delivered to the user in five seconds or less.
- Analysis: OLAP applications should performs basic numerical and statistical analysis of the data. These calculations could be pre-defined by the application developer, or defined by the user as ad hoc queries.
- Shared: The data delivered by OLAP applications should be shared across a large user population. This stresses the need for implementation of the security requirements necessary for keeping the data confidential and safe.
- Multidimensional: OLAP applications are based on multidimensional databases, which is an essential characteristic of OLAP.
- Information: OLAP applications should be able to access all the data and information necessary and relevant for the application. The data may be located in different sources and be large in volume.
ComputerWorld magazine reported in February 1998, that Office Depot, one of the largest office equipment suppliers in the US, significantly improved its sales due to the improved on-line analytical processing (OLAP) tools it used in its different stores. This result came at a time when the financial markets expected Office Depot's sales to drop after the failure of the merger with Staples, a competitor to Office Depot, due to the loss of many people whose job was to conduct such analyses. ComputerWorld reported that the improved OLAP tools used by Office Depot helped increase sales a respectable 4% for the second half of 1997. For example, Office Depot found that it was carrying too much fringe stock in the wrong stores. Therefore, the retail stores narrowed their assortment of PCs from 22 to 12 products. That helped the company eliminate unnecessary inventory and avoid costly markdowns on equipment that was only gathering dust...