Read an Excerpt
Chapter 6: Data Marts
This chapter is dedicated to data marts - the dimensional solution that holds the information key to the decision-making process. Unlike the enterprise data warehouse, the data mart supports specific processes in segments of a corporate suite of decision support systems; however, by linking data marts we can achieve a detailed view of our business and find trends that may have been previously unknown.
We first defined a data mart as a subject-oriented business view of the warehouse. It usually contains significantly smaller amounts of data than the warehouse and it is the object of analytical processing by the end user. In a corporate data warehouse solutions environment, there may be marts set up for pockets of the company, such as sales, manufacturing, and billing. Data marts enable pockets of a company's organization to make better-informed and more accurate strategic business decisions. Data marts commonly are less expensive and much smaller than a full-blown corporate-wide data warehouse. Organizations that intelligently implement a number of data marts find it provides their users with a quick introduction to warehousing. By allowing users to use and understand the power provided to them by the data mart, they can then embark on a more complete corporate warehouse. As the corporate enterprise data warehouse is built, the user community always has access to their own data marts, which evolve over time. The old saying of "If you build it, they will come," is very true during data warehouse development-by having users better understand the analyses that they can perform on their data, the more complete the warehouse win evolveinto. So, get your users involved in a data mart initiative, building toward the enterprise data warehouse.
The current industry trend is to build the enterprise data warehouse with an "architected solution approach," which means to build each data mart taking into account corporate information issues. As a result, we create the true customer list at the true atomic level, thus, as more marts come online, their dimensions are consistent and reusable, and the mart is more easily modified over time. The atomic level of data - the data staging area - makes it possible to join across marts, and your collection of marts becomes the enterprise data warehouse - ever evolving and no single mass project is ever embarked on.
Classic operational systems concentrate on high-level requirements that cater to the needs of all the users. When the global system is ready, the lower-level detailed needs of segments of the user community are met. This type of implementation is called top-down. A common example of top-down development is a financial management application. Specific modules are planned to deal with transactions as they move through the life cycle from budgets to expenditures. All the needs of all the users are attended to and the final product is delivered according to the big bang theory of system development. Data marts are usually developed in the opposite way, which is called bottom-up. The specific needs of small, focused locations within a business are addressed using the bottom-up approach.
Trade-offs is an interesting word when you think of designing data marts. Visions of a multiterabyte (a terabyte is 1,024 gigabytes) warehouse conjure up scenarios where end-user queries take forever to complete and cause endless frustration in the community. Because the data mart can check in at a fraction of the size of the enterprise data warehouse, analysts, development staff, and users may feel this is definitely the way to go based on size and size alone. This approach may be an easy sell to management in the short term, but, as all our experience has shown, it is not the way to proceed. Granted, current and future hardware acquisitions may be driven by the plans for data mart/warehouse rollout, but the business needs of your organization's decision makers outweigh all other factors. The data mart is developed using the dimensional modeling methodology, as discussed in Chapter 5, but focuses on only a single business process.
DATA MARTS
Data mart-these words evoke visions of a 24-hour convenience store with specific goods that are tailored to meet consumer needs. In the data mart, you could not be closer to the truth. The specialty item is data-data designed to solve the business requirements of a pocket of corporate data warehouse users. Data marts are subject-oriented dimensional databases with a normal life expectancy of three years. Unlike the enterprise data warehouse, they can check in with a figure under $250,000. Most data marts ring in at 25 gigabytes and support a user community of between 10 and 25 users.
Vendors of tools that build, and then manage, data marts must provide a cost-effective, rapid solution that can be used on any-sized project. The results of the build effort must be made available to the user community in a timely fashion. Timely fashion means a time to market of two to four months. Prospective users of corporate data marts can easily lose interest when turnaround times are longer. Data mart software must leverage existing operational and other DSS repositories during their build phases. Providers of technology must be able to read data directly from a suite of legacy systems implemented on IBM mainframe servers using products such as IMS and DB2, as well as newer systems such as SAP and PeopleSoft running on UNIX servers. Data can be extracted directly from these systems using COBOL modules, and the results moved into the data mart from flat files.
The model of the data mart is driven by the ways the user needs to view and use the information. Rather than paying so much attention to the physical layout of the data, the data mart model reflects what the users want to be able to do and how they wish to have it presented. Data mart implementers interview the users and, with their input and the knowledge the implementers have about the technology, design a model that is ideally suited to the users' requirements. This is an iterative process, which sometimes goes on for the life of the data mart. An iterative process is one that is repeated, but this does not mean the iteration is repeated because its previous execution was flawed. An iterative process is simply executed over and over again, usually in response to changing corporate uses of the data. Grocery shopping or paying the phone bill are parts of everyday life that are iterative processes.
Stand-Alone Data Marts
Pockets of some organizations have beaten a trail on their own to the data mart solution. Sometimes, in a largely decentralized company, segments of the business community have funded, developed, and deployed the data mart solution virtually without the involvement of the personnel tasked with the management of computer system solutions. These are known as stand-alone data marts. As the 1990s come to a close, and with the parallel- capable hardware and software products in the marketplace, these stand-alone data marts are difficult if not impossible to integrate into a larger corporate data mart or data warehouse initiative. Too many discrepancies exist between different stand-alone data marts in the way the data is structured and how the data is encoded. It is virtually impossible to merge these data marts' contents when looking for ways to share data between the corporate segments of an organization. As with data that comes from many data sources, the coalescing of the information can be difficult. The lessons that we learn from the building of data warehouses should be used when designing data marts - the data marts must form consistent views of our business. Data marts should not be built in isolation, and organizations must develop an overall data mart strategy that incorporates all lines with the business.
Many data marts are a subset of a large data warehouse's information, centrally designed, built, maintained, and distributed to groups of DSS users throughout a company. Today, the thrust is towards centralized management. This reduces the administrative redundancies inherent in a decentralized model. Companies are building data marts that feed on a combination of larger operational and decision support data sources. These data marts incorporate a company-wide approach to delivery of decision support systems; this is only possible in businesses that are committed to deploying systems from a central point under the auspices and supervision of a company-wide systems development and deployment group. The types of processes that lend themselves very well to data mart development include sales, purchasing, customer service processes, logistics, and manufacturing. The model shown in Figure 6-1 demonstrates how a process, such as Pay-TV Orders can form a data mart using a star schema. The data mart is a simple one, yet you can already realize the types of queries you could write, even with what at first glance appears to be a simple model.
Data marts often share information portions within the organization and an overall data mart strategy In the banking industry, many business processes share the account dimension. These processes will include bank balances, loans, credit cards, and marketing. The account is a common concept in almost all areas of the bank's business. By creating an account dimension and then sharing this dimension among the data marts, we allow the bank to form a common view of their account holders within many business focus areas. The idea of shared dimensions has been called "conformed dimensions" by Ralph Kimball. Dr. Kimball says that through the sharing of information contained in these conformed dimensions, we can join information together in what would have been independent data marts. This idea is critical to allow data marts to exceed the information limitations that a stand-alone data mart provides. With conformed data marts, you can now analyze your data across data marts. Remember that the granularity of your data marts must be the same for this analysis to be useful. The diagram in Figure 6-2 shows how our original data mart of Pay-TV Orders can be related to a customer product transaction data mart. Through the subscriber dimension we perform analysis, such as how many customers with a particular cable service have ordered Pay-TV family-rated movies....