Read an Excerpt
Chapter 2: Data Modeling
Models are supposed to represent reality. Because this book deals with an enterprise's data, the emphasis here will be on different methods for representing that data so as to simulate what really occurs.In this book, the database will be the desired model, which depicts how company information is organized. It's important to note, however, that the methods discussed in this book reflect Microsoft's constricted view of model development concerning databases design. Rather than considering all of what is commonly referred to as the system development life cycle (SDLC), in which systems are planned, analyzed, designed, implemented, and then operated and maintained, Microsoft simply focuses on the actual design portion (phase 3) of that cycle. Microsoft then breaks the system's development (or the company database, in this case) apart and forms two distinct modeling views-the logical and the physical models.
As a result, when we're discussing the logical model, it already includes all actions of a conceptual nature (i.e., planning and analysis) leading up to a system's ultimate creation-at least from Microsoft's perspective, that is. Similarly, when we're developing the physical model, all concrete activities such as implementation, operation, and maintenance are considered to be included there as well. Microsoft's viewpoint, therefore, is a simplistic form of the SDLC.
Now let's look at data modeling with an emphasis on the relational database system, Microsoft's basis for SQL Server 2000. Use of relational database systems such as SQL Server 2000 has become standard practice for enterprise data processing.
Using Data Models (Data Modeling)
Data modeling is the process of identifying, documenting, and implementing the data requirements for your application. It is an iterative process, repeatedly involving refinement of the following tasks:- Identifying each piece of data required to perform the business operations for your application. Some examples include Customer Name, ZIP Code, and Employee ID.
- Defining the type, size, and other traits for each piece of data. An example would be that the Customer Name field can contain 20 alphanumeric characters and must not be null (or empty).
- Defining how one data item relates to another. For example, Customers buy Products.
- Defining all constraints required by each piece of data. For example, the ZIP Code can be required to contain exactly five numeric characters, or each Employee ID can be required to contain a unique numeric identifier.
- Defining the operational processes required to effectively maintain the data.
Examples include activities such as security audits and scheduled backups.
- Choosing a data storage technology for implementing the model. Examples include relational, hierarchical, or indexed sequential access method (ISAM) databases.
- Organizing the data to avoid redundancy and inconsistency while maintain ing optimal overall performance-a process referred to as normalization.
During the logical model's development, the data requirements themselves are determined, and no concern is given to how data will be stored or what vendor's product will be used. With the completion of the logical model's development (which can actually be thought of as the beginning of the physical model's design), the process of finding a particular vendor's product should be started. That product should be the one found to optimally support the system's logical design.
During the physical phase, you map the logical design's elements into the database. Where the logical design decided "what" had to occur, the physical design specifies "how" it occurs. This is also when the database's supportive objects are actually developed. At this point, it becomes clear whether the selection process for choosing the best database product actually occurred or whether the constraints placed upon the design (cost, speed, ease of transition, and so on) had the process settling for less than the best.
Review of Database Types
Before we proceed, it might help to clarify some important terminology. It's important to understand the following database types: transaction processing, decision support, and relational:- Transaction processing database-Also known as an online transaction processing (OLTP) database, this is the most common type of database and will be the one primarily discussed in this book. This type of database contains the active set of data that a company uses. If the database will have a high access rate, it is normalized for optimal performance during the read and write functions.
- Decision support database-Also known as an online analytical processing (OLAP) database, this is used to help companies analyze all the data they have ac quired. This is usually historical data, whose purpose is to help companies predict future business scenarios. This type of database is often used in data warehousing and data mining. The data itself is usually not modified and is typically denormalized to gain optimum read performance.
- Relational database-Also known as a relational database management system (RDBMS), this is a set of data that is based on relationships. The relationships associate various information items to applicable database components. Thus, relationships are denoted by use of tables whose columns are logically interconnected, or linked.
Logical Design
With that slight review of database types behind us, it should be obvious that any questions about what is to be accomplished should come before you decide how it is undertaken. Determining a system's desired outcomes involves developing that system's logical design. It is during this development that requirements are gathered for the database's components and goals. To determine these requirements, the system users and the management team are interviewed. It is also important, at this point, to review any existing systems and see what data (if any) is already available from that system's use.When the logical design is being developed, no details should be decided about what database engine (or specific vendor product) will be used. Rather, the process should emphasize gathering the information that is needed to optimally support the application and meet the users' needs. In fact, at this point, databases, tables, constraints, or anything else "database-oriented" should not even be considered. Effort should be concentrated primarily on gathering the requirements for data and determining how all the pieces should ultimately work together.
The process of gathering and defining the data is iterative. You'll need to repeatedly gather such information as:
- Item name-How does one refer to the item?
- Description-What is the item?
- Responsible person-Who is responsible for the item?
- Data traits or characteristics-What does the item look like?
- Processing and relationships-How and when is the data concerning the item created, modified, or used? Does one item relate to or depend on another item...