MCSE: Microsoft SQL Server 2000 Database Design and Implementation Training Kit

Overview

This official "MCSE Training Kit" teaches how to design and implement database solutions using Microsoft SQL Server 2000 Enterprise Edition, as well as providing in-depth preparation for MCP Exam 70-229 — a core exam for MCDBA certification and an elective on the Microsoft Windows "RM" 2000 MCSE track. Topics map directly to the objectives measured by the MCP exam, including developing a logical data model, implementing the physical database, retrieving and modifying data, programming business logic, tuning and optimizing data access, and designing a database security plan. Students learn through an integrated system of skill-building tutorials, case study examples, and self-assessment exercises. An economical alternative to classroom instruction, this kit enables students to set their own pace and learn by doing!

Read More Show Less

Editorial Reviews

Booknews

Walks through the steps involved in designing and implementing an SQL server database, creating and maintaining database objects, and implementing data integrity. The study guide also introduces how to use transact-SQL to query an SQL server database and manipulate data stored in that database, and describes techniques for optimizing a database and managing server security. Exercises demonstrate a particular concept or skill. The CD-ROM contains practice files and an evaluation edition of the SQL server 2000. Annotation c. Book News, Inc., Portland, OR (booknews.com)

Read More Show Less

Product Details

ISBN-13: 9780735612488
Publisher: Microsoft Press
Publication date: 5/25/2001
Series: Microsoft Training Kits Series
Edition description: BK&CD-ROM
Pages: 650
Product dimensions: 7.72 (w) x 9.42 (h) x 2.11 (d)

Related Subjects

Meet the Author

Founded in 1975, Microsoft Corporation (Nasdaq 'MSFT') is the worldwide leader in software for personal and business computing. The company offers a wide range of products and services designed to empower people through great software—any time, any place, and on any device.

Read More Show Less

Read an Excerpt

Chapter 3: Designing a SQL Server Database

About This Chapter

A SQL Server database consists of a collection of tables that contain data and other objects—including views, indexes, stored procedures, and triggers—that are defined so as to support activities performed with the data. The data stored in a database is usually related to a particular subject or process, such as a retailer's customer information or a manufacturer's sales information. This chapter introduces you to the process of creating a SQL Server database and describes the basic concepts of database design. This chapter also provides information about planning a database, identifying system requirements, and developing a logical data model.

Before You Begin

To complete the lessons in this chapter, you must have

SQL Server 2000 Enterprise Edition installed on a Windows 2000 Server computer.
The ability to log on to the Windows 2000 Server computer and to SQL Server as the Windows 2000 administrator.
Paper and a pen or pencil to complete part of the exercises.

Lesson 1: Introduction to Database Design

Before you can develop a logical data model—and subsequently create a database and the objects it contains—you should understand the fundamental concepts of database design. In addition, you should be familiar with the basic components of a database and how those components work together to provide efficient data storage and to provide access to those who require specific types of data, in specific formats, from the database. This lesson introduces you to the basic components of a database and the terminology that describes those components. The lesson then discusses normalization and entity relationships—two concepts that are integral to understanding relational database design.

After this lesson, you will be able to:

Describe the main components of a relational database.
Describe the process of normalization and normalize tables in a database design.
Identify the relationships that exist between entities.

Estimated lesson time: 30 minutes

Components of a SQL Server Database

A SQL Server database consists of a collection of tables that store specific sets of structured data. A table (entity) contains a collection of rows (tuples) and columns (attributes). Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). Tables have several types of controls (constraints, rules, triggers, defaults, and customized user data types) that ensure the validity of the data. Tables can have indexes (similar to those in books) that enable rows to be found quickly. You can add declarative referential integrity constraints to the tables to ensure consistency between interrelated data in different tables. A database can also store procedures that use Transact-SQL programming code to perform operations with the data in the database, such as storing views that provide customized access to table data.

For example, suppose that you create a database named MyCoDB to manage the data in your company. In the MyCoDB database, you create a table named Employees to store information about each employee, and the table contains columns named EmpID, LastName, FirstName, Dept, and Title. To ensure that no two employees share the same EmpID and that the Dept column contains only valid numbers for the departments in your company, you must add constraints to the table. Because you want to quickly find the data for an employee based on the employee's ID or last name, you define indexes. For each employee, you will have to add a row of data to the Employees table, so you create a stored procedure named AddEmployee that is customized to accept the data values for a new employee and that performs the operation of adding the row to the Employees table. You might need a departmental summary of employees, in which case you define a view named DeptEmps that combines data from the Departments and Employees tables and produces the output. Figure 3.1 shows the parts of the MyCoDB database.

Click to view graphic

Figure 3.1 The MyCoDB database, the Employees table, and the DeptEmps view.

Normalizing a Database Design

Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. Having a greater number of narrow tables (with fewer columns) is characteristic of a normalized database; having a few wide tables (with more columns) is characteristic of a denormalized database.

Reasonable normalization often improves performance. When useful indexes are available, the SQL Server 2000 query optimizer is efficient at selecting rapid, efficient joins between tables.

As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex relational joins between too many tables can hinder performance. Reasonable normalization should include few regularly executed queries that use joins involving more than four tables.

A database that is used primarily for decision support (as opposed to update-intensive transaction processing) might not have redundant updates and might be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, data that is not normalized is a more common design problem in database applications than over-normalized data. Starting with a normalized design and then selectively denormalizing tables for specific reasons is a good strategy.

Sometimes the logical database design is already fixed, however, and total redesign is not feasible. But even then, it might be possible to normalize a large table selectively into several smaller tables. If the database is accessed through stored procedures, this schema change could take place without affecting applications. If not, it might be possible to create a view that hides the schema change from the applications.

Achieving a Well-Designed Database

In relational database design theory, normalization rules identify certain attributes that must be present or absent in a well-designed database. These rules can become quite complicated and go well beyond the scope of this book. There are a few rules that can help you achieve a sound database design, however. A table should have an identifier, it should store data for only a single type of entity, it should avoid nullable columns, and it should not have repeating values or columns.

A Table Should Have an Identifier

The fundamental rule of database design theory is that each table should have a unique row identifier, which is a column or a set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column, and no two records can share the same ID value. The column (or columns) that serves as the unique row identifier for a table is the primary key of the table.

In Figure 3.2, the Employees table does not include a column that uniquely identifies each row within the table. Notice that the name David Mendlen appears twice. Because there is no unique identifier in this table, there is no way to easily distinguish one row from another. This situation could be worse if both employees worked in the same department and had the same job title.

Click to view graphic

Figure 3.2 A table that has no unique identifier.

Click to view graphic

Figure 3.3 A normalized table that has a unique identifier.

You can normalize the table by adding a column that uniquely identifies each row, as shown in Figure 3.3. Notice that each instance of David Mendlen has a unique EmpID value.

A Table Should Store Data for Only a Single Type of Entity

Attempting to store too much information in a table can prevent the efficient and reliable management of the data in the table. For example, in Figure 3.4, the Books table includes information about each book's publisher.

Click to view graphic

Figure 3.4 A Table that includes title and publisher information.

Although it is possible to have columns that contain information for both the book and the publisher in the same table, this design leads to several problems. The publisher information must be added and stored redundantly for each book published by a given publisher. This information uses extra storage space in the database. If the address for the publisher changes, the change must be made for each book. Furthermore, if the last book for a publisher is removed from the Books table, the information for that publisher is lost.

In a normalized database, the information about books and publishers would be stored in at least two tables: one for book titles and one for publishers (as shown in Figure 3.5).

Click to view graphic

Figure 3.5 A normalized database design that includes a table for book titles and a table for publisher information.

The information about the publisher now has to be entered only once and then linked to each book. If the publisher information changes, it must be changed in only one place, and the publisher information will be there even if the publisher has no books in the database.

A Table Should Avoid Nullable Columns

Tables can have columns defined to allow null values. A null value indicates that the record has no value for that attribute. Although it can be useful to allow null values in isolated cases, it is best to use them sparingly because they require special handling that increases the complexity of data operations. If you have a table that has several nullable columns and several of the rows have null values in the columns, you should consider placing these columns in another table linked to the primary table. Storing the data in two separate tables enables the primary table to be simple in design but capable of accommodating the occasional need for storing this information.

A Table Should Not Have Repeating Values or Columns

A table should not contain a list of values for a specific piece of information. For example, suppose that you want to track book titles and their authors. Although most books might have only one author, many of them might have two or more. If there is only one column in the Books table for the name of the author, this situation presents a problem. One solution is to store the name of both authors in the column, but showing a list of individual authors would then be difficult. Another solution is to change the structure of the table to add another column for the name of the second author, but this solution accommodates only two authors. Yet another column must be added if a book has three authors.

Figure 3.6 shows two methods of handling multiple authors per title.

Click to view graphic

Figure 3.6 Two methods for structuring the Books table.

If you find that you need to store a list of values in a single column or if you have multiple columns for a single piece of data (Author1, Author2, and so on), you should consider placing the duplicated data in another table with a link to the primary table. In the case of the Books table, you could create an additional primary table for authors and then create a third table that matches books to authors and accommodates repeating values, as shown in Figure 3.7. This design enables any number of authors for a book without modifying the definition of the table and allocates no unused storage space for books that have a single author.

Click to view graphic

Figure 3.7 Three tables that store information about books and their authors. (Image unavailable)

Entity Relationships

In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columns—usually columns that have the same name in both tables. In most cases, the relationship matches the primary key from one table, which provides a unique identifier for each row with an entry in the foreign key in the other table. Primary keys and foreign keys are discussed in more detail in Chapter 5, "Implementing Data Integrity."

There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how the related columns are defined.

One-to-One Relationships

In a one-to-one relationship, a row in table A can have no more than one matching row in table B (and vice versa). A one-to-one relationship is created if both of the related columns are primary keys or have unique constraints. This type of relationship is not common, however, because information related in this way would usually be in one table.

One-to-Many Relationships

A one-to-many relationship is the most common type of relationship. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A. For example, the Publishers and Titles tables mentioned previously have a one-to-many relationship. Each publisher produces many titles, but each title comes from only one publisher. A one-to-many relationship is created if only one of the related columns is a primary key or has a unique constraint.

Many-to-Many Relationships

In a many-to-many relationship, a row in table A can have many matching rows in table B (and vice versa). You create such a relationship by defining a third table, called a junction table, whose primary key consists of the foreign keys from both table A and table B. In Figures 3-6 and 3-7, you saw how the author information could be separated into another table. The Books table and the Authors table have a many-to-many relationship. Each of these tables has a one-to-many relationship with the BookAuthor table, which serves as the junction table between the two primary tables.

Exercise 1: Exploring the Basic Concepts of Database Design

In this exercise, you will view the primary objects that are contained in a SQL Server database. You will then apply the principles of normalization to a database design and identify the relationships that exist between entities within a database. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator. You will use SQL Query Analyzer and SQL Server Enterprise Manager for part of the exercise, and you will need paper and a pencil to complete the rest of the exercise.

To identify the main components of a SQL Server database

Open SQL Query Analyzer and use Windows authentication to log on to SQL Server.
Open the Object Browser window if it is not already displayed.
The Object Browser window displays a hierarchical tree of database objects contained in the instance of SQL Server that you are logged on to.
Review the list of database objects that appear in the tree. Notice that the Northwind and Pubs databases appear as objects in the tree. The tree also includes a list of common objects, such as aggregate functions and system data types.
Expand the Northwind node.
A list of object types appears. The list includes users tables, system tables, stored procedures, functions, and user-defined data types. Each category contains objects specific to the Northwind database.
Expand the Users Tables node.
A list of users tables in the Northwind database appears. Notice that each table object is preceded by the object owner (which, in this case, is dbo).

To view the contents of a table

Right-click dbo.Categories, then click Open.
The Open Table window appears and displays the contents of the Categories table.
Review the columns and rows within the table.
What are the column names (attributes) in the Categories table, and how many rows of data are displayed?
Close the Open Table window.
Review each users table object to determine the columns in each one. Be sure to close the Open Table window after you review that table. You can also review the system tables to view their attributes.

To use the sp_help system stored procedure to view table information

In the Query window, execute the following Transact-SQL statement:
```
Use Northwind
```
```
GO
```
```
sp_help
```
A list of all objects in the Northwind database appears on the Grids tab of the Results pane.
NOTE:
After the result appears on the Grids tab, click within the Results pane. A second scroll bar will appear, and you can scroll through all of the objects in the database.
Close SQL Query Analyzer.

To normalize a database design

Review the following table:
FirstName LastName City

Elizabeth Boyle Cleveland

Rob Caron Chicago

FirstName LastName City

Neil Smith Denver

Denise Smith Boston
Keeping in mind the table's design, apply the four basic rules that you should follow when designing a database. The rules are listed here for your convenience:
- A table should have an identifier.
- A table should store data for only a single type of entity.
- A table should avoid nullable columns.
- A table should not have repeating values or columns.
Which rule is being violated in the Customers table?
Modify the table's design so that it adheres to the basic rules of normalization. Use your paper and pencil to draw the table and its data.
How should you modify the data?
Review the following table:

CustID	FirstName	LastName	City	PurchaseType
101	Elizabeth	Boyle	Cleveland	Books, CDs
102	Rob	Caron	Chicago	Books, videos
103	Neil	Smith	Denver	CDs, videos, DVDs
104	Denise	Smith	Boston	Books

The PurchaseType column contains a list of the types of products that the customer has purchased.

Determine which rule(s) of normalization is being violated in the Customers table.
Modify the database design so that it adheres to the rules of normalization.
How should you modify the current design?
Review the following table:

CustID	FirstName	LastName	City	Purchase	Manufacturer	ManContact
101	Elizabeth	Boyle	Cleveland	Spring candles	Pavlova, Ltd.	Ian Devling
102	Rob	Caron	Chicago	Sandalwood incense	Mayumi's	Mayumi Ohno
103	Neil	Smith	Denver	Sage	Pavlova, Ltd	Ian Devling
104	Denise	Smith	Boston	Hanging crystal	Leka Trading	Chandra Leka

Determine which rule(s) of normalization is being violated.
Modify the database design so that it adheres to the rules of normalization.
How should you modify the current design?

To generate a database diagram in SQL Server

On the Start menu, point to Programs and then point to Microsoft SQL Server. Then, click Enterprise Manager.
SQL Server Enterprise Manager appears.
In the Tree tab, expand the Microsoft SQL Servers node, expand SQL Server Group, expand the node for your local computer, expand Databases, and then expand the Pubs database.
A list of object types in the Pubs database should now be displayed.
Right-click Diagrams, then click New Database Diagram.
The Create Database Diagram wizard appears.
Click Next.
The Select Tables To Be Added screen appears. The list of tables in the Pubs database appears in the left window.
Select the Add Selected Tables Automatically checkbox.
The How Many Levels Of Related Tables option becomes active.
Change the level in the How Many Levels Of Related Tables list box to 2.
Select the Authors table from the Available Tables list, then click Add.
The Authors table, TitleAuthor table, and Titles table are added to the Tables To Add To Diagram list.
Click Next.
The Completing The Create Database Diagram Wizard screen appears.
Click Finish.
The diagram is generated and appears in a new window. The diagram includes the Authors table, the TitleAuthor table, and the Titles table.

To view entity relationships in a database diagram

If the tables overlap, click the Arrange Tables button on the toolbar.
The tables should now be arranged so that you can see the relationships between them.
If necessary, click the Zoom button on the toolbar and select a size that would make the relationships easier to identify. Maximize the window if necessary.
View the connector that links the Authors table to the TitleAuthor table and the connector that links that Titles table to the TitleAuthor table.
The connector indicates that a relationship exists between the tables. Notice that there is no connector between the Author table and the Titles table because no direct relationship exists between the tables.

At one end of the connecter is a key, which indicates one. The other side of the connector is an infinity sign, which indicates many. These symbols tell you that a one-to-many relationship exists between the Authors table and the TitleAuthor table and that a one-to-many relationship exists between the Titles table and the TitleAuthor table.

What is the implied relationship between the Titles table and the Authors table?

What type of table is the TitleAuthor table (in terms of the implied relationship between Titles and Authors)?
Close the diagram window without saving the changes that you made, then close SQL Server Enterprise Manager.

NOTE:
As an extra exercise, try creating diagrams for other users tables within the Pubs database and for tables in the Northwind database. You can also try selecting levels greater than two in the How Many Levels Of Related Tables list box. Experiment with the different levels and tables.

Lesson Summary

A SQL Server database consists of a collection of tables that store a specific set of structured data. A table contains a collection of rows and columns. Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). The logical design of the database, including the tables and the relationships between them, is the core of an optimized, relational database. Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. As normalization increases, so do the number and complexity of joins that are required to retrieve data. Normalization rules identify certain attributes that must be present or absent in a well-designed database. Tables in a normalized database should have an identifier, should only store data for a single type of entity, should avoid nullable columns, and should not have repeating values or columns. You can create relationships between your tables in a database diagram to show how the columns in one table are linked to columns in another table. In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columns—usually columns that have the same name in both tables. There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how you define the related columns....

Read More Show Less

About This Book ..... xiii: Intended Audience ..... xiii; Prerequisites ..... xiv; Reference Materials ..... xiv; About the CD-ROM ..... xiv; Features of this Book ..... xv; The Microsoft Certified Professional Program ..... xxii; Technical Support ..... xxvii
Chapter 1: Introduction to Microsoft SQL Server 2000 ..... 1: Lesson 1: Overview of SQL Server 2000 ..... 2; Lesson 2: Components of SQL Server 2000 ..... 9; Lesson 3: Overview of SQL Server 2000 Architecture ..... 20; Review ..... 42
Chapter 2: Using Transact-SQL on a SQL Server Database ..... 45: Lesson 1: SQL Server Programming Tools ..... 46; Lesson 2: Introduction to Transact-SQL ..... 66; Lesson 3: Transact-SQL Syntax Elements ..... 77; Lesson 4: Executing Transact-SQL Statements ..... 89; Review ..... 96
Chapter 3: Designing a SQL Server Database ..... 99: Lesson 1: Introduction to Database Design ..... 100; Lesson 2: Planning a SQL Server Database ..... 112; Lesson 3: Identifying System Requirements ..... 119; Lesson 4: Developing a Logical Data Model ..... 129; Review ..... 141
Chapter 4: Implementing SQL Server Databases and Tables ..... 143: Lesson 1: Creating and Managing a SQL Server Database ..... 144; Lesson 2: Identifying Data Types ..... 155; Lesson 3: Creating and Managing Tables ..... 163; Review ..... 176
Chapter 5: Implementing Data Integrity ..... 179: Lesson 1: Introduction to Data Integrity ..... 180; Lesson 2: Implementing Integrity Constraints ..... 187; Review ..... 202
Chapter 6: Accessing and Modifying Data ..... 205: Lesson 1: Accessing Data in a SQL Server Database ..... 206; Lesson 2: Using Advanced Query Techniques to Access Data ..... 215; Lesson 3: Modifying Data in a SQL Server Database ..... 229; Review ..... 240
Chapter 7: Managing and Manipulating Data ..... 243: Lesson 1: Importing and Exporting Data ..... 244; Lesson 2: Using Distributed Queries to Access External Data ..... 256; Lesson 3: Using Cursors to Retrieve Data ..... 263; Lesson 4: Retrieving XML Data ..... 272; Review ..... 281
Chapter 8: Implementing Stored Procedures ..... 283: Lesson 1: Introduction to Stored Procedures ..... 284; Lesson 2: Creating, Executing, Modifying, and Deleting Stored Procedures ..... 292; Lesson 3: Programming Stored Procedures ..... 310; Review ..... 327
Chapter 9: Implementing Triggers ..... 329: Lesson 1: Introduction to Triggers ..... 331; Lesson 2: Creating and Managing Triggers ..... 339; Lesson 3: Programming Triggers ..... 350; Review ..... 361
Chapter 10: Implementing Views ..... 363: Lesson 1: Introduction to Views ..... 364; Lesson 2: Creating, Modifying, and Deleting Views ..... 368; Lesson 3: Accessing Data through Views ..... 379; Review ..... 388
Chapter 11: Implementing Indexes ..... 391: Lesson 1: Index Architecture ..... 392; Lesson 2: Index Creation and Administration ..... 403; Review ..... 418
Chapter 12: Managing SQL Server Transactions and Locks ..... 421: Lesson 1: Transaction and Locking Architecture ..... 422; Lesson 2: Managing SQL Server Transactions ..... 432; Lesson 3: Managing SQL Server Locking ..... 446; Review ..... 459
Chapter 13: Designing and Administering SQL Server 2000 Security ..... 461: Lesson 1: Overview of SQL Server 2000 Security ..... 462; Lesson 2: Designing a Database Security Plan ..... 473; Lesson 3: Database Security Implementation and Administration ..... 479; Review ..... 496
Chapter 14: SQL Server Monitoring and Tuning ..... 499: Lesson 1: Monitoring Databases with SQL Profiler ..... 500; Lesson 2: Index Tuning and Database Partitioning ..... 511; Review ..... 519
Appendix A: Questions and Answers ..... 521
Glossary ..... 575
Index ..... 623

Read More Show Less

First Chapter

Chapter 3.

Designing a SQL Server Database

Lesson 1: Introduction to Database Design
- Components of a SQL Server Database
- Normalizing a Database Design
- Entity Relationships
- Lesson Summary
Lesson 2: Planning a SQL Server Database
- Files and Filegroups
- Transaction Logs
- Environment
- SQL Server Installation
- Security
- Lesson Summary
Lesson 3: Identifying System Requirements
- The Process of Identifying System Requirements
- Identifying System Goals
- Identifying the Amount and Types of Data
- Identifying How the Data Will Be Used
- Identifying Business Rules of the System
- Lesson Summary
Lesson 4: Developing a Logical Data Model
- Identifying Entities and Their Attributes
- Identifying Relationships Between Entities
- Identifying Constraints on Data
- Lesson Summary
Review

Chapter 3 Designing a SQL Server Database

About This Chapter

A SQL Server database consists of a collection of tables that contain data and other objects—including views, indexes, stored procedures, and triggers—that are defined so as to support activities performed with the data. The data stored in a database is usually related to a particular subject or process, such as a retailer’s customer information or a manufacturer’s sales information. This chapter introduces you to the process of creating a SQL Server database and describes the basic concepts of database design. This chapter also provides information about planning a database, identifying system requirements, and developing a logical data model.

Before You Begin

To complete the lessons in this chapter, you must have

SQL Server 2000 Enterprise Edition installed on a Windows 2000 Server computer.
The ability to log on to the Windows 2000 Server computer and to SQL Server as the Windows 2000 administrator.
Paper and a pen or pencil to complete part of the exercises.

Lesson 1: Introduction to Database Design

Before you can develop a logical data model—and subsequently create a database and the objects it contains—you should understand the fundamental concepts of database design. In addition, you should be familiar with the basic components of a database and how those components work together to provide efficient data storage and to provide access to those who require specific types of data, in specific formats, from the database. This lesson introduces you to the basic components of a database and the terminology that describes those components. The lesson then discusses normalization and entity relationships—two concepts that are integral to understanding relational database design.

After this lesson, you will be able to:

Describe the main components of a relational database.
Describe the process of normalization and normalize tables in a database design.
Identify the relationships that exist between entities.

Estimated lesson time: 30 minutes

Components of a SQL Server Database

A SQL Server database consists of a collection of tables that store specific sets of structured data. A table (entity) contains a collection of rows (tuples) and columns (attributes). Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). Tables have several types of controls (constraints, rules, triggers, defaults, and customized user data types) that ensure the validity of the data. Tables can have indexes (similar to those in books) that enable rows to be found quickly. You can add declarative referential integrity constraints to the tables to ensure consistency between interrelated data in different tables. A database can also store procedures that use Transact-SQL programming code to perform operations with the data in the database, such as storing views that provide customized access to table data.

For example, suppose that you create a database named MyCoDB to manage the data in your company. In the MyCoDB database, you create a table named Employees to store information about each employee, and the table contains columns named EmpID, LastName, FirstName, Dept, and Title. To ensure that no two employees share the same EmpID and that the Dept column contains only valid numbers for the departments in your company, you must add constraints to the table. Because you want to quickly find the data for an employee based on the employee’s ID or last name, you define indexes. For each employee, you will have to add a row of data to the Employees table, so you create a stored procedure named AddEmployee that is customized to accept the data values for a new employee and that performs the operation of adding the row to the Employees table. You might need a departmental summary of employees, in which case you define a view named DeptEmps that combines data from the Departments and Employees tables and produces the output. Figure 3.1 shows the parts of the MyCoDB database.

Figure 3.1 The MyCoDB database, the Employees table, and the DeptEmps view. (Image unavailable)

Normalizing a Database Design

Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. Having a greater number of narrow tables (with fewer columns) is characteristic of a normalized database; having a few wide tables (with more columns) is characteristic of a denormalized database.

Reasonable normalization often improves performance. When useful indexes are available, the SQL Server 2000 query optimizer is efficient at selecting rapid, efficient joins between tables.

As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex relational joins between too many tables can hinder performance. Reasonable normalization should include few regularly executed queries that use joins involving more than four tables.

A database that is used primarily for decision support (as opposed to update-intensive transaction processing) might not have redundant updates and might be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, data that is not normalized is a more common design problem in database applications than over-normalized data. Starting with a normalized design and then selectively denormalizing tables for specific reasons is a good strategy.

Sometimes the logical database design is already fixed, however, and total redesign is not feasible. But even then, it might be possible to normalize a large table selectively into several smaller tables. If the database is accessed through stored procedures, this schema change could take place without affecting applications. If not, it might be possible to create a view that hides the schema change from the applications.

Achieving a Well-Designed Database

In relational database design theory, normalization rules identify certain attributes that must be present or absent in a well-designed database. These rules can become quite complicated and go well beyond the scope of this book. There are a few rules that can help you achieve a sound database design, however. A table should have an identifier, it should store data for only a single type of entity, it should avoid nullable columns, and it should not have repeating values or columns.

A Table Should Have an Identifier

The fundamental rule of database design theory is that each table should have a unique row identifier, which is a column or a set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column, and no two records can share the same ID value. The column (or columns) that serves as the unique row identifier for a table is the primary key of the table.

In Figure 3.2, the Employees table does not include a column that uniquely identifies each row within the table. Notice that the name David Mendlen appears twice. Because there is no unique identifier in this table, there is no way to easily distinguish one row from another. This situation could be worse if both employees worked in the same department and had the same job title.

Figure 3.2 A table that has no unique identifier. (Image unavailable)

Figure 3.3 A normalized table that has a unique identifier. (Image unavailable)

You can normalize the table by adding a column that uniquely identifies each row, as shown in Figure 3.3. Notice that each instance of David Mendlen has a unique EmpID value.

A Table Should Store Data for Only a Single Type of Entity

Attempting to store too much information in a table can prevent the efficient and reliable management of the data in the table. For example, in Figure 3.4, the Books table includes information about each book’s publisher.

Figure 3.4 A Table that includes title and publisher information. (Image unavailable)

Although it is possible to have columns that contain information for both the book and the publisher in the same table, this design leads to several problems. The publisher information must be added and stored redundantly for each book published by a given publisher. This information uses extra storage space in the database. If the address for the publisher changes, the change must be made for each book. Furthermore, if the last book for a publisher is removed from the Books table, the information for that publisher is lost.

In a normalized database, the information about books and publishers would be stored in at least two tables: one for book titles and one for publishers (as shown in Figure 3.5).

Figure 3.5 A normalized database design that includes a table for book titles and a table for publisher information. (Image unavailable)

The information about the publisher now has to be entered only once and then linked to each book. If the publisher information changes, it must be changed in only one place, and the publisher information will be there even if the publisher has no books in the database.

A Table Should Avoid Nullable Columns

Tables can have columns defined to allow null values. A null value indicates that the record has no value for that attribute. Although it can be useful to allow null values in isolated cases, it is best to use them sparingly because they require special handling that increases the complexity of data operations. If you have a table that has several nullable columns and several of the rows have null values in the columns, you should consider placing these columns in another table linked to the primary table. Storing the data in two separate tables enables the primary table to be simple in design but capable of accommodating the occasional need for storing this information.

A Table Should Not Have Repeating Values or Columns

A table should not contain a list of values for a specific piece of information. For example, suppose that you want to track book titles and their authors. Although most books might have only one author, many of them might have two or more. If there is only one column in the Books table for the name of the author, this situation presents a problem. One solution is to store the name of both authors in the column, but showing a list of individual authors would then be difficult. Another solution is to change the structure of the table to add another column for the name of the second author, but this solution accommodates only two authors. Yet another column must be added if a book has three authors.

Figure 3.6 shows two methods of handling multiple authors per title.

Figure 3.6 Two methods for structuring the Books table. (Image unavailable)

If you find that you need to store a list of values in a single column or if you have multiple columns for a single piece of data (Author1, Author2, and so on), you should consider placing the duplicated data in another table with a link to the primary table. In the case of the Books table, you could create an additional primary table for authors and then create a third table that matches books to authors and accommodates repeating values, as shown in Figure 3.7. This design enables any number of authors for a book without modifying the definition of the table and allocates no unused storage space for books that have a single author.

Figure 3.7 Three tables that store information about books and their authors. (Image unavailable)

Entity Relationships

In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columns—usually columns that have the same name in both tables. In most cases, the relationship matches the primary key from one table, which provides a unique identifier for each row with an entry in the foreign key in the other table. Primary keys and foreign keys are discussed in more detail in Chapter 5, "Implementing Data Integrity."

There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how the related columns are defined.

One-to-One Relationships

In a one-to-one relationship, a row in table A can have no more than one matching row in table B (and vice versa). A one-to-one relationship is created if both of the related columns are primary keys or have unique constraints. This type of relationship is not common, however, because information related in this way would usually be in one table.

One-to-Many Relationships

A one-to-many relationship is the most common type of relationship. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A. For example, the Publishers and Titles tables mentioned previously have a one-to-many relationship. Each publisher produces many titles, but each title comes from only one publisher. A one-to-many relationship is created if only one of the related columns is a primary key or has a unique constraint.

Many-to-Many Relationships

In a many-to-many relationship, a row in table A can have many matching rows in table B (and vice versa). You create such a relationship by defining a third table, called a junction table, whose primary key consists of the foreign keys from both table A and table B. In Figures 3-6 and 3-7, you saw how the author information could be separated into another table. The Books table and the Authors table have a many-to-many relationship. Each of these tables has a one-to-many relationship with the BookAuthor table, which serves as the junction table between the two primary tables.

Exercise 1: Exploring the Basic Concepts of Database Design

In this exercise, you will view the primary objects that are contained in a SQL Server database. You will then apply the principles of normalization to a database design and identify the relationships that exist between entities within a database. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator. You will use SQL Query Analyzer and SQL Server Enterprise Manager for part of the exercise, and you will need paper and a pencil to complete the rest of the exercise.

To identify the main components of a SQL Server database

Open SQL Query Analyzer and use Windows authentication to log on to SQL Server.
Open the Object Browser window if it is not already displayed.
The Object Browser window displays a hierarchical tree of database objects contained in the instance of SQL Server that you are logged on to.
Review the list of database objects that appear in the tree. Notice that the Northwind and Pubs databases appear as objects in the tree. The tree also includes a list of common objects, such as aggregate functions and system data types.
Expand the Northwind node.
A list of object types appears. The list includes users tables, system tables, stored procedures, functions, and user-defined data types. Each category contains objects specific to the Northwind database.
Expand the Users Tables node.
A list of users tables in the Northwind database appears. Notice that each table object is preceded by the object owner (which, in this case, is dbo).

To view the contents of a table

Right-click dbo.Categories, then click Open.
The Open Table window appears and displays the contents of the Categories table.
Review the columns and rows within the table.
What are the column names (attributes) in the Categories table, and how many rows of data are displayed?
Close the Open Table window.
Review each users table object to determine the columns in each one. Be sure to close the Open Table window after you review that table. You can also review the system tables to view their attributes.

To use the sp_help system stored procedure to view table information

In the Query window, execute the following Transact-SQL statement:
```
Use Northwind
```
```
GO
```
```
sp_help
```
A list of all objects in the Northwind database appears on the Grids tab of the Results pane.
NOTE:
After the result appears on the Grids tab, click within the Results pane. A second scroll bar will appear, and you can scroll through all of the objects in the database.
Close SQL Query Analyzer.

To normalize a database design

Review the following table:
FirstName LastName City

Elizabeth Boyle Cleveland

Rob Caron Chicago

FirstName LastName City

Neil Smith Denver

Denise Smith Boston
Keeping in mind the table’s design, apply the four basic rules that you should follow when designing a database. The rules are listed here for your convenience:
- A table should have an identifier.
- A table should store data for only a single type of entity.
- A table should avoid nullable columns.
- A table should not have repeating values or columns.
Which rule is being violated in the Customers table?
Modify the table’s design so that it adheres to the basic rules of normalization. Use your paper and pencil to draw the table and its data.
How should you modify the data?
Review the following table:

CustID	FirstName	LastName	City	PurchaseType
101	Elizabeth	Boyle	Cleveland	Books, CDs
102	Rob	Caron	Chicago	Books, videos
103	Neil	Smith	Denver	CDs, videos, DVDs
104	Denise	Smith	Boston	Books

The PurchaseType column contains a list of the types of products that the customer has purchased.

Determine which rule(s) of normalization is being violated in the Customers table.
Modify the database design so that it adheres to the rules of normalization.
How should you modify the current design?
Review the following table:

CustID	FirstName	LastName	City	Purchase	Manufacturer	ManContact
101	Elizabeth	Boyle	Cleveland	Spring candles	Pavlova, Ltd.	Ian Devling
102	Rob	Caron	Chicago	Sandalwood incense	Mayumi’s	Mayumi Ohno
103	Neil	Smith	Denver	Sage	Pavlova, Ltd	Ian Devling
104	Denise	Smith	Boston	Hanging crystal	Leka Trading	Chandra Leka

Determine which rule(s) of normalization is being violated.
Modify the database design so that it adheres to the rules of normalization.
How should you modify the current design?

To generate a database diagram in SQL Server

On the Start menu, point to Programs and then point to Microsoft SQL Server. Then, click Enterprise Manager.
SQL Server Enterprise Manager appears.
In the Tree tab, expand the Microsoft SQL Servers node, expand SQL Server Group, expand the node for your local computer, expand Databases, and then expand the Pubs database.
A list of object types in the Pubs database should now be displayed.
Right-click Diagrams, then click New Database Diagram.
The Create Database Diagram wizard appears.
Click Next.
The Select Tables To Be Added screen appears. The list of tables in the Pubs database appears in the left window.
Select the Add Selected Tables Automatically checkbox.
The How Many Levels Of Related Tables option becomes active.
Change the level in the How Many Levels Of Related Tables list box to 2.
Select the Authors table from the Available Tables list, then click Add.
The Authors table, TitleAuthor table, and Titles table are added to the Tables To Add To Diagram list.
Click Next.
The Completing The Create Database Diagram Wizard screen appears.
Click Finish.
The diagram is generated and appears in a new window. The diagram includes the Authors table, the TitleAuthor table, and the Titles table.

To view entity relationships in a database diagram

If the tables overlap, click the Arrange Tables button on the toolbar.
The tables should now be arranged so that you can see the relationships between them.
If necessary, click the Zoom button on the toolbar and select a size that would make the relationships easier to identify. Maximize the window if necessary.
View the connector that links the Authors table to the TitleAuthor table and the connector that links that Titles table to the TitleAuthor table.
The connector indicates that a relationship exists between the tables. Notice that there is no connector between the Author table and the Titles table because no direct relationship exists between the tables.

At one end of the connecter is a key, which indicates one. The other side of the connector is an infinity sign, which indicates many. These symbols tell you that a one-to-many relationship exists between the Authors table and the TitleAuthor table and that a one-to-many relationship exists between the Titles table and the TitleAuthor table.

What is the implied relationship between the Titles table and the Authors table?

What type of table is the TitleAuthor table (in terms of the implied relationship between Titles and Authors)?
Close the diagram window without saving the changes that you made, then close SQL Server Enterprise Manager.

NOTE:
As an extra exercise, try creating diagrams for other users tables within the Pubs database and for tables in the Northwind database. You can also try selecting levels greater than two in the How Many Levels Of Related Tables list box. Experiment with the different levels and tables.

Lesson Summary

A SQL Server database consists of a collection of tables that store a specific set of structured data. A table contains a collection of rows and columns. Each column in the table is designed to store a certain type of information (for example, dates, names, dollar amounts, or numbers). The logical design of the database, including the tables and the relationships between them, is the core of an optimized, relational database. Optimizing a database design includes the process of normalization. Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. As normalization increases, so do the number and complexity of joins that are required to retrieve data. Normalization rules identify certain attributes that must be present or absent in a well-designed database. Tables in a normalized database should have an identifier, should only store data for a single type of entity, should avoid nullable columns, and should not have repeating values or columns. You can create relationships between your tables in a database diagram to show how the columns in one table are linked to columns in another table. In a relational database, relationships help to prevent redundant data. A relationship works by matching data in key columns—usually columns that have the same name in both tables. There are three types of relationships between tables: one-to-one, one-to-many, and many-to-many. The type of relationship depends on how you define the related columns.

Lesson 2: Planning a SQL Server Database

When planning a SQL Server database, the design must take into consideration a number of factors, including database files and filegroups, transaction logs, the SQL Server installation and its operating environment, and security. This lesson discusses each of these considerations.

After this lesson, you will be able to:

Describe many of the factors that you should take into consideration when planning a SQL Server database.

Estimated lesson time: 25 minutes

Files and Filegroups

To map a database, SQL Server 2000 uses a set of operating system files. All data and objects in the database, such as tables, stored procedures, triggers, and views, are stored within the following types of operating system files:

Primary. This file contains the startup information for the database and is used to store data. Every database has one primary data file.
Secondary. These files hold all of the data that does not fit into the primary data file. If the primary file can hold all of the data in the database, databases do not need to have secondary data files. Some databases might be large enough to need multiple secondary data files or to use secondary files on separate disk drives to spread data across multiple disks or to improve database performance.
Transaction Log. These files hold the log information used to recover the database. There must be at least one log file for each database.

A simple database can be created with one primary file that contains all data and objects and a log file that contains the transaction log information. Alternatively, a more complex database can be created with one primary file and five secondary files. The data and objects within the database spread across all six files, and four additional log files contain the transaction log information.

Filegroups group files together for administrative and data allocation/placement purposes. For example, three files (Data1.ndf, Data2.ndf, and Data3.ndf) can be created on three disk drives and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks, thereby improving performance. The same performance improvement can be accomplished with a single file created on a redundant array of independent disks (RAID) stripe set. Files and filegroups, however, help to easily add new files to new disks. Additionally, if your database exceeds the maximum size for a single Windows NT file, you can use secondary data files to grow your database further.

Rules for Designing Files and Filegroups

When designing files and filegroups, you should adhere to the following rules:

A file or filegroup cannot be used by more than one database. For example, the files sales.mdf and sales.ndf, which contain data and objects from the sales database, cannot be used by any other database.
A file can be a member of only one filegroup.
Data and transaction log information cannot be part of the same file or filegroup.
Transaction log files are never part of a filegroup.

Default Filegroups

A database comprises a primary filegroup and any user-defined filegroups. The filegroup that contains the primary file is the primary filegroup. When a database is created, the primary filegroup contains the primary data file and any other files that are not put into another filegroup. All system tables are allocated in the primary filegroup. If the primary filegroup runs out of space, no new catalog information can be added to the system tables. The primary filegroup is filled only if autogrow is turned off or if all of the disks that are holding the files in the primary filegroup run out of space. If this situation happens, either turn autogrow back on or move other files off the disks to free more space.

User-defined filegroups are any filegroups that are specifically created by the user when he or she is first creating or later altering the database. If a user-defined filegroup fills up, only the users tables specifically allocated to that filegroup would be affected.

At any time, exactly one filegroup is designated as the default filegroup. When objects are created in the database without specifying to which filegroup they belong, they are assigned to the default filegroup. The default filegroup must be large enough to hold any objects not allocated to a user-defined filegroup. Initially, the primary filegroup is the default filegroup.

The default filegroup can be changed by using the ALTER DATABASE statement. When you change the default filegroup, any objects that do not have a filegroup specified when they are created are allocated to the data files in the new default filegroup. Allocation for the system objects and tables, however, remains within the primary filegroup, not in the new default filegroup.

Changing the default filegroup prevents user objects that are not specifically created on a user-defined filegroup from competing with the system objects and tables for data space.

Recommendations

When implementing a database, you should try to adhere to the following guidelines for using files and filegroups:

Most databases will work well with a single data file and a single transaction log file.
If you use multiple files, create a second filegroup for the additional files and make that filegroup the default filegroup. This way, the primary file will contain only system tables and objects.
To maximize performance, you can create files or filegroups on as many different available local physical disks as possible and place objects that compete heavily for space in different filegroups.
Use filegroups to enable the placement of objects on specific physical disks.
Place different tables used in the same join queries in different filegroups. This procedure will improve performance due to parallel disk input/output (I/O) searching for joined data.
Place heavily accessed tables and the non-clustered indexes belonging to those tables on different filegroups. This procedure will improve performance due to parallel I/O if the files are located on different physical disks.
Do not place the transaction log file(s) on the same physical disk with the other files and filegroups.

Transaction Logs

A database in SQL Server 2000 has at least one data file and one transaction log file. Data and transaction log information is never mixed on the same file, and individual files are used by only one database.

SQL Server uses the transaction log of each database to recover transactions. The transaction log is a serial record of all modifications that have occurred in the database as well as the transactions that performed the modifications. The transaction log records the start of each transaction and records the changes to the data. This log has enough information to undo the modifications (if necessary later) made during each transaction. For some large operations, such as CREATE INDEX, the transaction log instead records the fact that the operation took place. The log grows continuously as logged operations occur in the database.

The transaction log records the allocation and deallocation of pages and the commit or rollback of each transaction. This feature enables SQL Server to either apply (roll forward) or back out (roll back) each transaction in the following ways:

A transaction is rolled forward when you apply a transaction log. SQL Server copies the after image of every modification to the database or reruns statements such as CREATE INDEX. These actions are applied in the same sequence in which they originally occurred. At the end of this process, the database is in the same state that it was in at the time the transaction log was backed up.
A transaction is rolled back when you back out an incomplete transaction. SQL Server copies the before images of all modifications to the database since the BEGIN TRANSACTION. If it encounters transaction log records indicating that a CREATE INDEX was performed, it performs operations that logically reverse the statement. These before images and CREATE INDEX reversals are applied in the reverse of their original sequence.

At a checkpoint, SQL Server ensures that all transaction log records and database pages that were modified are written to disk. During each database’s recovery process, which occurs when SQL Server is restarted, a transaction needs to be rolled forward only if it is not known whether all the transaction’s data modifications were actually written from the SQL Server buffer cache to disk. Because a checkpoint forces all modified pages to disk, it represents the point at which the startup recovery must start rolling forward transactions. Because all pages modified before the checkpoint are guaranteed to be on disk, there is no need to roll forward anything done before the checkpoint.

Transaction log backups enable you to recover the database to a specific point in time (for example, prior to entering unwanted data) or to the point of failure. Transaction log backups should be a consideration in your media-recovery strategy.

Environment

Generally, the larger the database, the greater the hardware requirements. Database design should always take into consideration processor speeds, memory, and hard disk space and configuration. There are other determining factors, however: the number of concurrent users/sessions, transaction throughput, and the types of operations within the database. For example, a database containing infrequently updated school library data would generally have lower hardware requirements than a one-terabyte (TB) data warehouse containing frequently analyzed sales, product, and customer information for a large corporation. Aside from the disk storage requirements, more memory and faster processors would be needed for the data warehouse to cache more of the data in memory and to quickly process queries referencing large amounts of data.

Estimating the Size of a Database

When designing a database, you might need to estimate how big the database will be when it is filled with data. Estimating the size of the database can help you determine the hardware configuration that you will need to meet the following requirements:

Achieving the performance required by your applications
Ensuring the appropriate physical amount of disk space to store the data and indexes

Estimating the size of a database can also lead you to determine whether the database design needs refining. For example, you might determine that the estimated size of the database is too large to implement in your organization and that more normalization is required. Conversely, the estimated size might be smaller than expected, requiring you to denormalize the database to improve query performance.

To estimate the size of a database, estimate the size of each table individually and then add the values that you obtain. The size of a table depends on whether the table has indexes and, if so, what type of indexes. For more information about estimating the sizes of various types of tables, refer to SQL Server Books Online.

Physical Database Design

The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning in the early stages of your project. The storage engine of a relational database requires much of this planning, which includes the following:

What type of disk hardware to use
How to place your data onto the disks
Which index design to use to improve query performance when accessing data
How to set all configuration parameters appropriately for the database to perform well

SQL Server Installation

Although the installation of SQL Server is beyond the scope of this training kit, you should always take into consideration the following issues before performing an installation:

Be sure that the computer meets the system requirements for SQL Server 2000.
Back up your current installation of Microsoft SQL Server if you are installing SQL Server 2000 on the same computer.
If you are installing a failover cluster, disable NetBIOS on all private network cards before running SQL Server Setup.
Review all SQL Server installation options and be prepared to make the appropriate selections when running Setup.
If you are using an operating system that has regional settings other than English (United States), or if you are customizing character set or sort order settings, review topics on collation settings.

Before running SQL Server 2000 Setup, create one or more domain user accounts if you are installing SQL Server 2000 on a computer running Windows NT or Windows 2000 and want SQL Server 2000 to communicate with other clients and servers.

You should log on to the operating system under a user account that has local administrative permissions; otherwise, you should assign the appropriate permissions to the domain user account. Be sure to shut down all services dependent on SQL Server (including any service that uses ODBC, such as Internet Information Services, or IIS). In addition, shut down Windows NT Event Viewer and registry viewers (Regedit.exe or Regedt32.exe).

Security

A database must have a solid security system to control the activities that can be performed and to determine which information can be viewed and modified. A solid security system ensures the protection of data, regardless of how users gain access to the database.

Planning Security

A security plan identifies which users can see which data and perform which activities in the database. You should take the following steps to develop a security plan:

List all of the items and activities in the database that must be controlled through security.
Identify the individuals and groups in the company.
Cross-reference the two lists to identify which users can see which sets of data and perform which activities in the database.

Security Levels

A user passes through two stages of security when working in SQL Server: authentication and authorization (permissions validation). The authentication stage identifies the user who is using a login account and verifies only the capability to connect to an instance of SQL Server. If authentication is successful, the user connects to an instance of SQL Server. The user then needs permissions to access databases on the server, which is done by granting access to an account in each database (mapped to the user login). The permissions validation stage controls the activities that the user is allowed to perform in the SQL Server database.

Authentication Modes

SQL Server can operate in one of two security (authentication) modes: Windows Authentication and Mixed mode. Windows Authentication mode enables a user to connect through a Windows NT 4.0 or Windows 2000 user account. Mixed mode (Windows Authentication and SQL Server Authentication) enables users to connect to an instance of SQL Server by using either Windows Authentication or SQL Server Authentication. Users who connect through a Windows NT 4.0 or Windows 2000 user account can make use of trusted connections in either Windows Authentication mode or Mixed mode.

Lesson Summary

When planning a SQL Server database, you must take into consideration database files and filegroups, transaction logs, the SQL Server installation and its operating environment, and security. All data and objects in the database are stored within primary, secondary, and transaction log files. Filegroups group files together for administrative and data allocation/placement purposes. The filegroup that contains the primary file is the primary filegroup. A database in SQL Server 2000 has at least one data file and one transaction log file. Data and transaction log information is never mixed on the same file, and individual files are used by only one database. SQL Server uses the transaction log of each database to recover transactions. Database design should always take into consideration processor speeds, memory, and hard disk space and configuration. There are other determining factors, however: the number of concurrent users/sessions, transaction throughput, and the types of operations within the database. When designing a database, you might need to estimate how big the database will be when it is filled with data. When installing SQL Server, you should take into consideration several issues. A database must have a solid security system to control the activities that can be performed and to determine which information can be viewed and modified. A security plan identifies which users can see which data and perform which activities in the database.

Lesson 3: Identifying System Requirements

Before creating a database, you must have a thorough understanding of the job that the database is expected to do. You can gain this understanding only by identifying specific types of information that are essential to developing an efficient database design. This lesson discusses the types of information that you must gather before you can begin creating a logical data model. To develop this model, you must identify the goals of your database project. You must also understand the type and amount of data with which you will be working, how you will use the data, and any business rules that apply to the new system that you are implementing.

After this lesson, you will be able to:

Identify the goals and scope of a database development project.
Identify the types of data that a database will manage, the current amount of data, the expected growth, and how it is currently managed.
Identify how you will use the data in a new database.
Identify any business rules that will be placed on the system.

Estimated lesson time: 35 minutes

The Process of Identifying System Requirements

The process of identifying system requirements includes a variety of steps. The number of steps included in this process, how these steps are defined, and the detail in which they are defined can vary greatly from resource to resource (with no one method necessarily being the most accurate). For the purpose of this training kit, however, this process has been divided into four primary tasks:

Identifying system goals
Identifying the amount and types of data
Identifying how the data will be used
Identifying business rules

You do not necessarily have to perform these tasks one at a time. For example, while identifying the amount and types of data, you might also find it useful to determine how the data will be used and what constraints should be placed on the data. Figure 3.8 illustrates the process of identifying system requirements.

NOTE:
For additional information about designing relational database systems, refer to Designing Relational Databases by Rebecca M. Riordan (Microsoft Press, 1999). This book provides a thorough discussion of database design and can be a valuable resource in building on the information in this training kit.

Figure 3.8 Identifying system requirements. (Image unavailable)

Identifying System Goals

Designing a database requires an understanding of the business functions that you want to model. As much as possible, your database design should accurately model the business. It is time-consuming to change the design of a database significantly once it has been implemented. A well-designed database also performs better. When designing a database, you must consider the purpose of the database and how it affects the design. In other words, you must determine the goals of the new system. Why are you creating this database?

The system goals are the reasons why you are implementing the new database. To create an effective database design, you must have thorough knowledge of the job that the database is expected to perform. Without this understanding, you cannot make informed decisions about how the database should be designed. The system goals are the reasons why the database is being developed.

Determining the system goals is not always a straightforward process. Most database development projects have many goals (both tangible and intangible), and trying to discover them can often take a fair amount of detective work. For example, a manufacturing company might decide to automate its process for managing inventory. One of the company’s stated goals for the project is "to make it easer to manage inventory." Your job is to take this intangible goal and try to determine the underlying tangible goal(s). Does the company want to speed up the process of managing inventory? Does it want to more accurately track inventory? Does it want to reduce costs? The intangible goal of "making it easier" might include all of these more tangible goals and more.

Although these goals are more tangible, they are still vague. Vague goals tend to be stated in general terms, such as "increase productivity" or "improve performance." As you go through the process of identifying goals, you must determine the degree to which these goals should be achieved. If the goal is to increase productivity, you should try to find out from what to what. Whenever possible, your goals should be directly measurable.

You should be aware, however, of the dangers of going overboard when measuring certain goals. Often, in order to determine a measurable goal, you must be able to establish an initial measurement. For example, if a goal of an inventory database is to improve accuracy, it might take a great deal of resources to study how much inaccuracy exists in the current process. A study of this sort could span years of inventory history and perhaps cost more to conduct than it would to design and implement the database. In cases such as these, it might be better to talk to managers and bookkeepers first to get a general sense of where the problems lie and what can be done to solve them. When determining the extent to which a base measurement should be studied, you should keep in mind the scale of the project and its practical application while always maintaining a sense of proportion.

Sometimes, intangible goals can be difficult to translate into more tangible goals. This situation is particularly true for goals that adopt popular marketing jargon, such as product position, getting on the same page, or thinking out of the box. For example, the stated goal might be something that seems to have no meaning or relevance: "We want the new system to show our clients that we’re thinking out of the box and getting on the same page as them—in order to improve product positioning." In these cases, you must work closely with the organization to clearly define what its stated goal means.

After you have defined the initial goals for the new database, you can begin looking at the type and amount of data that the system will support. As you move forward with the database design process, however, be prepared to re-evaluate these goals. As projects move forward, management changes, business requirements change, and company expectations are revised. As a result, goals evolve—which means that the database design might need to be modified. You might also discover, as you dig deeper into the project, that some goals are unattainable or inappropriate. As new understandings and additional information continue to unfold, you must be prepared to act accordingly.

Identifying the Amount and Types of Data

The amount and types of data that your database will store can affect database performance and should be taken into consideration when creating your database. The amount of data will, of course, affect the size of your database, and the types of data are a factor in determining the kinds of constraints that are incorporated into the database design.

In many cases, determining the amount and types of data is a straightforward process because a system is already implemented and you are simply upgrading or replacing that system. In these situations, you can examine the body of data that already exists.

In those cases in which you are implementing a new system—or radically altering the existing one—your job might be a little more difficult because you might have to spend a fair amount of time determining what types of data will be stored and how much data there will be. You might need to interview key participants and collect copies of relevant documents and forms, such as customer statements, inventory lists, management reports, and any other documents that are currently being used.

Whatever the current system, you must determine the volume of data that the system will manage. When examining data volume, you should identify the actual amount of data and its growth pattern. For example, a warehouse might currently carry only a few thousand items, but it might be planning to add several hundred a day over the next few years to substantially increase how many items are kept on hand. Another warehouse might carry millions of items but it might plan to add only a few new items a day, never letting the inventory go much beyond its current capacity. The growth patterns for these two systems are very different, and as a result, the design approach will vary.

When looking at the types of data, you are basically trying to get a general sense of the categories of information that you will be storing and what details about the categories are necessary to store. This process will prepare you for mapping out the entities and attributes that you will incorporate into your database design. For example, if you are developing a database for a clothing retailer, the types of data might include information about customers who purchase products from the store. Customer information could include names, addresses, phone numbers, and even style preferences.

At this point in the design process, you do not have to get too specific about how data should be categorized or grouped. You are merely attempting to gain a general sense of the types of data involved and creating a centralized list for those types of data. When you actually begin to identify database objects, you will take the information you gather here and use it to develop a data model.

Identifying How the Data Will Be Used

As you gather system requirements, you must determine how information in your database will be used. The purpose of this step is to identify who will be using the data, the number of users who will be accessing the data, and the tasks they will be performing when they access that data.

When determining who will be using the data, you should think in terms of categories of users. For example, one category of users might be the general public (who accesses data through the Internet). You might also have another category of users who access data through the company’s intranet. Some organizations might have only one type of user, while other organizations might have many types. In addition, there is no set minimum or maximum number of users that each category must contain. The only limitations are those dictated by hardware configurations and database design. One category might contain only one user, while another category might contain 100,000 users.

As you determine who will be using the data, you must also identify how many users in each category will be accessing the data. This estimate should include not only current numbers but projected figures, as well. In addition, you will have to define user concurrency. You should know how many people will be connected to the database at one time and how many might be trying to update the database at one time.

Once you have defined who your users are and how many there are, you must identify the tasks they will be performing when they access the database. For example, suppose a manufacturing company includes a category of users who take orders from customers. This order-taking group must be able to access and update customer information. In addition, it must be able to view inventory data in order to place these orders. The company might also include a category of users from human resources. The human resources group must be able to view and update employee information. By identifying these tasks, you can determine how the database will be accessed and how data will be viewed and manipulated. When you combine this information with the number and type of users, you will be able to implement a database design that serves the needs of everyone in the organization.

Identifying Business Rules of the System

By identifying the business rules, you are determining the constraints that govern how data and the system should be handled and protected. These constraints refer to more than the individual integrity applied to entity attributes. Business rules are much broader and incorporate all of the constraints on the system, including data integrity as well as system security. In other words, you are defining what each category of users can or cannot do.

Returning to the example of the manufacturing company, the order-taking group can access and update customer records and view inventory data. You might determine that these users should not be able to update inventory data and should not be able to view employee data, however. You might also determine that no customer records can be created without a full mailing address and phone number. Another constraint might be that any item added to a customer order should be removed from inventory. Business rules can include a wide spectrum of constraints, some pertaining to the system as a whole and others pertaining to specific types of data.

Exercise 2: Identifying the System Requirements for Your Database Design

In this exercise, you will review the following scenario. From the information in the scenario, you will identify the system requirements for a database design. You will be using this scenario and the result of this exercise in subsequent exercises. The end product will be a database that you have designed and implemented on your SQL Server computer. To complete this exercise, you need paper and a pencil. Because you need to save the result of this exercise, however, you might want to copy it into a word processing file or text file.

NOTE:
When designing a relational database system, your design specifications often include the applications that are necessary to access the data. For the purposes of this training kit, however, the exercises will focus on designing and implementing only the database component of the entire system.

Book Shop Scenario

The manager of a small book shop has asked you to design and implement a database that centralizes information so that it is easier and more efficient to manage inventory and track orders and sales. The shop handles rare and out-of-print books and tends to carry only a few thousand titles at any one time. Currently, the manager tracks all of the sales and inventory on paper. For each book, the manager records the title, author, publisher, publication date, edition, cost, suggested retail price, and a rating that indicates the condition of the book. Each book is assigned one of the following ratings: superb, excellent, good, fair, poor, or damaged. The manager would like to be able to add a description to each rating (just a couple of sentences), but the description should not be required. The information about each book must include the title, author, cost, suggested retail price, and rating. The publisher, publication date, and edition are not always available. If the year a book was published is available, the year will never be before 1600. And for purposes of the new database system, the publication date will never fall after the year 2099.

Because these books are rare, each title must be tracked individually—even if they are the same book (identical title, author, publisher, publication date, and edition). Currently, the manager assigns a unique ID to each book so that identical titles can be differentiated. This ID must be included with the book information. The book ID assigned by the manager is an eight-character ID made up of numbers and letters.

The manager also maintains limited information about each author whose books the store has carried or is carrying. The store might carry more than one book by an author, and sometimes more than one author will have written a book. The manager currently maintains information about approximately 2500 authors. The information includes the author’s first name, last name, year of birth, and year of death (if applicable). The information must include—at the very least—the author’s last name. The manager would like to include a brief description of each author, if available, when the author is added to the list. The description will usually be no longer than one or two sen tences.

The bookstore has 12 employees (including the manager and assistant manager). The manager expects to hire an additional employee every year for the next few years. Both the manager and the assistant manager must be able to access and modify information about each employee as necessary. Employee information must include each employee’s first name, last name, address, phone number, date of birth, hire date, and position in the store. Positions include Manager, Assistant Manager, Full Time Sales Clerk, and Part Time Sales Clerk. The manager might at some point want to add new job titles to the list or change existing ones and would eventually like to add a brief description of job duties to each title (at least, to some of the titles). An employee can hold only one position at any one time. No employee—other than the two managers—should have access to the employee information. The manager also likes to track how many books and which books each employee is selling.

The bookstore currently maintains information about customers. For each customer, the information includes the customer’s first name, last name, telephone number, mailing address, books that the customer has purchased, and when the purchase was made. Because some customers do not like to give out personal information, only a first name or a last name is required. The manager currently has a list of about 2000 customers. Not all customers who are included in the list have bought books, although most have.

The manager maintains a record of sales by tracking each order from when a sales clerk takes the order to when the sale is complete. In some cases, such as for walk-in customers, these two events occur concurrently. Each order must include information about the book sold, the customer who bought the book, the salesperson who sold the book, the amount of the sale, and the date of the order. The order must also include the delivery or pickup date, which is added after the merchandise is actually picked up or delivered. An order is completed when a book has been paid for and picked up at the store or paid for and shipped to the customer. A book cannot be taken out of the store or shipped unless it is paid for. Each order includes the payment method and the status of the order. Payment methods include cash, check, and credit cards. The status of an order must be one of the following: (1) to be shipped, (2) customer will pick up, (3) shipped, or (4) picked up. An order can contain only one customer, salesperson, order date, delivery date, payment method, and order status; however, an order can contain one or more books.

Currently, orders are generated, tracked, and modified on paper order forms. The forms are used to make sure that the orders get shipped (if applicable) and to maintain a record of sales. Whenever a book is added to an order, it is removed from the inventory list. This process has been very tedious and not always very efficient. This situation can also lead to confusion and mistakes. Ideally, the manager would like sold books to remain in the list of books but be marked somehow to show that the book has been sold.

The store sells about 20 books a day. The store is open five days a week for about 10 hours a day. There are one to two salespeople working at the same time, and there are two sales counters where people pick up and pay for books and where salespeople process orders. At least one manager is in the store at one time.

The manager expects sales to increase by about 10 percent each year. As a result, the number of books on hand, authors, and customers should all increase at about the same rate.

In order to serve customers effectively, each employee must be able to access a centralized source of information about authors, books in stock, customers, and orders. Currently, employees access this information from index cards and lists. Often, these lists are not up-to-date, and errors are made. In addition, each employee should be able to create, track, and modify orders online, rather than having to maintain paper order forms. Only the managers should be able to modify information about authors, books, and customers, however.

NOTE:
You can find a copy of this scenario in the BookstoreProject.doc file in the Chapter03\Exercise2 subfolder of the Training Kit Supplemental CD-ROM. You can use this copy to mark up and make notes as necessary to complete the steps in this exercise. This document will also come in handy for subsequent exercises.

To identify system goals

Review the scenario. Do not try to memorize all the details; instead, try to get a general sense of what the project is trying to accomplish.
Write down the system goals that you can identify in the scenario.
What are those goals?
Review each goal to determine whether it is measurable.
Which goals are measurable?

To identify the amount and type of data

Write down the categories of data that you can identify in this scenario.
What categories of data can you identify?
For each category of data that you identified in Step 1, write down the type of information that you should track for each category.
What types of information can you identify?
For each category of data that you identified in Step 1, write down the current amount of data for each category.
What is the volume of data for each category?
For each category of data that you identified in Step 1, write down the expected growth pattern.
What is the growth pattern for each category?

To identify how the data will be used

Write down the categories of users that you can identify in this scenario.
What are those categories of users?
For each category of user that you identified in Step 1, write down the number of users.
What are the current number of users and the projected number of users in each category?
For each category of user that you identified in Step 1, write down the tasks that they will be performing.
What tasks will each type of user be performing?

To identify business rules

Write down the business rules that you can identify in this scenario.
What are the business rules?

Lesson Summary

Before you can develop a data model, you must identify the goals of your database project, the type and amount of data that you will be working with, how the data will be used, and any business constraints that should exist on the data. You must consider the purpose of the database and how it affects the design. You should have a clear understanding of why you are creating this database. Another area of concern when identifying system requirements is the amount and types of data that your database will store. Whatever the current system, you must determine the volume of data that the system will manage. When examining data volume, you should determine the actual amount of data and its growth pattern. When looking at the types of data, you are basically trying to get a general sense of the categories of information you will be storing and what details about the categories are necessary to store. As you gather system requirements, you must identify who will be using the data, the number of users who will be accessing the data, and the tasks they will be performing when they access that data. By identifying the constraints on the data, you are determining the business rules that govern how data should be handled and protected. Business rules include data integrity as well as system security. They enable you to define what each category of users can and cannot do.

Lesson 4: Developing a Logical Data Model

Once you have identified the system requirements, you are ready to develop a logical data model. The data model is essentially an extension of the system requirements. When creating the data model, you are organizing the requirements into a logical representation of the database. The data model includes definitions of entities, their attributes, and entity constraints. The model also includes definitions of the relationships between entities and the constraints on those relationships. This lesson describes how to develop a data model by identifying the entities, their attributes and constraints, and their relationships.

After this lesson, you will be able to:

Identify entities and their attributes.
Identify relationships between entities.
Define constraints on data.

Estimated lesson time: 35 minutes

Identifying Entities and Their Attributes

When you gather system requirements for a database design, one of the steps that you take is to define the types of data that the database will contain. These types of data can be separated into categories that represent a logical division of information. In most instances, each category translates to a table object within the database. Normally, there is a set of primary objects, and after they are identified, the related objects become more apparent.

For example, in the Pubs database, one of the primary objects is the Titles table. One of the objects related to the Titles table is the RoySched table, which provides information about the royalty schedules associated with each book. Another object is the TitleAuthor table, which matches authors to books.

By using the categories of data defined in the system requirements, you can start to create a map of the table objects within your new database. For example, suppose you are designing a database for a hotel’s reservation system. During the process of gathering system requirements, you identify several categories of data, including rooms, guests, and reservations. As a result, you add tables to your database design that match each of these categories, as shown in Figure 3.9.

Figure 3.9 The primary objects in a database design: the Rooms table, the Reservations table, and the Guests table. (Image unavailable)

When identifying the business rules for this system, you determined that the hotel has eight types of rooms and that regular guests prefer a certain type of room. As a result, the Rooms table and the Guests table will each include a room type attribute. You decide to create a table for room types, as shown in Figure 3.10.

Figure 3.10 The hotel's reservation database, which includes the Room Type table. (Image unavailable)

Now, the Rooms table and the Guests table can reference the RoomType table without having to repeat a room description for each room and each guest. In addition, as room types change, you can update the information in one location, rather than having to update multiple tables and records.

Before you can complete the process of defining table objects within the database, you must define the relationships between the tables. Whenever you identify a many-to-many relationship, you will have to add a junction table. Relationships are discussed in more detail later in this lesson.

After you have defined all of the tables that you can define at this point, you can define the columns (attributes) for those tables. Again, you will be taking this information directly from the system requirements in which you identified which types of data should be included with each category of information.

Using the earlier hotel database example, suppose that you determined during the process of gathering system requirements that the Guests category of data should include information about the guests’ first names, last names, addresses, telephone numbers, and room preferences. As a result, you plan to add columns to the Guests table for each of these types of information. You also plan to add a unique identifier for each guest, as is the case with any normalized entity. Figure 3.11 shows the Guests table with all of the columns that the table will contain.

Figure 3.11 The Guests table and its attributes. (Image unavailable)

Identifying Relationships Between Entities

After you have defined the tables and their columns, you should define the relationships between the tables. Through this process, you might discover that you need to modify the design that you have created to this point.

Start by choosing one of the primary tables and selecting the entities that have relationships to that table. Referring once more to the hotel database used in earlier examples, assume that the system requirements state that all reservations must include room and guest information. Rooms, guests, and reservations are the categories of data. As a result, you can deduce that a relationship exists between rooms and reservations and between guests and reservations. Figure 3.12 shows the relationships between these objects. A line connecting the two tables signifies a relationship. Notice that a relationship also exists between the Rooms table and the RoomType table and between the Guests table and the RoomType table.

Figure 3.12 The relationships that exist between tables in the hotel's reservation database. (Image unavailable)

Once you establish that a relationship exists between tables, you must define the type of relationship. In Figure 3.12, each relationship (line) is marked at each end (where it connects to the table) with the number 1 or with an infinity symbol. The 1 refers to the one side of a relationship, and the infinity symbol refers to the many side of a relationship.

NOTE:
Different sources use different types of notation to signify the types of relationships that exist between tables. For example, Database Designer in SQL Server uses a key symbol to mark the one side of a relationship and uses an infinity symbol to mark the many side of the relationship.

To determine the types of relationships that exist between tables, you should look at the types of data that each table contains and the types of interchange between them. For example, a relationship exists between the Guests table and the Reservations table. The relationship exists because guests must be included in reservation information. According to the business rules, a guest can make one or more reservations, but each reservation record can include the name of only one guest, usually the person who is making the reservation. As a result, a one-to-many relationship exists between the two tables: one guest to many reservations.

A relationship also exists between the Reservations table and the Rooms table. According to the business rules, a reservation can be made for one or more rooms, and a room can be included in one or more reservations (on different dates). In this case, a many-to-many relationship exists: many reservations to many rooms. In a normalized database design, however, many-to-many relationships must be modified by adding a junction table and creating one-to-many relationships between each original table and the junction table, as shown in Figure 3.13.

Figure 3.13 The RoomReserv table as a junction table between the Rooms table and the Reservations table. (Image unavailable)

Identifying Constraints on Data

At this point in the database design process, you should have the entities, their attributes, and the relationships between entities mapped. Now, you must identify the constraints on the data that will be stored in your tables. Most of your work was already completed when you identified the business rules as you gathered system requirements. As stated previously, business rules include all constraints on a system, including data integrity and security. For this stage of the design process, your focus will be on the constraints specific to the data. You will take the data-related business rules and refine and organize them. You should try to organize the constraints based on the objects that you created in the database, and you should word them in a way that reflects those objects.

Returning again to the database design in Figure 3.13, suppose that one of the business rules is stated as follows: "A guest record can, but is not required to, include one of the predefined room type preferences but cannot include any other room type preference." When defining the data constraints, you should reference the relevant tables and columns and separate them so that they each focus on a single instruction:

The RoomTypeID column in the Guests table does not require a value.
A value other than NULL entered in the RoomTypeID column in the Guests table must be a value included in the RoomTypeID column in the RoomType table.
A row in the Guests table can include only one value in the RoomTypeID column.

When possible, you should organize data constraints according to tables and their columns. In some cases, a constraint applies to the table as a whole, to more than one table, to a relationship between tables, or to data security. In these cases, try to organize the constraints in a way that is the most logical and the most relevant to the project you are working on. The goal of identifying the data constraints is to have a clear road map when creating database objects and their relationships and enforcing data integrity.

Exercise 3: Developing a Logical Data Model

In this exercise, you will take the steps necessary to create a logical data model. Much of this exercise involves drawing the tables, entities, and relationships that make up the database. Although you can use a drawing program such as Visio to create these objects, paper and a pencil are all that you really need. If you like, you can later transfer your model to a drawing program. In addition, you will need paper and a pencil to write the data constraints. You can also write these directly to a word processing document or a text document. Whatever method you choose, you should save the result for subsequent exercises. To perform this exercise, you will use the book shop scenario from Exercise 2 in Lesson 3.

To identify which tables to add to a database

Refer to the system requirements that you developed for the book shop scenario and write down the categories of data.
Each category represents one of the primary table objects in your database design.
Draw a table for each category of data. The tables should be large enough so that you can add column names. Place the tables in a way that enables you to draw relationships between the tables. You will be adding column names and defining relationships later in this exercise.
Your drawing should include five tables.
Label each table with the name of one of the categories. For consistency, use the following labels for the table names: Books, Authors, Employees, Customers, and Orders.
Your next step will be to identify any related tables. At this point, designing a database becomes a little more complicated. A good source to use for determining related tables is the list of business rules that you identified when you gathered the system requirements. Essentially, you are looking for subcategories of information or business rules that lead you to believe that additional tables are necessary. Remember, you can modify the database design as you identify relationships between tables and constraints on data.
Refer to the business rules in the system requirements. Notice that there are four subcategories of information: the condition of a book, the employee positions, the form of payment, and the order status.
Draw the four related tables to support the primary tables.
For consistency, use the following names for your new tables: OrderStatus, FormOfPayment, Positions, and BookCondition.
Refer to the business rules in the system requirements. Notice that an order can contain more than one book.
Add one more table (BookOrders) that tracks the books ordered and the actual orders taken from customers.
You should now have 10 tables.

To identify which columns to add to the tables

Refer to the system requirements that you developed for the book shop scenario.
For each category of data, you defined which information should be included with each category. This information makes up your columns.
Add column names to each table. Also remember that each row in a table must be uniquely identifiable, so the table might need an identifier. In addition, where column names are referring to information in a related table, you usually just need the identifier column from the related table. For example, the Orders table would include a StatusID column that references the OrderStatus table.

For consistency, use the following labels for column names:

Table	Columns
Books	TitleID, AuthorID, Publisher, PubDate, Edition, Cost, SRP, ConditionID, Sold
BookCondition	ConditionID, ConditionName, Description
Authors	AuthorID, FirstName, LastName, YearBorn, YearDied, Description
Employees	EmployeeID, FirstName, LastName, Address1, Address2, City, State, Zip, Phone, DOB, HireDate, PositionID
Positions	PositionID, Title, JobDescrip
Customers	CustomerID, FirstName, LastName, Phone, Address1, Address2, City, State, Zip
Orders	OrderID, CustomerID, EmployeeID, Amount, OrderDate, DeliveryDate, PaymentID, StatusID
OrderStatus	StatusID, StatusDescrip
FormOfPayment	PaymentID, PaymentDescrip
BookOrders	OrderID, BookID

Notice that the Employees table does not include a column for books purchased and dates of purchases. Because each customer can purchase more than one book, you would not include the information here. You could create a table to store this information, but it would be unnecessary because it would duplicate information that already exists in a database (information that can be derived through views or ad hoc queries).

To identify relationships between entities

Determine what relationships exist between the Books table and other tables in the database. If necessary, refer to the book shop scenario and to the system requirements to help determine what relationships exist between objects.
You are looking for direct relationships. For example, the Books table has a direct relationship with the BookCondition table. BookCondition data applies directly to Books data. In addition, Authors data is directly related to Book data (authors write books). There is also a direct relationship between Books data and BookOrders data (orders include the books being sold).

Notice that there is no direct relationship between the Books table and the Orders table. The relationship between the two tables is indirect and is expressed through the BookOrders table.
For each table, draw a line from that table to any other table with which a relationship exists. You might find that you need to reposition some of your tables in order to more clearly show those relationships.
Your database design should look similar to the schema in Figure 3.14.

Figure 3.14 Identifying the relationships between tables in the logical data model. (Image unavailable)
Determine whether each relationship is one-to-one, one-to-many, or many-to-many. Write the number 1 at the one end of the relationship, and write the infinity symbol at the many end of the relationship.
To determine the type of relationship, think in terms of the data associated with each object. For example, a relationship exists between employees and the orders that they generate. An employee can create many orders, but only one employee can create an order. Therefore, a one-to-many relationship exists between the Orders table and the Employees table (one employee can create many orders). The Employees table is on the one side of the relationship, and the Orders table is on the many side.

Your database should now look similar to the schema in Figure 3.15.

Figure 3.15 Identifying the types of relationships between tables in the logical data model. (Image unavailable)
Identify any many-to-many relationships in the database design.
Which relationship is many-to-many?
Create a junction table named BookAuthors. The table should include the AuthorID column and the TitleID column.
Delete the relationship between the Books table and the Authors table, then delete the AuthorID column in the Books table.
You are deleting the relationship between the two tables because a direct relationship no longer exists. Instead, an indirect relationship is created through the BookAuthors table. In addition, the AuthorID column is no longer necessary in the Books table because the book/author relationship is expressed in the BookAuthors table.
Draw the relationship between the Authors and BookAuthors tables and the relationship between the Books and BookAuthors tables.
Determine the types of relationships that exist with the BookAuthors table.
Your database design should now look similar to the schema in Figure 3.16.

Figure 3.16 Adding the BookAuthors table to the logical data model. (Image unavailable)

To identify constraints on data

On a piece of paper, write down the names of each table in your database design. Leave plenty of space between each table name to write the data constraints.
Review the business rule stating that book information must include the title, author, cost, suggested retail price, rating, and unique ID.
Identify the object, if any, to which this business rule applies.
To which object(s) does this business rule apply?
Under the Books table name and the BookAuthors table name, write the data constraints that you can derive from the business rule.
What are the data constraints?
For each business rule, define the data constraints. Where applicable, write the constraints beneath the table name. If a constraint does not apply specifically to one table, write it in another space on your paper.
What are the data constraints for your database design?
Review the data constraints that you just created to ensure that every table and every column within those tables has some sort of rule associated with it.

Lesson Summary

The data model includes definitions of entities, their attributes, and entity constraints. The model also includes definitions of the relationships between entities and the constraints on those relationships. One of the first steps that you must take toward creating a data model is to identify the types of data that the database will contain. These types of data can be separated into categories that represent a logical division of information. In most instances, each category translates to a table object within the database. Once you have defined the tables and their columns, you should define the relationship between the tables. To determine the type of relationship that exists between tables, you should look at the types of data that each table contains and the types of interchanges between them. Once you determine the relationships between tables, you must identify the constraints on the data that will be stored in your tables. You should organize data constraints according to tables and their columns (when possible).

Review

The following questions are intended to reinforce key information presented in this chapter. If you are unable to answer a question, review the appropriate lesson and then try the question again. You can find answers to the questions in Appendix A, "Questions and Answers."

What does a SQL Server database consist of?
What is normalization?
What are the four basic rules that you should follow when normalizing a database design?
What are the three basic types of relationships that can exist between tables in the SQL Server database, and what are the basic differences between these types?
What are the three types of operating system files that SQL Server uses?
What two stages of security does a user pass through when working in SQL Server, and how do these stages differ?
What are the four primary tasks that you should perform when identifying the system requirements for a database design?
When determining the volume of data that the system will manage, you should identify what two types of information?
When determining how data will be used in a new system, you should identify what three types of information?
When you are gathering system requirements for a database design, one of the steps that you should take is to define the specific categories of data. What type of object within a database maps to these categories of information?
What task do you perform after determining that a relationship exists between two tables, and how do you perform that task?
What information within the system requirements should you base data constraints upon?

Read More Show Less

Overview

Customers Who Bought This Also Bought

More About This Book

Overview

Editorial Reviews

Booknews

Product Details

Meet the Author

Read an Excerpt

Chapter 3: Designing a SQL Server Database

Lesson 1: Introduction to Database Design

Components of a SQL Server Database

Normalizing a Database Design

Entity Relationships

Lesson Summary

Table of Contents

First Chapter

Customer Reviews

5 Star

4 Star

3 Star

2 Star

1 Star

MCSE: Microsoft SQL Server 2000 Database Design and Implementation Training Kit

Overview

Customers Who Bought This Also Bought

More About This Book

Overview

Editorial Reviews

Booknews

Product Details

Related Subjects

Meet the Author

Read an Excerpt

Chapter 3: Designing a SQL Server Database

Lesson 1: Introduction to Database Design

Components of a SQL Server Database

Normalizing a Database Design

Entity Relationships

Lesson Summary

Table of Contents

First Chapter

Customer Reviews

5 Star

4 Star

3 Star

2 Star

1 Star

Good Review Material