Microsoft SQL Server 6.5 Programming Unleashed


Overview

Microsoft SQL Server 6.5 Programming Unleashed will guide the reader in developing more effective and efficient programs and applications. Coverage of SQL Server programming methods in a large, enterprise environment is also included. Some of the valuable topics covered are:
  • Scalability issues and enterprise environment coverage
  • Writing effective code
  • Data modeling tools
  • Visual InterDev
  • Transact-SQL Debugger
  • Advanced coding for ODBC and other connectivity tools
  • Migration to SQL Server from another database
  • Creating modules and business objects for a fictitious business
  • Programming solutions for data warehouse, Internet, and intranet applications
  • Developing distributed applications
  • Front-end development tools/languages such as Visual Basic, Visual C++, Delphi, and PowerBuilder
  • Microsoft SQL Server 6.5 is a scalable, high performance database management system for Windows NT-based systems. It is the fastest database for Windows NT!
  • The Unleashed treatment provides comprehensive coverage aimed at advanced SQL programmers and developers
  • Covers programming techniques for developing data warehouse, Internet and intranet applications

Editorial Reviews

Booknews
Provides SQL Server developers and programmers with solutions to increase efficiency and improve the performance of databases and applications. Gives detailed information on how to write effective code, develop Internet/Intranet applications with ADO, RDO and IDC; program and optimize alerts and queries; manipulate onscreen objects in Visual InterDev; connect to SQL Server using ODBC, ADO, OLE-DB, RDS, IDC, and DB-Library; learn programming tips for Line-of-Business, Data Warehouse, and Web-based applications; develop applications for distribution, replication, and the Client/Server environment; and build an administrative tool to employ system and extended stored procedures. Annotation c. by Book News, Inc., Portland, Or.

Product Details

  • ISBN-13: 9780672312441
  • Publisher: Sams
  • Publication date: 4/15/1998
  • Series: Unleashed Series
  • Pages: 912
  • Product dimensions: 7.25 (w) x 9.02 (h) x 2.24 (d)

Table of Contents

Pt. I Advanced Programming Techniques
1 Beyond the Basics of Data Manipulation Language 3
2 Using Advanced Data Manipulation Language 31
3 Optimizing Queries 56
4 Advanced Transact-SQL Statements 83
5 Effective Use of Built-in Functions 109
6 Effective Use of Stored Procedures as an Administrative Tool 135
7 Employing Cursors 155
8 Advanced String Manipulation and Bitwise Operators 177
9 Dynamic Execution 195
10 Security Issues 218
11 Specialized Triggers 231
12 Updating Table Indexes and Statistics 257
13 Crosstabulation 275
Pt. II SQL Server - Essential Information
14 Writing Effective Code 305
15 Taking Advantage of the Tools 327
16 Using Multiple Tiers and Client/Server Architecture 359
17 Replication 373
Pt. III Migrating to SQL Server: Data Conversion and Integration
18 Legacy Databases: Conversion and Integration Issues 395
19 Outgrowing Access 405
20 The Oracle Transition 433
21 Making the Switch from Sybase SQL Server 461
Pt. IV Programming Internal and External Connections
22 Using the DBLibrary 481
23 Using ODBC and Visual C++ with SQL Server 511
24 Connecting to SQL Server from Visual Basic 571
25 SQL Server on Internet/Intranet Systems 618
Pt. V Programming Real-World Implementations
26 Integrating SQL Server with Other Microsoft Products 675
27 Programming a Line-of-Business Solution 701
28 Data Warehousing and Online Transaction Processing (OLTP) 711
Pt. VI Appendixes
A The System Tables of the Master Database 759
B The System Tables of All Databases 775
C System Stored and Extended Stored Procedures Quick Reference 797
D Common Error Codes 811
Index 827

First Chapter

[Figures are not included in this sample chapter]

Microsoft SQL Server 6.5 Programming Unleashed
- 3 -
Optimizing Queries

Indexes can make or break (or even brake) the performance of SQL Server if implemented without much thought. Most programmers know to put an index on the primary key field(s) of a table and even the foreign key field(s) of related tables. But what type of index should you use? Do you need to put indexes on other fields within a particular table? When should you use a composite index?

Locks and transaction isolation levels are another necessary part of optimizing performance and selecting the types of queries to use. For example, you might eventually need to know whether using read uncommitted isolation levels is better than using read committed isolation levels. A better understanding of locks and isolation levels will help you, the programmer, make better choices about indexes and table structure.

We will answer these and other questions in the following section. But to understand when to use indexes, you must understand index structures themselves.

A Closer Look at Index Structures

SQL Server provides for two types of indexes: clustered and non-clustered. Although the role of any index is to speed data retrieval and updates, each type of index is tuned to do certain jobs more effectively than the other. In some cases either type of index will provide similarly quick data retrieval, but the performance from those same indexes may differ greatly when there's a minor change to the query.

Non-Clustered Indexes

Most people are familiar with the concept of non-clustered indexes, although they might not be aware of it. This type of index is similar to an index of a book or labels on a file cabinet's drawers. You use the index to find out where an item is located, and then you use the information presented in the index to go to that item (be it a page or a drawer--or a row of data).

Examine Figure 3.1 to see a better picture of a non-clustered index.

This figure is abridged for readability. It shows the index root with its first two entries: Alvin and Marti. The index root is a page of data that contains evenly dispersed entries from the entire set of data from the indexed column. The number of entries that will fit in this initial page will depend upon the size of the column being indexed.

Next, each entry points to another page in the index node level. The first page of data will contain evenly dispersed entries from the first item in the index root to the second item in the index root. Depending on how many rows are in the table being indexed, the number of levels in the index nodes will vary. This figure shows just one level, but larger tables with larger indexed columns could increase the number of levels of index nodes.

Figure 3.1.
The structure of a non-clustered index.

Imagine you were trying to find an order number that ranged from 1 to 1048576. If each index page could only hold four entries, the index root would contain 1, 262144, 524288, and 786432. The first page in the first index node level would contain 1, 65536, 131072, and 196608. The first page in the next index node level would contain 1, 16384, 32768, and 49152. The first page in the next index node level would contain 1, 4096, 8192, and 12288. This would continue until the tenth level, known as the leaf level, where the first page would contain 1, 2, 3 and 4; the second page would contain 5, 6, 7 and 8; and so on.

This leaf level points to the actual data (in the data level) being sought, and is not necessarily in any particular order. Hence the crossing of the lines in Figure 3.1. The name "Marti" was entered before the name "Alvin," and thus comes before that entry in the table's data pages. Information in a book (such as this one) is not presented in alphabetical order, but rather by topic. The last names in a customer table are not necessarily in alphabetical order, but rather ordered based on when they were entered into the table.

Clustered Indexes

Clustered indexes are a little different than their non-clustered counterparts. Whereas a textbook uses the equivalent of a non-clustered index, a dictionary or encyclopedia uses a mechanism like that of a clustered index. Encyclopedias are often separated into volumes based on the letter of the topic you want to investigate (index nodes). Then, the topics are listed alphabetically and the pages are marked to reveal the topic that starts on that page and the one that ends on that page (leaf level). However, the data itself is already in order. There is no need to find the topic in the index and then look it up based on some page number reference.

This is how a clustered index functions. The data itself is actually reorganized to match with the index nodes and, therefore, is combined with the leaf level. Examine Figure 3.2 for a better view of this phenomenon.

Figure 3.2.
The structure of a clustered index.

This structure has the potential to both increase and decrease performance when manipulating data. You will read more about this in the coming sections.

Because the data and leaf levels are now one, clustered indexes require less space than an equivalent non-clustered index. And because the data is reorganized with the clustered index, you can only create one clustered index per table, as opposed to non-clustered indexes, of which there may be up to 254.
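The one-clustered-index-per-table rule is easy to see in DDL. A minimal sketch (the Customers table and index names here are hypothetical, not from this chapter's examples):

```sql
-- The single clustered index physically orders the rows
CREATE CLUSTERED INDEX icCustomersLastName
ON Customers (lastname)

-- A second CREATE CLUSTERED INDEX on Customers would fail, because the
-- rows can only be stored in one physical order. Any further indexes
-- must be non-clustered:
CREATE INDEX inCustomersCity
ON Customers (city)
```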

Clustered Versus Non-Clustered Indexes

So when should you use a clustered index and when should you use a non-clustered index? Let's look at some examples and determine which index would be a better choice.

ORDER BY and GROUP BY Clauses

If you are often selecting data and using the ORDER BY and/or the GROUP BY clause, either type of index will help with SELECT performance. If you typically select customers and sort by last and first name, either index will provide a quick means of retrieving that data. Some of the following factors may sway you to use one type or the other, however.

Returning a Range of Values

For example, if you are returning all names between 'Smith' and 'Talbert', or orders for dates between '11/1/97' and '11/30/97', and you do this sort of thing often, you are better off using a clustered index on the particular column on which the range is based. Because a clustered index already contains the data in sorted order, it is more efficient for retrieving data that falls in a particular range. SQL Server only needs to find the start and end of the range to retrieve all of it, unlike with a non-clustered index, which needs to look up each entry from the leaf level in the data level.

One or Few Unique Values in Columns

Some columns in your table will contain few, if any, unique values. An example is a status column that contains only the values 'Inactive', 'Active', or 'Terminated'. In such a case, it is not wise to use any type of index on that column. The justification is simple: If you have a table with 15,000 rows, approximately 1/3 (or 5,000) of the rows will contain 'Active' in the status column. It is just as efficient, if not more so, to scan the entire table than to look up each entry in the index pages (the index root and nodes) and then find the actual data page on which each row with an 'Active' status resides. Listing 3.1 is an example script that will create a simple table with few unique values and an index on the column containing those highly duplicated values. This script can be found in the file SCR0301.SQL on Sams' Web site for this book. Don't be alarmed when this script takes a few minutes to run (you will be inserting 15,000 rows).

Listing 3.1. Creating a table with few unique values.


CREATE TABLE FewUniques
(
         Id        int       IDENTITY(1,1) NOT NULL,
         Status    char(10)  NULL
)
GO
-- Allow explicit values to be inserted into the identity column
SET IDENTITY_INSERT FewUniques ON
DECLARE   @intCounter int
BEGIN TRAN
-- Insert 5,000 rows for each of the three status values
SELECT    @intCounter = 1
WHILE     @intCounter <= 15000
BEGIN
         INSERT FewUniques (Id, Status) VALUES (@intCounter, 'Active')
         SELECT @intCounter = @intCounter + 3
END
SELECT    @intCounter = 2
WHILE     @intCounter <= 15000
BEGIN
         INSERT FewUniques (Id, Status) VALUES (@intCounter, 'Inactive')
         SELECT @intCounter = @intCounter + 3
END
SELECT    @intCounter = 3
WHILE     @intCounter <= 15000
BEGIN
         INSERT FewUniques (Id, Status) VALUES (@intCounter, 'Terminated')
         SELECT @intCounter = @intCounter + 3
END
COMMIT TRAN
SET IDENTITY_INSERT FewUniques OFF
GO
-- Clear the log after the bulk insert
DUMP TRANSACTION pubs WITH NO_LOG
GO
CREATE INDEX inFewUniquesStatus
ON FewUniques (Status)
GO

Next, you can run the two SELECT statements shown in Listing 3.2. Be sure to turn on the Query options "Show Query Plan" and "Show Stats I/O." The results will amaze you.

Listing 3.2. Selecting table data with and without an index.


--Force the Query Optimizer to use a table scan
SELECT    *
FROM      FewUniques (index=0)
WHERE     Status = 'Inactive'
--Force a particular index to be used by the Query Optimizer
SELECT    *
FROM      FewUniques (index=inFewUniquesStatus)
WHERE     Status = 'Inactive'

You may notice the extra code in parentheses after the table name in each query. This feature (known as an optimizer hint) will be discussed in the "Using Index Optimizer Hints" section later in this chapter. The Stats I/O information shows two very different results (see Listing 3.3). The first SELECT statement forced a table scan and only needed to perform 157 reads from memory (all the data was in memory because it was just inserted, so no disk or physical reads needed to be done). The second SELECT statement required 5,053 reads.

Listing 3.3. The Stats I/O results from Listing 3.2.


--Stats I/O from table scan access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 157,  physical reads: 0,  read ahead reads: 0
--Stats I/O from indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 5053,  physical reads: 0,  read ahead reads: 0

The statistics page, the index root, the index nodes, and the data pages all must be read when selecting data via an index. This normally will decrease the number of reads required, but when the number of records being returned is high, more reads are required.

Low Number of Unique Values

What if the number of unique values increases? And what if the table is larger? As the number of rows in the table and the number of unique values in a column grow, the index becomes more beneficial. Examine and run the code in Listing 3.4.

Listing 3.4. Creating a table with a low number of unique values.


DROP TABLE FewUniques
GO
CREATE TABLE FewUniques
(
         Id        int       IDENTITY(1,1) NOT NULL,
         Status    char(10)  NULL,
         Col3      char(20)  NOT NULL,
         Col4      char(50)  NOT NULL
)
GO
DECLARE   @intNum int
SELECT    @intNum = 0
BEGIN TRAN
WHILE     @intNum <= 1300
BEGIN
   INSERT FewUniques VALUES (CHAR(@intNum % 26 + 65), 'test3', 'test4')
   SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE INDEX inFewUniquesStatus
ON FewUniques (Status)
GO

This script drops and re-creates the FewUniques table (with some modifications to increase the row size). It then inserts data into rows so that the status column will contain 'A', 'B', ..., 'Z', 'A', and so on. Next, it creates an index (once again) on the status column. When the two SELECT statements in Listing 3.5 are executed, the results are quite different (see Listing 3.6).

Listing 3.5. Selecting data with and without an index.


--Force the Query Optimizer to use a table scan
SELECT    *
FROM      FewUniques (index=0)
WHERE     Status = 'A'
--Force a particular index to be used by the Query Optimizer
SELECT    *
FROM      FewUniques (index=inFewUniquesStatus)
WHERE     Status = 'A'

Listing 3.6. Stats I/O from Listing 3.5.


--Stats I/O from table scan access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 55,  physical reads: 0,  read ahead reads: 0
--Stats I/O from non-clustered indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 53,  physical reads: 0,  read ahead reads: 0

The end results show that by using the index, two fewer pages needed to be read. The index has now become efficient. As the table grows in size (either number of rows or size of rows), this difference will increase and make the index the preferable method of accessing the data. A clustered index will do even more for the performance of this query, requiring fewer reads than an equivalent non-clustered index.

Listing 3.7 shows another version of our table FewUniques where a clustered index is being created instead of a non-clustered one. This will drastically increase performance when you're selecting data from the table based on the status column.

Listing 3.7. Creating a table with a low number of unique values.


DROP TABLE FewUniques
GO
CREATE TABLE FewUniques
(
         Id        int       IDENTITY(1,1) NOT NULL,
         Status    char(10)  NULL,
         Col3      char(20)  NOT NULL,
         Col4      char(50)  NOT NULL
)
GO
DECLARE   @intNum int
SELECT    @intNum = 0
BEGIN TRAN
WHILE     @intNum <= 1300
BEGIN
   INSERT FewUniques VALUES (CHAR(@intNum % 26 + 65), 'test3', 'test4')
   SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE CLUSTERED INDEX icFewUniquesStatus
ON FewUniques (Status)
GO

Listing 3.8 shows the results of querying this table with queries that look like Listing 3.5 with one change--the index name is icFewUniquesStatus instead of inFewUniquesStatus. Note that the clustered index was much more efficient at retrieving the data. Because the data resides on the leaf level of the index, no additional jump to the data pages was necessary to read the data. Thus, fewer page reads needed to be performed.

Listing 3.8. Stats I/O from Listing 3.5.


--Stats I/O from table scan access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 78,  physical reads: 0,  read ahead reads: 0
--Stats I/O from clustered indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 5,  physical reads: 0,  read ahead reads: 0

High Number of Unique Values and Updating Indexed Column Data

As you have seen, clustered indexes help when the column being examined has some unique values, but a non-clustered index becomes the preferred method of data access when the number of unique values approaches the number of rows in the table. Although the number of pages read will not vary much between non-clustered and clustered index access, the write performance of the table is now in question.

Whenever you make a change to the column that is indexed, SQL Server has to make modifications to the index using that column. When using a clustered index, this requires possible changes to the index root, changes to the index nodes, and the insertion of the entire row in the data/leaf level, potentially requiring a shift of rows.

When you modify a column that has a non-clustered index, the index root may change, index nodes will change, and the leaf level will change. But the data level (separate from the leaf level) will not need to be shifted around because rows will be appended if needed. Because the data pages are handled separately from the index pages, less movement of entire rows of data will be required. This allows for quicker modifications to the data.


TIP Remember to update the statistics (UPDATE STATISTICS) and rebuild your indexes (DBCC DBREINDEX) on a regular basis. Over time, indexes become, well, fragmented, in a similar fashion to a hard drive. Rebuilding an index is really a kind of index defragmentation. Updating statistical information helps the Query Optimizer make better decisions about how it should process the query. Statistics are always updated when you rebuild an index.
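A nightly maintenance batch along those lines might look like the following sketch (the fill factor is illustrative, and the table and index names assume the FewUniques examples from earlier in this chapter):

```sql
-- Refresh the distribution statistics for all indexes on the table
UPDATE STATISTICS FewUniques

-- Rebuild one index, specifying an 80 percent fill factor as an example;
-- DBCC DBREINDEX also updates the statistics for the rebuilt index
DBCC DBREINDEX ('FewUniques', inFewUniquesStatus, 80)
```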

Frequently Updated Columns

If you instead use the indexed column to find the row that needs to be updated--without modifying the indexed column itself--a clustered index will get to the data faster. If you never update the column(s) on which the clustered index is built, updating data will not hurt performance because the index's data isn't involved. We will see more about this in the "Using Indexes for Retrieving and Updating Data" section later in this chapter.

Returning a Range of Values

The winner--most of the time--is the clustered index. The exception to the rule appears in a later section. But for most cases, the clustered index will do a much better job of retrieving a range of data. Listings 3.9 through 3.11 show the table and index creation, the two SELECT statements (using non-clustered and clustered indexes), and the Stats I/O results from both.

Listing 3.9. Creating a table with many unique values.


DROP TABLE FewUniques
GO
CREATE TABLE FewUniques
(
         Id        int       IDENTITY(1,1) NOT NULL,
         Status    char(20)  NOT NULL
)
GO
DECLARE   @intNum int
SELECT    @intNum = 0
BEGIN TRAN
WHILE     @intNum <= 5000
BEGIN
   INSERT FewUniques VALUES ('test' + CONVERT(char(6), @intNum))
   SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE CLUSTERED INDEX icFewUniquesId
ON FewUniques (Id)
GO
CREATE INDEX inFewUniquesId
ON FewUniques (Id)
GO

Listing 3.10. Retrieving a range of rows.


SELECT    Status
FROM      FewUniques (index=0)
WHERE     Id BETWEEN 1000 AND 1500
SELECT    Status
FROM      FewUniques (index=inFewUniquesId)
WHERE     Id BETWEEN 1000 AND 1500
SELECT    Status
FROM      FewUniques (index=icFewUniquesId)
WHERE     Id BETWEEN 1000 AND 1500

Listing 3.11. Stats I/O results from Listing 3.10.


--Stats I/O from table scan access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 71,  physical reads: 8,  read ahead reads: 54
--Stats I/O from non-clustered indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 506,  physical reads: 0,  read ahead reads: 0
--Stats I/O from clustered indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 9,  physical reads: 0,  read ahead reads: 0

Even the table scan was more efficient than the non-clustered index. As I stated earlier, if more than a few rows are being returned, using a non-clustered index is usually a poor choice. The clustered index requires the fewest reads to get the job done. SQL Server searches only for the first and last values in the index node pages and finds the first and last pages on which the data resides. Then the data/leaf pages are read sequentially to get the requested information.

Primary and Foreign Keys

The decision to create a clustered index on a primary key really depends on whether another index would benefit more from being clustered. It isn't necessary to make the primary key a clustered index. Another index--one that has frequent range retrievals, for example--might be a better candidate for the clustered index. If this situation exists, use your good judgment and make the primary key a non-clustered index.

As for foreign keys, the same rule applies. Keep in mind, however, that foreign keys often contain repeated values (being the "many" side of a one-to-many relationship) and thus often fit the criteria for becoming a clustered index. Again, you are not obliged to make a foreign key a clustered index, but if it is the best candidate, there should be no question in your mind. If the table contains both a primary and foreign key, the decision becomes a little more involved.

For example, the titles table in the pubs database could have been designed to have a clustered index on the pub_id field and a non-clustered index on the title_id (primary key) field. The pub_id of titles will not have a unique set of data (most publishers produce more than one book). A clustered index on the pub_id column would be wiser than one on the title_id column because the latter has no duplicates. Thus, you would see a better overall performance when retrieving data from titles and publishers. If the publishers table is rarely joined to the titles table, or only a few titles are ever selected when joining to the publishers table, the present scenario of a clustered index on the title_id field and no index on the pub_id field is better.
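As a sketch of that alternative design (pubs as shipped actually clusters titles on its title_id primary key, so treat this as a hypothetical redesign rather than a change you would make to the sample database):

```sql
-- Cluster on the duplicate-heavy foreign key to speed joins and range scans
CREATE CLUSTERED INDEX icTitlesPubId
ON titles (pub_id)

-- Keep the primary key unique, but non-clustered
CREATE UNIQUE INDEX inTitlesTitleId
ON titles (title_id)
```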

The key is to know what types of queries will be performed. The more you know about what the users will be retrieving from the database, the easier the decisions about what types of indexes to use will be.

The following table contains a summary of when to use clustered and non-clustered indexes.

Table 3.1. Comparing clustered and non-clustered indexes.

Action/Description                        Use Clustered   Use Non-Clustered
Columns often grouped/ordered by          Yes             Yes
Return a range of values                  Yes             No
Low number of unique values               Yes             No
High number of unique values              No              Yes
One or a few unique values                No              No
Low number of returned rows               No              Yes
Frequently updated columns                No              Yes
Foreign key columns                       Yes             Yes
Primary key columns                       Yes             Yes
Frequent indexed column modifications     No              Yes

Of course, on a regular basis you will encounter situations that really are a combination of these conditions. Test out each type of index and see which will best do the job for you.

The Do's and Don'ts of Composite Indexes

When creating composite (multi-column) indexes, it is important to keep a few rules in mind. The order of the columns plays a very important role in the efficiency of the index and even the Query Optimizer's decision to use the index. Too many columns can also dramatically increase the size of the index, taking up more space and requiring more time to find the information being sought.

  • Rule 1: Don't include columns that no longer result in a unique set of data.

  • For example, if the index on the lastname and firstname columns contains very few duplicates, don't add the middlename column. You are only increasing the number of pages that need to be searched to find the same information.

TIP The following SELECT statement (replace column and table information as needed) will find duplicate items within Col1 and Col2 (and their count) in Table1.



SELECT    Col1, Col2, COUNT(*) as Total
FROM      Table1
GROUP BY  Col1, Col2
HAVING    COUNT(*) > 1
  • Rule 2: Columns that will not be individually searched on should never be listed first in the index.

  • For example, if you often search by lastname and firstname or just lastname, then firstname should not appear as the initial column of the composite index of lastname and firstname. If it does come before lastname, the search for just the lastname will never use the index, and if you force the use of that index, it will be very inefficient.
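Under Rule 2, the column that is searched on by itself must lead the composite index. A sketch using a hypothetical Customers table:

```sql
-- lastname leads, so the index serves both kinds of searches
CREATE INDEX inCustomersName
ON Customers (lastname, firstname)

-- Both of these WHERE clauses can use inCustomersName
SELECT * FROM Customers WHERE lastname = 'Smith' AND firstname = 'Alice'
SELECT * FROM Customers WHERE lastname = 'Smith'
-- Had the index been built on (firstname, lastname), the second
-- query could not have used it
```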

  • Rule 3: If you often retrieve data from only one or just a few fields of the table, a non-clustered index can cut down on the number of reads required if it contains all the fields being retrieved.

This is known as a covered query. If the index contains all the data that needs to be retrieved, there is no need to go to the data pages to get the data because the index already can supply all the requested information. Covered queries should only be used if the same few columns are very frequently requested. In addition, do not forget about performance when modifying data. If some of these fields are updated frequently, the index will need more maintenance from you and SQL Server. If the speeding up of the SELECT statement will not make up for the slowing down of these updates, a covered query is not recommended.

For example, let's say you create an index on the authors table that contains the following columns: au_lname, city, state, and phone. You are requesting the information (for a directory listing) from these four columns on a regular basis. However, you find that the authors tend to move a lot (we're funny that way), requiring changes to the city, state, and phone columns as frequently as you look up the information. The index is causing the updates to move much more slowly because three of the four columns get updated often, and the retrieval increase does not make up for that lost time. Thus, it's time to lose the index.

Listing 3.12 creates some more sample data in order to test the covered query scenario. Listing 3.13 selects data using both a clustered and non-clustered index. Listing 3.14 shows that the non-clustered index requires fewer reads to get the same data, even though a range of values is being retrieved.

Listing 3.12. Creating a table with many unique values.


DROP TABLE FewUniques
GO
CREATE TABLE FewUniques
(
         Id        int       IDENTITY(1,1) NOT NULL,
         Status    char(20)  NOT NULL
)
GO
DECLARE   @intNum int
SELECT    @intNum = 0
BEGIN TRAN
WHILE     @intNum <= 5000
BEGIN
   INSERT FewUniques VALUES ('test' + CONVERT(char(6), @intNum))
   SELECT @intNum = @intNum + 1
END
COMMIT TRAN
GO
CREATE CLUSTERED INDEX icFewUniquesId
ON FewUniques (Id)
GO
CREATE INDEX inFewUniquesId
ON FewUniques (Id)
GO

Listing 3.13. Selecting data via a covered query.


SELECT    Id
FROM      FewUniques (index=inFewUniquesId)
WHERE     Id BETWEEN 1000 AND 1500
SELECT    Id
FROM      FewUniques (index=icFewUniquesId)
WHERE     Id BETWEEN 1000 AND 1500

Listing 3.14. Stats I/O results from Listing 3.13.


--Stats I/O from non-clustered indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 5,  physical reads: 0,  read ahead reads: 0
--Stats I/O from clustered indexed access (comment was added)
Table: FewUniques  scan count 1,  logical reads: 9,  physical reads: 0,  read ahead reads: 0

The results show that no data pages were required to be read by the non-clustered index because it contained all the data being retrieved. It even beat the clustered index on range retrieval (because the data/leaf level pages take up more room). If the table had a larger row size, the non-clustered index would be that much more efficient than its clustered counterpart.

Can I Have Too Many Indexes?

Yes, Virginia, you can have too many indexes. Indexes can certainly help with data retrieval performance, but too many indexes can lead to inefficiency. With each index that you add to a table, that much more work must be done to maintain the set of indexes. Standard maintenance tasks, like updating table statistics and rebuilding indexes, will take longer with each new index you create. If an index is based on a column that gets updated, that is also additional work for SQL Server to perform every time you update that column.

If you are creating a data warehouse and few updates will be performed on the data, have at it. Create all the indexes you need to help speed up data retrieval. If, on the other hand, you are creating an online transaction processing (OLTP) database, you probably should keep the number of indexes to a minimum so that the data modifications will be faster.

Part of standard database testing should include trying various combinations of clustered and non-clustered indexes. Often several candidate columns could reasonably serve as either the clustered index or a non-clustered index, and the only way to find out which arrangement is best is to try each one. Test your queries against each configuration, let each test run for a few days, and check with the users to see whether they notice any difference in performance.


TIP Vary the order in which you present index configurations when asking users about performance. If users consistently report an improvement on the second go-around regardless of which configuration came first, they are being primed to perceive better (or worse) performance the second time. In that case, rely on your own measurements to decide what works best.

Using Indexes for Retrieving and Updating Data

When is an index good for both data retrieval and modification? The answer: when the index key is used to locate rows for retrieval and modification but is never modified itself, and when the data is updated more often than it is inserted into or deleted from the table. For example, the authors table in the pubs database has a key column, au_id, that is used to find an author for retrieval and updates but is never modified itself.
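A minimal sketch of such an update, keyed on au_id (the phone value is invented for illustration; '172-32-1176' is one of the author IDs in the standard pubs data):

```sql
-- The index key (au_id) locates the row quickly, and because au_id itself
-- is not modified, the update requires no index key maintenance.
UPDATE authors
SET    phone = '415 555-0199'
WHERE  au_id = '172-32-1176'
```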

All indexes cause a decrease in performance when inserting data. Because there is new information being introduced to the table, the index must react accordingly. If a new author is inserted into the authors table, a new entry in all indexes must also be added, adding time to the process.

The performance can increase when deleting data from a table if the field of the index is used to locate the row(s) to be deleted. Although some maintenance by SQL Server is required when removing a row from a table, the increased speed in finding the record to be deleted will normally outweigh the decreased speed of that maintenance. If you delete an author with an au_id of '123-45-6789' by using an index that has a key of only au_id, this index speeds the search for said author. Although an entry must be removed from the index (because the row is no longer in the table), the amount of time it takes to do this is much less than the amount of time saved by using the index to find the author in the first place.

Updating data results in a performance increase when the index key is used to locate the rows and that key is not itself changed by the update. If you update a column that is part of an index, SQL Server must also maintain the index, which costs time. If you were to update the author whose au_id is '111-22-3333' and an index on just au_id were present, using it would reduce the time needed to find the record and make the change. Without it, SQL Server would have to scan the entire table to make sure it found all records that needed updating.

Retrieving data will usually be faster if the index's key is used to locate the data. Keep in mind the information presented earlier in this chapter: if you select all rows of a table, it is best to use no index at all, because the index's page reads are simply added to the table's data-page reads, resulting in more work for SQL Server.

Indexes, Transactions, and Data Manipulation

This section aims to fill in a few gaps before going on to the discussion of locking and query optimization. The topic of transaction isolation levels introduces the four basic locking schemes in SQL Server.

Transaction Isolation Levels

SQL Server syntactically provides for the four transaction isolation levels as prescribed in the ANSI-SQL standard. The four transaction isolation levels are

  • Read uncommitted

  • Read committed

  • Repeatable read

  • Serializable

Two of these, repeatable read and serializable, are functionally the same in SQL Server and are provided for compatibility with ANSI SQL. (So it's actually only three isolation levels.) Read committed is the default isolation level for SQL Server connections. Transaction isolation levels can be set as follows:


SET TRANSACTION ISOLATION LEVEL READ COMMITTED

This setting is on a per-connection basis and lasts until the connection is dropped or the transaction isolation level is changed. You can temporarily override the setting by influencing the query optimizer (see "Using Locking Optimizer Hints" later in this chapter).

Read Uncommitted

Potentially, read uncommitted will give you the greatest increase in performance because it will read data regardless of its current locking state. Locking concurrency does not exist as far as this level is concerned. But this also provides for the least data integrity, allowing you to read data that has not yet been committed (mid-transaction). Imagine if a user was running a long update on some book prices, and at the same time another user was retrieving title prices with read uncommitted in effect. If the first update was rolled back, the second user's data would be incorrect, reflecting whatever values were in the table at the time it was read.
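The scenario just described can be sketched with two connections (the price doubling is an invented example against the pubs titles table):

```sql
-- Connection 1: a long-running price update, transaction left open
BEGIN TRAN
UPDATE titles SET price = price * 2
-- ...not yet committed or rolled back...

-- Connection 2: reads the uncommitted (dirty) prices without blocking
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT title_id, price FROM titles

-- Connection 1: this rollback makes the prices connection 2 just read incorrect
ROLLBACK TRAN
```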

Read Committed

Read committed is the default setting for SQL Server. When data is being modified, a request to read that data will wait until the modifier's transaction has finished. This prevents data from being read until it has been committed, which provides better data integrity but slows down performance.

Repeatable Read and Serializable

When SQL Server reads data, normally the lock is released as soon as SQL Server is done. However, repeatable read and serializable cause locks to be held on data being read until the transaction in which the read exists has completed. This prevents what is known as the phantom phenomenon--reading different values from the same data within a single transaction. Although these two transaction isolation levels provide for the best data integrity, they also have the biggest potential for performance decrease because they hold locks on data that is being read. Unless it is vital to have consistent reads within the same transaction, these two should be avoided.


NOTE SQL Server treats both repeatable read and serializable the same, even though standard ANSI SQL differentiates between the two (although it's a subtle differentiation). Because they work the same in SQL Server, you can use either one when coding.

How Transactions and Indexes Affect Locking

Now to the heart of the matter. We have only mentioned locking in passing up until this point. We will now examine SQL Server's locking mechanism and how indexes and transaction isolation levels affect it. We will first examine the types of locks available in SQL Server. Next, we will investigate how you can use indexes to decrease locking concurrency. Finally, we will take a look at transaction scopes and the locking mechanism.

The Spectrum of Locks

There are two basic categories of locks in SQL Server: table and page. Table locks affect the entire table on which the data resides. If a table is locked for a read, no data modifications may occur anywhere in the table until that read is finished. If only a page is locked for a read within the table, other pages may be modified at will.

Based on this information, it is preferable to use page locks over table locks. But what are the differences and what are all of the actual locks? Table 3.2 shows the available page locks in SQL Server and their compatibility with each other.

Table 3.2. Page lock compatibility.

                    Requested Lock
Current Lock        Exclusive  Shared  Update  Insert_Page  Link_Page
Exclusive           No         No      No      No           No
Shared              No         Yes     Yes     No           No
Update              No         Yes     No      No           No
Insert_Page*        No         No      No      Yes          Yes
Link_Page*          No         No      No      No           No

An Update lock is a special type of shared lock that is not compatible with itself. It is used to indicate that an update will be performed next, and its incompatibility with itself prevents more than one process from scaling up to an exclusive lock.
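One common use of the Update lock's self-incompatibility is the read-then-modify pattern, which can be requested explicitly with the updlock hint covered later in this chapter (the title_id value is taken from the standard pubs data; the royalty change is invented):

```sql
-- Two connections running this cannot both hold the Update lock at once,
-- so neither can deadlock while escalating to an Exclusive lock.
BEGIN TRAN
SELECT  royalty
FROM    roysched (updlock)
WHERE   title_id = 'BU1032'
-- ...decide on the new value...
UPDATE  roysched
SET     royalty = royalty + 2
WHERE   title_id = 'BU1032'
COMMIT TRAN
```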

Denoted by asterisks in Table 3.2, Insert_Page and Link_Page are the two locks used in Insert Row-Level Locking, which allows more than one row to be inserted at the same time on the same (last) page of a table. Insert_Page allows for compatibility with itself and is the lock that allows for multiple inserts on the same page of data. Link_Page is used to go from a full page to the next page of data. An Insert_Page lock automatically escalates to a Link_Page lock when the data page is full.

Table locks are all-encompassing, as if you were locking all pages at the same time. Examine Table 3.3 for a list of table locks and their compatibility with each other.

Table 3.3. Table lock compatibility.

                    Requested Lock
Current Lock        Exclusive  Shared  Exclusive Intent  Shared Intent
Exclusive           No         No      No                No
Shared              No         Yes     No                Yes
Exclusive Intent    No         No      Yes               Yes
Shared Intent       No         Yes     Yes               Yes

Intent locks are actually a reflection of the page locks that are occurring in the table. Consider the following scenario: A user runs a query that causes a Shared page lock, and another user runs a query that needs to place a Shared lock on the entire table. This is acceptable because the first query's Shared Intent table lock is compatible with a Shared table lock. If a third user runs a query that needs an Exclusive table lock while a page holds a Shared lock, that request must wait until the page lock is released (provided no other Shared page locks are acquired in the interim).

If more than four Shared lock requests queue up while an Exclusive lock is waiting, the Exclusive lock moves to the front of the line, and any additional Shared lock requests wait until the earlier Shared and Exclusive locks have completed. This is known as a demand lock and is performed automatically by SQL Server.

A third category of locks, extent locks, is allocated automatically by SQL Server when needed. An extent (eight pages) can be locked by the CREATE and DROP statements, and by the INSERT and UPDATE statements when new pages for data or indexes are being added to the table.

Using Indexes to Decrease Locking Concurrency

So why all the talk about locking? Well, it just so happens that if you can lock as few pages as possible when manipulating data, you will potentially increase the performance of SQL Server by not making processes wait for locked data to be freed. If 200 users are modifying different customers' data, it is more beneficial to lock only the page on which the individual customer's information resides than to lock the entire table. Although several customers may reside on the same page, thus making some requests wait until each has completed, this is preferable to making all requests wait in a queue while each modification locks the entire table.

So how do you request page locks instead of table locks? By using indexes, of course. As mentioned earlier, an index on a field that locates the data to be modified will increase performance by going right to the page where the data resides, rather than wading through all the data pages to find the particular row. Well, if you only go to the row's page in the table, you only need to lock that page, rather than the entire table, when updating the data.

This helps increase performance yet again. Not only are fewer pages read when making the change to the row, but fewer pages are locked, potentially preventing others from having to wait until your modification has finished.

Transaction Scope, Transaction Isolation Levels, and Locking

Certain locking rules apply within a transaction, depending on what type of transaction isolation level is currently in effect. Table 3.4 helps describe those differences.

Table 3.4. Transaction isolation levels and locking effects.

Isolation Level     Shared Lock              Exclusive Lock
Read uncommitted    As Needed                Length of Transaction
Read committed      As Needed                Length of Transaction
Repeatable read     Length of Transaction    Length of Transaction

As you can see, Shared locks are normally held only while data is actually being retrieved. The Shared lock is released once that process is completed, even if the transaction has not yet completed. This is why the phantom phenomenon can happen. If the transaction isolation level is set to repeatable read (or serializable), the Shared lock is held for the length of the transaction. Exclusive locks are always held for the length of the transaction, regardless of the isolation level.

What does this mean? It means your transactions should be only as long as needed to maintain data integrity. Lengthy transactions that modify data (that use Exclusive locks, in other words) can slow down SQL Server by making other processes wait for those transactions to complete. The longer a transaction that contains data modification statements takes to execute, the greater the potential for locking concurrency.

Listing 3.15 shows a sample script that will hold onto a Shared lock in the authors table. It also has a delay that lasts two minutes to give you time to check the current server activity.

Listing 3.15. Testing transaction isolation levels.


SET TRANSACTION ISOLATION LEVEL REPEATABLE READ 
GO 
BEGIN TRAN 
SELECT    * 
FROM      authors 
WHERE     au_lname = 'Green' 
WAITFOR DELAY '00:02:00' 
ROLLBACK TRAN 
GO

Once you execute this script, you can go to the Server menu and select Current Activity in Enterprise Manager to see what is happening in the server. Your results should look something like Figure 3.3.

Figure 3.3.
Current Activity window while shared page locks are in effect.

The Activity Legend reveals that page table intent locks are being taken out by SQL Server when retrieving the data. The Current Activity Window shows that the Shared locks are held until the transaction is completed (in other words, until the WAITFOR and ROLLBACK TRAN statements are completed).

Optimizing the Query Optimizer

Now we get to the good stuff: Telling SQL Server how to perform its queries. SQL Server has excellent decision-making capabilities when it comes to figuring out which index, if any at all, would get the job done most efficiently. Sometimes, however, complicated queries with many joins and elaborate WHERE clauses can be interpreted incorrectly. SQL Server provides a mechanism to override the Query Optimizer's decision and specify exactly what indexes, locking, and table order should be used to get or change the data.

Using Index Optimizer Hints

You have already seen this feature in action in previous sections of this chapter. The following code listing demonstrates two examples of data retrieval with index optimizer hints from the authors table.

Listing 3.16. Index optimizer hints.


SELECT    au_lname, au_fname 
FROM      authors (index = 0) 
SELECT    au_lname, au_fname 
FROM      authors (index = 1) 
WHERE     au_id = '213-46-8915' 
SELECT    au_lname, au_fname 
FROM      authors (index = aunmind) 
WHERE     au_lname = 'Green'

The first SELECT statement uses the hint (index = 0) to force a table scan (no index) when retrieving the data. The second SELECT statement uses the hint (index = 1) to force the use of the clustered index. The third SELECT statement forces the use of the non-clustered index aunmind with the hint (index = aunmind). The index number and index name are interchangeable (except for a table scan, which must be requested with 0). If you preferred, the third SELECT statement could just as well have been written as shown in Listing 3.17.

Listing 3.17. A variation on index optimizer hints.


SELECT    au_lname, au_fname 
FROM      authors (index = 2) 
WHERE     au_lname = 'Green'

Because the aunmind index in the authors table has an ID of 2, either value can be used by the query optimizer.


TIP The index ID can be found in the sysindexes table of the database in which the index resides. An indid of 0 means no index (the table itself), an indid of 1 always represents the clustered index, and indids 2 through 250 represent non-clustered indexes.
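A quick way to see these IDs, sketched here against the pubs database (the indid range filter simply excludes the table-only row and text/image entries):

```sql
-- List the index names and IDs for the authors table
SELECT name, indid
FROM   sysindexes
WHERE  id = OBJECT_ID('authors')
  AND  indid BETWEEN 1 AND 250
ORDER BY indid
```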

Listing 3.18 is a variation on Listing 3.15, and shows how a table lock can be used instead of page locks by forcing a table scan.

Listing 3.18. Testing index optimizer hints.


SET TRANSACTION ISOLATION LEVEL REPEATABLE READ 
GO 
BEGIN TRAN 
SELECT    * 
FROM      authors (index=0) 
WHERE     au_lname = 'Green' 
WAITFOR DELAY '00:02:00' 
ROLLBACK TRAN 
GO

Once again, you need to go to the Current Activity window to see the results shown in Figure 3.4. If the Current Activity window is already open, you will need to refresh the displayed data by clicking the Refresh button.

Figure 3.4.
The Current Activity window while a table lock is in effect.

Using Locking Optimizer Hints

Let's take the example in Listing 3.18 one step further. By changing the optimizer hint, we can make SQL Server do an Exclusive table lock on the authors table (as shown in Listing 3.19 and Figure 3.5).

Listing 3.19. Forcing an exclusive table lock.


BEGIN TRAN 
SELECT    * 
FROM      authors (tablockx) 
WHERE     au_lname = 'Green' 
WAITFOR DELAY '00:02:00' 
ROLLBACK TRAN 
GO

Figure 3.5. The Current Activity window while an Exclusive table lock is in effect.

This script will give activity results like those shown in Figure 3.5. The SELECT statement uses the optimizer hint of tablockx to keep exclusive table locks in place until the transaction has completed.

The available locking optimizer hints are shown in Table 3.5.

Table 3.5. Locking optimizer hints and descriptions.

Optimizer Hint  Description
holdlock        Hold the lock until the end of the transaction.
nolock          Do not use locks when retrieving data.
paglock         Use page locks.
tablock         Use a table lock.
tablockx        Use an exclusive table lock.
updlock         Use an update lock.

holdlock can be used to hold Shared locks for the duration of the transaction. This should already sound familiar in the form of the repeatable read and serializable transaction isolation levels. If you only occasionally need to hold Shared locks, you are better off keeping the default transaction isolation level of read committed and using the holdlock optimizer hint as needed.
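A minimal sketch of that approach, holding Shared locks for a single transaction while the connection stays at read committed:

```sql
-- Shared locks on the pages read here are held until COMMIT,
-- without changing the connection's isolation level.
BEGIN TRAN
SELECT  au_lname, au_fname
FROM    authors (holdlock)
WHERE   state = 'CA'
-- ...re-reading within this transaction returns consistent values...
COMMIT TRAN
```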

nolock has the same functionality as the read uncommitted transaction isolation level. It forces the reading of uncommitted data by not requiring any locks while reading the data (thus bypassing the blocking by any exclusive locks).

One final note regarding index and locking optimizer hints: You can combine both types, but it is important to list the index hint last. The code in Listing 3.20 shows an example of a valid set of optimizer hints.

Listing 3.20. Mixing optimizer hints.


SELECT    * 
FROM      authors (paglock holdlock index=aunmind)

Optimizing Table Order in Queries

There is no optimizer hint to place in the SELECT statement that will force tables to be selected in the order in which they appear. Rather, SQL Server will decide which table would be the better choice to select from first, based on what data is being requested and what is in the WHERE clause. So when tables are joined, SQL Server decides which one it uses first.

Well, not exactly. You see, there is a way to force the optimizer to use the tables in the order they are presented. It's just not part of the SELECT statement itself. Rather, you use the SET FORCEPLAN statement to tell SQL Server to use the table order as it appears or to decide on its own. Examine Listing 3.21 and its query plan in Listing 3.22.

Listing 3.21. Joining three tables.


SELECT    au_lname, title 
FROM      titles t 
         JOIN titleauthor ta ON ta.title_id = t.title_id 
         JOIN authors a ON a.au_id = ta.au_id 
WHERE     au_lname = 'Green'

Listing 3.22. Query plan from Listing 3.21.


STEP 1 
The type of query is SELECT 
FROM TABLE 
authors a 
Nested iteration 
Index : aunmind 
FROM TABLE 
titleauthor ta 
Nested iteration 
Table Scan 
FROM TABLE 
titles t 
Nested iteration 
Using Clustered Index

Although the titles table is listed first, then titleauthor, and finally authors in the FROM clause, authors is used first by the query optimizer because its data is being preselected. We can force SQL Server to use the tables as they appear, as shown in Listings 3.23 and 3.24.

Listing 3.23. Using the SET FORCEPLAN statement.


SET FORCEPLAN ON 
SELECT    au_lname, title 
FROM      titles t 
         JOIN titleauthor ta ON ta.title_id = t.title_id 
         JOIN authors a ON a.au_id = ta.au_id 
WHERE     au_lname = 'Green' 
SET FORCEPLAN OFF

Listing 3.24. Query plan from Listing 3.23.


STEP 1 
The type of query is SELECT 
FROM TABLE 
titles t 
Nested iteration 
Table Scan 
FROM TABLE 
titleauthor ta 
Nested iteration 
Table Scan 
FROM TABLE 
authors a 
Nested iteration 
Table Scan

Now SQL Server is using titles first, then titleauthor, and finally authors, just as they are listed in the FROM clause. In this case, however, examination of the Stats I/O would reveal that the latter is much more inefficient than the former. SQL Server decided to use authors first because it would cut down on the number of lookups in the other tables by limiting the authors table to a single or few rows. In most cases, SQL Server will make the best decision. But if you find a complex join query that's running more slowly than you think it should, you may want to play around with table order and the SET FORCEPLAN statement.

Summary

In all honesty, the material presented here covers only some of the SQL Server optimization techniques. Network performance, NT Server performance, application design, and so on will all have an effect on how well your SQL Server does its job.

Great queries and indexes mean nothing if there is not enough memory on the NT Server on which SQL Server resides. What may appear to the users to be poor performance could be alleviated not by playing with indexes but by installing faster network cards or using switches on the network. Applications that issue client-based transaction statements and hold them open can slow down a server to a crawl.

Be sure to check all possibilities when you need to increase performance. Work with the network administrators and application programmers. Make sure that everyone is doing their part to make SQL Server perform to the best of its ability.

For more information on other optimization techniques, see Chapter 14, "Writing Effective Code," or see the book Microsoft SQL Server 6.5 Unleashed.
