Database Management Systems

by Raghu Ramakrishnan and Johannes Gehrke

Choice of Topics in "Database Management Systems, Second Edition"

The main pedagogical objective of this book is to provide clear and thorough discussions of the topics covered, with detailed examples, using a quantitative approach whenever appropriate. An important feature of the book is the extensive set of exercises, with a strong emphasis on problem-solving. The choice of material has been influenced by these considerations:

(1) To concentrate on issues central to the design, tuning and implementation of database applications.

(2) To provide adequate coverage of implementation topics to support a concurrent laboratory section or course project. For example, implementation of relational operations has been covered in more detail than is necessary in a first course. However, the variety of alternative implementation techniques allows for a wide choice of project assignments: an instructor who wishes to assign implementation of Sort-Merge Join might cover that topic in depth, whereas another might choose to emphasize Index Nested Loops Join.

(3) To provide in-depth coverage of the state of the art in currently available commercial systems, rather than broad coverage of several alternatives.

Chapter Organization

A modular organization has been used to enable instructors who teach courses with an emphasis on one or more of the following areas to omit irrelevant material without loss of continuity:
  • Storage, file structures and indexing
  • Query languages and relational model concepts
  • Query processing
  • Database design and tuning
  • Transaction processing
  • Advanced topics
The material can be divided into roughly seven main parts, as indicated in the figure, which also shows the dependencies between chapters. An arrow from Chapter I to Chapter J means that I depends on material in J. The broken arrows indicate a weak dependency, which can be ignored at the instructor's discretion. The first two chapters cover material that is basic to the rest of the book, and we assume that they are covered first. Each of the remaining six parts is described below, along with its dependence, if any, on the other parts.

Part I: Basics. Introduction to database systems. Conceptual design using the ER model, the relational model, translation of the ER model to the relational model, SQL DDL, and referential integrity. The discussion of the ER model concentrates on the alternatives available for modeling an enterprise (e.g., attribute vs. entity, binary vs. ternary relationships) and includes generation of SQL-92 Create Table statements from ER diagrams.

Part II: Relational Queries. Query languages (relational algebra and calculus, QBE, and SQL). The coverage of SQL follows the SQL-92 standard and is extensive; in addition to the data retrieval features of SQL, many advanced features, such as embedded SQL, ODBC and JDBC, complex integrity constraints, and triggers, are discussed.

Part III: Storage and Indexing. Disks, tapes, buffer management, record/page and file formats, file organizations, and indexes. This part depends only on Chapter 7, and only to the extent that implementing a file organization requires an understanding of file and record formats.

Part IV: Query Evaluation. Implementation of relational operators and generation of query plans. The treatment of these topics is sufficiently detailed to allow projects involving operator implementation, and provides the necessary background to explore optimization issues using the optimizer visualization tool of Minibase.

Part V: Database Design. This is one of the focal points of the book. Schema refinement and normalization, physical design and performance tuning, and security are discussed in detail. The discussion of normalization stresses the role of conceptual design and underscores the need for refining an initial relational design obtained from ER diagrams. In an important sense, the first four parts of the book lead up to the material on database design and tuning. We discuss these topics from a very practical perspective, with several examples. A reader who studies this material carefully will quickly realize that a thorough understanding of several basic issues is essential for good design. The discussion of physical design and tuning assumes a good understanding of query optimization, and in particular, of the use of indexes.

Part VI: Transaction Management. Concurrency control and recovery are covered with an emphasis on locking and logging techniques. The discussion of concurrency control in tree indexes depends on knowledge of tree indexes (Part III).

Part VII: Advanced Topics. The discussion of query optimization in parallel and distributed databases assumes an understanding of query optimization in a centralized DBMS. In general, the chapters in Part VII should be covered after the material in the other parts. In-depth chapters on Internet Databases, Decision Support, Data Mining, Object Databases, Deductive Databases, and Spatial Databases are included. An overview chapter on Additional Topics provides pointers for further reading (e.g., advanced transaction processing, mobile databases, GIS and other applications, main memory databases, information visualization).

Order of Presentation

The material in the text is presented in the order that we usually cover it, and is influenced by the order of the accompanying programming assignments and by our preference for covering all database design topics together after covering all optimization-related material. It is likely that other instructors will want to cover some topics in a different order, based on the needs of their courses. For example, some instructors will want to cover file organizations or indexing before discussing SQL and relational query languages. In a course that emphasizes database design, many of the implementation-oriented chapters might well be skipped altogether. The book has therefore been designed to be flexible with respect to the ordering of material, and inter-chapter dependencies have been kept to a minimum. In particular, Parts I through V, regarded as units, can be covered in pretty much any order, although it is probably a good idea to cover Part II before Part IV, and to cover Part IV before discussing physical design and tuning. The remaining inter-part dependencies are minor.

Some additional points to note:

Several section headings contain an asterisk. This does not necessarily indicate a higher level of difficulty. Rather, omitting all asterisked sections leaves about the right amount of material in Chapters 1 through 20 for a broad introductory one-quarter or one-semester course (depending on the depth at which the remaining material is discussed and the nature of the course assignments).

It is not necessary to cover all the alternative evaluation algorithms for a given operator in order to cover query optimization adequately.

The material on SQL queries can be taught without first covering relational algebra and calculus. This may be desirable if an instructor wishes to give SQL assignments early. However, an understanding of algebra and calculus will enable students to appreciate the foundations of SQL. Similarly, QBE can be taught without a prior discussion of domain relational calculus (DRC), but briefly covering DRC first is recommended.

The book contains more material than can be covered in a one-semester course. It can be used in several kinds of introductory or second courses by choosing topics appropriately, or in a two-course sequence by supplementing the material with some advanced readings in the second course.

August 1999

Raghu Ramakrishnan (raghu@cs.wisc.edu) and Johannes Gehrke (johannes@cs.cornell.edu)