Read an Excerpt
SQL For Dummies
By Allen G. Taylor
John Wiley & Sons
Copyright © 2003
Allen G. Taylor
All right reserved.
ISBN: 0-7645-4075-0
Chapter One
Relational Database
Fundamentals
* * *
In This Chapter
* Organizing information
* Defining database
* Defining DBMS
* Comparing database models
* Defining relational database
* Considering the challenges of database design
* * *
SQL (short for structured query language) is an industry-standard language
specifically designed to enable people to create databases, add new data
to databases, maintain the data, and retrieve selected parts of the data.
Various kinds of databases exist, each adhering to a different conceptual
model. SQL was originally developed to operate on data in databases that
follow the relational model. Recently, the international SQL standard has
incorporated part of the object model, resulting in hybrid structures called
object-relational databases. In this chapter, I discuss data storage, devote a
section to how the relational model compares with other major models, and
provide a look at the important features of relational databases.
Before I talk about SQL, however, first things first: I needto nail down what I
mean by the term database. Its meaning has changed as computers have
changed the way people record and maintain information.
Keeping Track of Things
Today, people use computers to perform many tasks formerly done with
other tools. Computers have replaced typewriters for creating and modifying
documents. They've surpassed electromechanical calculators as the best
way to do math. They've also replaced millions of pieces of paper, file folders,
and file cabinets as the principal storage medium for important information.
Compared to those old tools, of course, computers do much more, much
faster - and with greater accuracy. These increased benefits do come at a
cost, however. Computer users no longer have direct physical access to their
data.
When computers occasionally fail, office workers may wonder whether computerization
really improved anything at all. In the old days, a manila file
folder only "crashed" if you dropped it - then you merely knelt down, picked
up the papers, and put them back in the folder. Barring earthquakes or other
major disasters, file cabinets never "went down," and they never gave you an
error message. A hard drive crash is another matter entirely: You can't "pick
up" lost bits and bytes. Mechanical, electrical, and human failures can make
your data go away into the Great Beyond, never to return.
Taking the necessary precautions to protect yourself from accidental data
loss allows you to start cashing in on the greater speed and accuracy that
computers provide.
If you're storing important data, you have four main concerns:
often.
and find some (or all) of your data missing.
you store.
from the tons of data that you don't want.
State-of-the-art computer databases satisfy these four criteria. If you store
more than a dozen or so data items, you probably want to store those items
in a database.
What Is a Database?
The term database has fallen into loose use lately, losing much of its original
meaning. To some people, a database is any collection of data items (phone
books, laundry lists, parchment scrolls ... whatever). Other people define
the term more strictly.
In this book, I define a database as a self-describing collection of integrated
records. And yes, that does imply computer technology, complete with languages
such as SQL.
A record is a representation of some physical or conceptual object. Say, for
example, that you want to keep track of a business's customers. You assign a
record for each customer. Each record has multiple attributes, such as name,
address, and telephone number. Individual names, addresses, and so on are
the data.
A database consists of both data and metadata. Metadata is the data that
describes the data's structure within a database. If you know how your data
is arranged, then you can retrieve it. Because the database contains a description
of its own structure, it's self-describing. The database is integrated because
it includes not only data items but also the relationships among data items.
The database stores metadata in an area called the data dictionary, which
describes the tables, columns, indexes, constraints, and other items that
make up the database.
Because a flat file system (described later in this chapter) has no metadata,
applications written to work with flat files must contain the equivalent of the
metadata as part of the application program.
Database Size and Complexity
Databases come in all sizes, from simple collections of a few records to mammoth
systems holding millions of records.
A personal database is designed for use by a single person on a single computer.
Such a database usually has a rather simple structure and a relatively
small size. A departmental or workgroup database is used by the members of a
single department or workgroup within an organization. This type of database
is generally larger than a personal database and is necessarily more complex;
such a database must handle multiple users trying to access the same data at
the same time. An enterprise database can be huge. Enterprise databases may
model the critical information flow of entire large organizations.
What Is a Database Management
System?
Glad you asked. A database management system (DBMS) is a set of programs
used to define, administer, and process databases and their associated applications.
The database being "managed" is, in essence, a structure that you
build to hold valuable data. A DBMS is the tool you use to build that structure
and operate on the data contained within the database.
Many DBMS programs are on the market today. Some run only on mainframe
computers, some only on minicomputers, and some only on personal computers.
A strong trend, however, is for such products to work on multiple
platforms or on networks that contain all three classes of machines.
A DBMS that runs on platforms of multiple classes, large and small, is called
scalable.
Whatever the size of the computer that hosts the database - and regardless
of whether the machine is connected to a network - the flow of information
between database and user is the same. Figure 1-1 shows that the user communicates
with the database through the DBMS. The DBMS masks the physical
details of the database storage so that the application need only concern
itself with the logical characteristics of the data, not how the data is stored.
Flat Files
Where structured data is concerned, the flat file is as simple as it gets. No, a
flat file isn't a folder that's been squashed under a stack of books. Flat files
are so called because they have minimal structure. If they were buildings,
they'd barely stick up from the ground. A flat file is simply a collection of one
data record after another in a specified format - the data, the whole data,
and nothing but the data - in effect, a list. In computer terms, a flat file is
simple. Because the file doesn't store structural information (metadata), its
overhead (stuff in the file that is not data) is minimal.
Say that you want to keep track of the names and addresses of your company's
customers in a flat file system. The system may have a structure something
like this:
Harold Perciva l26262 S. Howards Mill Rd Westminster CA92683
Jerry Appel 32323 S. River Lane Rd Santa Ana CA92705
Adrian Hansen 232 Glenwood Court Anaheim CA92640
John Baker 2222 Lafayette St Garden GroveCA92643
Michael Pens 77730 S. New Era Rd Irvine CA92715
Bob Michimoto 25252 S. Kelmsley Dr Stanton CA92610
Linda Smith 444 S. E. Seventh St Costa Mesa CA92635
Robert Funnell 2424 Sheri Court Anaheim CA92640
Bill Checkal 9595 Curry Dr Stanton CA92610
Jed Style 3535 Randall St Santa Ana CA92705
As you can see, the file contains nothing but data. Each field has a fixed
length (the Name field, for example, is always exactly 15 characters long),
and no structure separates one field from another. The person who created
the database assigned field positions and lengths. Any program using this file
must "know" how each field was assigned, because that information is not
contained in the database itself.
Such low overhead means that operating on flat files can be very fast. On the
minus side, however, application programs must include logic that manipulates
the file's data at a very low level of complexity. The application must
know exactly where and how the file stores its data. Thus, for small systems,
flat files work fine. The larger a system is, however, the more cumbersome a
flat file system becomes. Using a database instead of a flat file system eliminates
duplication of effort. Although database files themselves may have
more overhead, the applications can be more portable across various hardware
platforms and operating systems. A database also makes writing application
programs easier because the programmer doesn't need to know the
physical details of where and how the files store their data.
Databases eliminate duplication of effort, because the DBMS handles the
data-manipulation details. Applications written to operate on flat files must
include those details in the application code. If multiple applications all
access the same flat file data, these applications must all (redundantly)
include that data manipulation code. By using a DBMS, you don't need to
include such code in the applications at all.
Clearly, if a flat file-based application includes data-manipulation code that
only runs on a particular hardware platform, then migrating the application
to a new platform is a headache waiting to happen. You have to change all
the hardware-specific code - and that's just for openers. Migrating a similar
DBMS-based application to another platform is much simpler - fewer complicated
steps, fewer aspirin consumed.
Database Models
Different as databases may be in size, they are generally always structured
according to one of three database models:
are almost exclusively of the relational type. Organizations that
already have a major investment in hierarchical or network technology
may add to the existing model, but groups that have no need to maintain
compatibility with "legacy systems" nearly always choose the relational
model for their databases.
a simple hierarchical structure that allows fast data access. They suffer
from redundancy problems and a structural inflexibility that makes database
modification difficult.
advantage with structural complexity.
The first databases to see wide use were large organizational databases that
today would be called enterprise databases, built according to either the
hierarchical or the network model. Systems built according to the relational
model followed several years later. SQL is a strictly modern language; it
applies only to the relational model and its descendant, the object-relational
model. So here's where this book says, "So long, it's been good to know ya,"
to the hierarchical and network models.
New database management systems that are not based on the relational
model probably conform to the newer object model or the hybrid object-relational
model.
Relational model
Dr. E. F. Codd of IBM first formulated the relational database model in 1970,
and this model started appearing in products about a decade later. Ironically,
IBM did not deliver the first relational DBMS. That distinction went to a small
start-up company, which named its product Oracle.
Relational databases have replaced databases built according to earlier
models because the relational type has valuable attributes that distinguish
relational databases from those other database types. Probably the most
important of these attributes is that, in a relational database, you can change
the database structure without requiring changes to applications that were
based on the old structures. Suppose, for example, that you add one or more
new columns to a database table. You don't need to change any previously
written applications that will continue to process that table, unless you alter
one or more of the columns used by those applications.
Of course, if you remove a column that an existing application references,
you experience problems no matter what database model you follow. One of
the best ways to make a database application crash is to ask it to retrieve a
kind of data that your database doesn't contain.
Why relational is better
In applications written with DBMSs that follow the hierarchical or network
model, database structure is hard-coded into the application - that is, the
application is dependent on the specific physical implementation of the database.
If you add a new attribute to the database, you must change your application
to accommodate the change, whether or not the application uses the
new attribute.
Relational databases offer structural flexibility; applications written for those
databases are easier to maintain than similar applications written for hierarchical
or network databases. That same structural flexibility enables you to
retrieve combinations of data that you may not have anticipated needing at
the time of the database's design.
Components of a relational database
Relational databases gain their flexibility because their data resides in tables
that are largely independent of each other. You can add, delete, or change
data in a table without affecting the data in the other tables, provided that
the affected table is not a parent of any of the other tables. (Parent-child table
relationships are explained in Chapter 5, and no, it doesn't mean discussing
allowances over dinner.) In this section, I show what these tables consist of
and how they relate to the other parts of a relational database.
Continues...
Excerpted from SQL For Dummies
by Allen G. Taylor
Copyright © 2003 by Allen G. Taylor.
Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.