Special Edition Using XML Schema

by David Gulbransen


Overview

Special Edition Using XML Schema starts with an explanation of Schema basics: why they were created, the advantages they offer over DTDs, and an overview of the two major parts of the specification: Structure and Datatypes. Next, the author explains the differences between DTDs and Schemas, and demonstrates a simple DTD-to-Schema conversion. The bulk of the book explains the many parts of Schemas, dissecting the structure of a Schema and then introducing Datatypes. Each section includes practical examples, which the author creates and explains, building from the material discussed in the previous section. At the end of the book, the author demonstrates a large, real-world example Schema, showing how all the parts of Schemas interact and how readers would build XML data from the example Schema.

Editorial Reviews

XML schema offers an alternative to describing an XML grammar with document type definitions (DTDs). Written for developers familiar with XML, this book explains the syntax and structure of XML schema and why a developer may want to use it. The author explains how datatypes define a lexical meaning for the value of an element or an attribute, and builds an example schema for storing human resource files. Annotation c. Book News, Inc., Portland, OR (booknews.com)

Product Details

ISBN-13:
9780789726070
Publisher:
Pearson Technology Group 2
Publication date:
10/22/2001
Series:
Special Edition Using Series
Edition description:
Special Edition
Pages:
496
Product dimensions:
7.50(w) x 9.25(h) x 1.00(d)

Read an Excerpt

Chapter 1: An Overview of XML Schema

The Politics of XML

XML was originally derived from another meta-language for developing structured markup, the Standard Generalized Markup Language (SGML). SGML is a powerful technology that allows users to engage in very complex descriptions of documents and data. SGML allowed the creation of customized markup languages, such as HTML, which allow users to use a common markup language for various applications, such as Web display with HTML. SGML is powerful and flexible, but it is also intimidating. Many people who could have benefited from SGML were afraid of getting lost in the complexities of the technology and so were reluctant to use it.

This gave rise to XML. XML was designed to take some of the power of SGML but make it simpler and easier to use. The goal of the original working group was "80/20", that is, to get 80% of the functionality of SGML with only 20% of the complexity. With the XML 1.0 Recommendation, it seemed that XML would achieve the 80/20 goal, but then a confusing array of related technologies entered the picture: XML Namespaces, XSL, XSLT, XPath, XPointer, XQuery, and eventually XML Schema.

XML is growing more complex every day. Many XML experts lament the situation and view the current set of XML-related standards as growing chaotically out of control. One expert lamented, "As if Internet time weren't fast enough, now we seem to have XML time!" The result is that recommendations coming from the W3C seem to be growing in complexity while the time spent deliberating over them shrinks.

Many of the recommendations are mired in controversy, with the XML community divided into two camps: those who want recommendations that include every possible usage of the language, and those who are still dedicated to the 80/20 philosophy. XML Schema are certainly mired in the simplicity versus comprehensiveness debate as well.

The controversy surrounding XML Schema has followed the technology from conference to conference, and has even resulted in the development of competing technologies such as RELAX (REgular LAnguage for Xml) and TREX (Tree Regular Expressions for XML). These technologies are gaining a growing following, even though they are not specifications that originate from, or are even being considered by, the W3C.

XML Schema are a very useful technology, but how and why you might choose to use XML Schema varies widely.

No one schema mechanism suits all applications of XML. It is virtually impossible to imagine one technology that addresses all of the potential uses for XML.

Such is the case with XML Schema. Some of you will delve into XML Schema only to find that you barely need to scratch the surface in order to meet your needs. Others will read this entire book only to find that Schema simply don't do everything you need. Still others may find that an alternative technology, such as RELAX NG, Schematron, or even the "older" Document Type Definitions (DTDs), is better at solving your particular problems.

Note
The controversy over which features to include in XML Schema and the level of complexity gave birth to several suggested alternatives, including RELAX, TREX, Schematron, and so on. Already, RELAX and TREX have been combined into RELAX NG, and as time passes there will certainly be more shakeouts in the alternative schema world. We will discuss alternatives to the XML Schema Recommendation more in Chapter 18, and in the Appendixes.

That's why this book sets out to accomplish two goals: not only to explain to you the syntax and language of XML Schema, but also to educate you about the why of XML Schema. It is as important, if not more important, to know when you need to use a Schema as it is to know how. By working your way through the sections of this book, you'll get a solid understanding of what schema are, why you would want to use them, and finally, how you would write a schema in the XML Schema language.

The first step is to look at what brought the XML Schema Recommendation into being in the first place and understand why it is such a political beast.

A Schema Is Not Necessarily an XML Schema

Semantically, a schema is a diagrammatic presentation, a structured framework, or a plan. Conceptually, you can think of a schema as being a set of rules used to define the structure of data, but that's not the most accurate definition. Strictly speaking, a set of rules would provide a very granular level of control that can be quite difficult to achieve.

The best way to understand a schema is to think of it as a grammar. A schema is a set of requirements that a document or set of data (which is all a document really is) must meet in order to be a valid expression within the context of that grammar.

For example, take a look at the English language. A formal sentence in English needs to contain a subject and a verb. You can think of that basic constraint on a sentence (that it must contain a subject and a verb) as a schema. If you were to write that as a rule, you could say:

A formal English sentence must contain a subject and a verb.

Now, working within that structure, you can have a sentence as simple as "I am," which contains only two words: "I" being the subject, and "am" being the verb. But you will notice that our grammar rule doesn't specify that a sentence can't contain other words, such as adjectives and adverbs. So we could write "I am tired," or "I am sleeping soundly," and those would both be valid formal sentences because they still contain a subject and a verb.
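To make the idea concrete, here is a minimal sketch of that one-line grammar expressed as a check in Python. It is only an illustration, not XML Schema syntax: the function name is_valid_sentence and the representation of a sentence as (word, role) pairs are assumptions made up for this example.

```python
# Hypothetical illustration: a sentence is a list of (word, role) pairs,
# where the role labels ("subject", "verb", ...) are assumed to be known.

def is_valid_sentence(tagged_words):
    """Return True if the sentence contains at least a subject and a verb."""
    roles = {role for _word, role in tagged_words}
    # The grammar only requires these two roles; anything else is allowed.
    return {"subject", "verb"} <= roles

i_am = [("I", "subject"), ("am", "verb")]
i_am_tired = [("I", "subject"), ("am", "verb"), ("tired", "adjective")]
fragment = [("tired", "adjective")]

print(is_valid_sentence(i_am))
print(is_valid_sentence(i_am_tired))
print(is_valid_sentence(fragment))
```

Notice that the check says nothing about extra words, which is why "I am tired" passes just as "I am" does, while a fragment with no subject or verb does not.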

Looking at this model, it's easy to see how grammar can become complex very quickly. For example, what if you wanted to specify that a sentence could have more than one subject? What if you wanted to specify the use of direct objects or indirect objects? You could quickly and easily outgrow your simple grammar.

That's what schema design is all about: designing the best grammar to describe your document or data. It is really an art more than a science, much like computer programming. And it is an area that inspires debate. There are camps that believe in the grammar model, and there are camps that believe in a rules-based model. A grammar model lends itself to more flexibility, in both design and interpretation, while a strict, rules-based system offers more precision, and might be easier to implement.

Grammars express structure while still allowing for a flexible data model. However, strict rules can have valid applications as well. When you need generic guidelines for forming sentences, a grammar works great. But what if you wanted to say that any time the subject of a sentence was "Steve" then the verb had to be "went"? That would be best accomplished with a rule. (Not "Steve drove to the store" but "Steve went to the store.") Now, if the subject were "Mary", we could still say "Mary drove to the store," because that doesn't violate the rule. A constraint this specific is too granular for a loosely formed grammar, which cannot be as restrictive as a rules-based system.
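The difference between the two approaches can be sketched in a few lines of Python. Again, this is only an illustration under assumed names (satisfies_grammar, satisfies_steve_rule, and the tagged-word representation are hypothetical), not a description of how any schema language is implemented.

```python
# Hypothetical illustration: sentences are lists of (word, role) pairs.

def satisfies_grammar(tagged_words):
    """Grammar: a sentence must contain a subject and a verb."""
    roles = {role for _word, role in tagged_words}
    return {"subject", "verb"} <= roles

def satisfies_steve_rule(tagged_words):
    """Rule: if the subject is "Steve", every verb must be "went"."""
    subjects = [word for word, role in tagged_words if role == "subject"]
    verbs = [word for word, role in tagged_words if role == "verb"]
    if "Steve" in subjects:
        return all(verb == "went" for verb in verbs)
    return True  # the rule does not apply to other subjects

def tag(subject, verb):
    """Build "<subject> <verb> to the store" as tagged words."""
    return [(subject, "subject"), (verb, "verb"),
            ("to", "preposition"), ("the", "article"), ("store", "noun")]

steve_drove = tag("Steve", "drove")
steve_went = tag("Steve", "went")
mary_drove = tag("Mary", "drove")

for sentence in (steve_drove, steve_went, mary_drove):
    print(satisfies_grammar(sentence), satisfies_steve_rule(sentence))
```

All three sentences satisfy the grammar, but only "Steve drove to the store" violates the rule. The grammar check never looks at which word fills a role; the rule check has to.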

These are the types of issues that surround the world of XML and the debate around XML Schema. As we delve into the world of schema, and as you begin to write your own, you will need to address these issues in order to get the most out of XML. XML Schema do provide a mechanism for defining grammars, and some mechanisms for rules as well. But if strict rules are what you are after, XML Schema might get you into some tricky situations.

How Schema Relate to Document Type Definitions

In order to take advantage of XML Schema, you should already be familiar with the basics of XML. You should know about elements and attributes, and you should be able to write your own well-formed XML documents.

You may or may not already be familiar with the concept of validation. XML documents can be validated using a Document Type Definition, or DTD. In fact, DTDs are a type of schema. A DTD is not really a separate document, even though most people use a DTD as an external subset. To the XML parser, the DTD is still part of the XML document, specifically the part that allows you to specify the structure of the document. Within the DTD you can define your elements, the attributes for those elements, and other facets of your XML document, such as how many times an element can occur.

Document Type Definitions are a holdover from the days of SGML. However, DTDs used with XML are more limited than DTDs used with SGML. DTDs basically provide a mechanism for specifying elements, attributes, entities, and notations. Now, that does not mean that DTDs are not useful. In fact, there are many applications of XML where DTDs are more than powerful enough....
