Professional XML Schemas

by Jon Duckett, Kevin Williams, Kurt Cagle, Stephen Mohr
     
 

Overview

This book is for all professional XML programmers who need to use XML Schemas to define data and need a practical guide to this new standard.

Product Details

ISBN-13: 9781861005472
Publisher: Wrox Press, Inc.
Publication date: 07/01/2001
Series: Professional Ser.
Pages: 800
Product dimensions: 7.28(w) x 9.04(h) x 1.56(d)

Read an Excerpt

Chapter 2: Datatype Basics

The basic benefit of XML - the ability to describe one's own vocabulary - is greatly enhanced by the use of XML Schema datatypes. These can ensure, for example, that numeric data is really numeric and that strings follow a specific format, or otherwise validate the format and/or value of an element or attribute. Pre-defined XML Schema datatypes make provisions for various forms of commonly used values such as dates, times, and URI references, as well as providing the basis for more complex and user-defined data structures.

Strong data typing and the ability to create modern object-oriented (OO) structures are imperative for most of the newer uses of XML (such as SOAP or ebXML). These new applications can now use most of the datatypes used in traditional programming languages, plus the conceptual and maintenance benefits of OO inheritance of datatype structures.

The use of strong data typing has advantages beyond the description and validation of documents and web pages. Once web sites serve pages in XML, rather than HTML, web spiders will be able to extract much more meaningful information from these sites. For example:

  • Numeric datatypes allow price comparison services that can calculate currency conversions, taxes, and/or multi-item costs.

  • Users searching for date-sensitive items (like newspaper articles or a specific event) can use standardized dates, and search for specific dates or ranges of dates.

  • Type-specific searching can also apply to other specific datatypes such as URIs and user-derived datatypes such as ISBNs, UPCs, and part numbers.

Existing free-text searches can't differentiate the May Company, May Day, the merry month of May, a place called May, or a person's name. Nor can these searches ignore the many appearances of the permissive verb "may", which is rarely the target of a search and is often included in the "stop words" list (terms ignored when searching). The use of XML Schema datatypes will permit much more focused searching, reducing the huge lists of online search engine results. Type-specific searching is an awesome benefit of XML Schema's strong data typing.

First, we will look at the basic principles of schema datatypes, and then we will look at the two dozen or so built-in datatypes provided as part of XML Schema.

Datatypes in XML - An Overview

XML 1.0 and its DTDs provided a few simple datatypes, but none were numeric types, and the validation mechanisms were quite limited. There have been proposals to add some additional type checking to DTDs (such as DT4DTD), but these are beyond the scope of this book. Early schema proposals such as SOX and XMLData provided various sets of pre-defined types, which informed the development of the W3C Schema Recommendation. The lack of strong data typing was one of the principal reasons for the development of XML Schema. Indeed, datatypes are so significant that they comprise half of the XML Schema specification, and they may be used independently from the rest of the XML Schema specification.

XML Schema datatypes are defined in XML Schema Part 2: Datatypes, which became a W3C Recommendation in May 2001. It is available at http://www.w3.org/TR/xmlschema-2.

These datatypes are based upon those in XML 1.0 DTDs, Java, SQL, the ISO 11404 standard on language-independent datatypes, existing Internet standards, and earlier schema proposals.

It would be useful to have a link to an online version of the ISO 11404 standard, but like most ISO documents, it is only available as expensive paper. You can find ordering information for this at http://www.iso.ch/cate/d19346.html.

In the last chapter, we saw how we could use the XML Schema built-in datatypes, such as string and integer, in our element declarations, for example:

<element name = "FirstName" type = "string" />

We also saw how we could create our own types, rather than using those from XML Schema, with the complexType element, like this:

<element name = "Customer">
	<complexType>
		<sequence>
			<element name = "FirstName" type = "string" />
			<element name = "MiddleInitial" type = "string" />
			<element name = "LastName" type = "string" />
		</sequence>
	</complexType>
</element>

Complex types and simple types are defined in Part 1 of the XML Schema specification (http://www.w3.org/TR/xmlschema-1). These concepts are about defining structures in your schemas. Here is a quick reminder of the difference between simple and complex types:

  • simple types - a simple string that doesn't contain any child elements, but might be constrained to be numeric or otherwise specially formatted (attribute values are always simple types)

  • complex types - element values that contain other elements or have attributes, and can be constrained in a similar fashion to simple types

The second part of the specification independently defines the set of built-in datatypes. These are all simple types. In the next chapter we'll move on to see how we create our own complex content models using complexType and other schema constructs, but for this chapter, we'll be focusing on the set of simple datatypes provided for us by XML Schema.

Before we get stuck into the details of the different datatypes, let's spend a bit of time reviewing the basic ideas behind XML Schema datatypes in general.
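The simple/complex distinction recalled above can be sketched in a single schema. This is only an illustration under stated assumptions: the type name InitialType and the one-character length constraint are hypothetical, and the xs: prefix is used here (unlike the unprefixed declarations earlier) so that an unprefixed reference to a user-defined type is not looked up in the XML Schema namespace itself.

```xml
<?xml version="1.0"?>
<!-- Sketch only: InitialType and its length facet are illustrative assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

	<!-- Simple type: character data only, narrowed by a facet to exactly one character -->
	<xs:simpleType name="InitialType">
		<xs:restriction base="xs:string">
			<xs:length value="1" />
		</xs:restriction>
	</xs:simpleType>

	<!-- Complex type: an element whose content is made up of child elements -->
	<xs:element name="Customer">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="FirstName" type="xs:string" />
				<xs:element name="MiddleInitial" type="InitialType" />
				<xs:element name="LastName" type="xs:string" />
			</xs:sequence>
		</xs:complexType>
	</xs:element>

</xs:schema>
```

With a schema along these lines, a MiddleInitial of "Q" would be accepted while "QX" would be rejected, whereas FirstName and LastName accept any string.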

    Properties of XML Schema Datatypes

    All datatypes are composed of three parts:
    • A value space - the set of distinct and valid values, each corresponding to one or more string representations (for example, the number 42 is a single value)

    • A lexical space - the set of lexical representations, that is, the string literals representing values (for example, any of the strings "42" or "forty-two" or "0.42E2" or even "0.42 × 10²" could represent the value of 42)

    • A set of facets - the properties of the value space, individual values, and/or lexical items

    To illustrate the difference between lexical and value spaces, we'll look at a snippet of XML data where the first child element (Name) is declared to be a string datatype, the second (Population) uses the decimal datatype, and the third is a date datatype (DateAdmission).

    	
    <State>
    	<Name>Wyoming</Name>
    	<Population>469557</Population>
    	<DateAdmission>1890-07-10</DateAdmission>
    </State>
    

    In the Name element, the value and lexical spaces are identical - the value of a string is the same as its lexical representation.

    On the other hand, the Population element is represented in XML as a string, but its value is the mathematical concept of "four hundred and sixty nine thousand, five hundred and fifty seven". The string 469557 in the above example is just one possible lexical representation. We could also have used 469557.0 to represent the same value. If the element were declared as float or double instead, 4695.57e2 would be a further option, since the exponent form is not part of the lexical space of decimal.

    The DateAdmission element is also represented as a string, like all elements in XML. This one conforms to an international (ISO) standard, and represents a value of July 10th, 1890. ISO dates are similar to the common data processing or Japanese format preference (yyyy-mm-dd). We will look at this and other built-in derived datatypes in the next chapter.

    All comparisons, calculations, ordering, and the like are generally applied to the value of the datatype. There may be several alternative lexical representations for a given value.
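The declarations described for the State snippet could look roughly like the following. This is a sketch rather than the schema the authors used: the excerpt only names the three datatypes, so the sequence ordering and the use of the xs: prefix are assumptions.

```xml
<?xml version="1.0"?>
<!-- Sketch only: structure assumed from the State instance shown in the text -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<xs:element name="State">
		<xs:complexType>
			<xs:sequence>
				<!-- value space and lexical space coincide for string -->
				<xs:element name="Name" type="xs:string" />
				<!-- one numeric value, several possible lexical forms -->
				<xs:element name="Population" type="xs:decimal" />
				<!-- ISO-style date written as yyyy-mm-dd -->
				<xs:element name="DateAdmission" type="xs:date" />
			</xs:sequence>
		</xs:complexType>
	</xs:element>
</xs:schema>
```

Under such a declaration, <Population>469557</Population> and <Population>469557.0</Population> are two lexical representations of the same decimal value, so a comparison on the value treats them as equal.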

    Value Spaces

    Each datatype has a range of possible values. These value spaces are implicit for many datatypes. For example, a floating-point number can range from negative to positive infinity. A string can contain any finite-length sequence of legal XML characters. An integer allows a value of zero, or any positive or negative whole number, but wouldn't allow fractional values....
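As a hedged illustration of how an implicit value space can be narrowed with a facet, the following sketch derives a type from integer that excludes negative numbers. The type name PopulationCount is hypothetical and does not come from the excerpt.

```xml
<?xml version="1.0"?>
<!-- Sketch only: PopulationCount is an illustrative, user-derived type -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

	<!-- Start from the value space of integer and keep only zero and above -->
	<xs:simpleType name="PopulationCount">
		<xs:restriction base="xs:integer">
			<xs:minInclusive value="0" />
		</xs:restriction>
	</xs:simpleType>

	<xs:element name="Population" type="PopulationCount" />

</xs:schema>
```

Here 469557 and 0 fall inside the restricted value space, while -1 and 469557.5 do not.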
