Chapter 2: Datatype Basics
The basic benefit of XML, the ability to describe one's own vocabulary, is greatly enhanced by the use of XML Schema datatypes. These can ensure, for example, that numeric data is really numeric and that strings follow a specific format, or otherwise validate the format and/or value of an element or attribute. Pre-defined XML Schema datatypes make provision for various forms of commonly used values such as dates, times, and URI references, as well as providing the basis for more complex and user-defined data structures.
Strong data typing and the ability to create modern object-oriented (OO) structures are imperative for most of the newer uses of XML (such as SOAP or ebXML). These new applications can now use most of the datatypes used in traditional programming languages, plus the conceptual and maintenance benefits of OO inheritance of datatype structures.
The use of strong data typing has advantages beyond the description and validation of documents and web pages. Once web sites serve pages in XML, rather than HTML, web spiders will be able to extract much more meaningful information from these sites. For example:
- Numeric datatypes allow price comparison services that can calculate currency conversions, taxes, and/or multi-item costs.
- Users searching for date-sensitive items (like newspaper articles or a specific event) can use standardized dates, and search for specific dates or ranges of dates.
- Type-specific searching can also apply to other specific datatypes such as URIs, and to user-derived datatypes such as ISBNs, UPCs, and part numbers.
First, we will look at the basic principles of schema datatypes, and then we will look at the two dozen or so built-in datatypes provided as part of XML Schema.
Datatypes in XML - An Overview
XML 1.0 and its DTDs provided a few simple datatypes, but none were numeric types, and the validation mechanisms were quite limited. There have been proposals to add some additional type checking to DTDs (such as DT4DTD), but these are beyond the scope of this book. Early schema proposals such as SOX and XMLData provided various sets of pre-defined types, which informed the development of the W3C Schema Recommendation. The lack of strong data typing was one of the principal reasons for the development of XML Schema. Indeed, datatypes are so significant that they comprise half of the XML Schema specification, and they may be used independently from the rest of the XML Schema specification.
XML Schema datatypes are defined in XML Schema Part 2: Datatypes, which became a W3C Recommendation in May 2001. It is available at http://www.w3.org/TR/xmlschema-2.
These datatypes are based upon those in XML 1.0 DTDs, Java, SQL, the ISO 11404 standard on language-independent datatypes, existing Internet standards, and earlier schema proposals.
It would be useful to have a link to an online version of the ISO 11404 standard, but like most ISO documents, it is only available as expensive paper. You can find ordering information for this at http://www.iso.ch/cate/d19346.html.
In the last chapter, we saw how we could use the XML Schema built-in datatypes, such as string and integer, in our element declarations, for example:
<element name = "FirstName" type = "string" />
We also saw how we could create our own types, rather than using those from XML Schema, with the complexType element, like this:
<element name = "Customer">
   <complexType>
      <sequence>
         <element name = "FirstName" type = "string" />
         <element name = "MiddleInitial" type = "string" />
         <element name = "LastName" type = "string" />
      </sequence>
   </complexType>
</element>
Complex types and simple types are defined in Part 1 of the XML Schema specification (http://www.w3.org/TR/xmlschema-1). These concepts are about defining structures in your schemas. Here is a quick reminder of the difference between simple and complex types:
- simple types - a simple string that doesn't contain any child elements, but might be constrained to be numeric or otherwise specially formatted (attribute values are always simple types)
- complex types - elements that can contain child elements and/or carry attributes
Schemas also rely on complexType and other schema constructs, but for this chapter, we'll be focusing on the set of simple datatypes provided for us by XML Schema. Before we get stuck into the details of the different datatypes, let's spend a bit of time reviewing the basic ideas behind XML Schema datatypes in general.
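The simple/complex distinction can be sketched in a small schema. This is only an illustration: the Quantity element and the customerID attribute are hypothetical names that do not come from the earlier examples, and the xs: prefix is used here (unlike the unprefixed snippets above) simply to make the XML Schema namespace explicit.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Simple type: text content only, constrained to be numeric.
        Quantity is a hypothetical element used for illustration. -->
   <xs:element name="Quantity" type="xs:positiveInteger"/>

   <!-- Complex type: contains a child element and carries an attribute. -->
   <xs:element name="Customer">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="LastName" type="xs:string"/>
         </xs:sequence>
         <!-- An attribute value is always a simple type.
              customerID is a hypothetical attribute name. -->
         <xs:attribute name="customerID" type="xs:positiveInteger"/>
      </xs:complexType>
   </xs:element>

</xs:schema>
```

Under this sketch, <Quantity>12</Quantity> would be acceptable, while <Quantity>twelve</Quantity> or a Quantity element with child elements would not.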
Properties of XML Schema Datatypes
All datatypes are composed of three parts:
- A value space - the set of distinct and valid values, each corresponding to one or more string representations (for example, the number 42 is a single value)
- A lexical space - the set of lexical representations, that is, the string literals representing values (for example, any of the strings "42" or "forty-two" or "0.42E2" or even "0.42 × 10²" could represent the value of 42)
- A set of facets - the properties of the value space, individual values, and/or lexical items
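As a rough sketch of how a facet narrows a datatype, the fragment below derives a user-defined type for UPCs from the built-in string type using the pattern facet. The names UPCCode and UPC are hypothetical, the pattern only checks for twelve digits (it makes no attempt to verify a check digit), and the xs: prefix is an assumption made here to keep the namespace explicit.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Hypothetical user-derived type: a string limited by the
        pattern facet to exactly twelve decimal digits. -->
   <xs:simpleType name="UPCCode">
      <xs:restriction base="xs:string">
         <xs:pattern value="[0-9]{12}"/>
      </xs:restriction>
   </xs:simpleType>

   <!-- Hypothetical element that uses the derived type. -->
   <xs:element name="UPC" type="UPCCode"/>

</xs:schema>
```

With this type, <UPC>036000291452</UPC> would fit the pattern, whereas <UPC>36000-29145</UPC> would not.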
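To see these parts in practice, consider the following State element and its three child elements (Name, Population, and DateAdmission).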
<State>
   <Name>Wyoming</Name>
   <Population>469557</Population>
   <DateAdmission>1890-07-10</DateAdmission>
</State>
In the Name element, the value and lexical spaces are identical - the value of a string is the same as its lexical representation.
On the other hand, the Population element is represented in XML as a string, but its value is the mathematical concept of "four hundred and sixty nine thousand, five hundred and fifty seven". The string 469557 in the above example is just one possible lexical representation. We could also have used 469557.0 or 4695.57e2 to represent the same value, provided the element's datatype allows those forms (a floating-point type such as double does, whereas integer does not).
The DateAdmission element is also represented as a string, like all elements in XML. This one conforms to an international standard (ISO 8601), and represents a value of July 10th, 1890. ISO dates follow the year-first order (yyyy-mm-dd) that is common in data processing and is also the preferred format in Japan. We will look at this and other built-in derived datatypes in the next chapter.
All comparisons, calculations, ordering, and the like are generally applied to the value of the datatype. There may be several alternative lexical representations for a given value.
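One way the State example might be declared is sketched below. The choice of types is an assumption rather than something stated in the text: string for Name, nonNegativeInteger for Population, and date for DateAdmission. The xs: prefix is likewise an assumption used to make the namespace explicit.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="State">
      <xs:complexType>
         <xs:sequence>
            <!-- Value space and lexical space coincide for strings. -->
            <xs:element name="Name" type="xs:string"/>
            <!-- Assumed type: 469557 and 0469557 denote the same value;
                 a type such as xs:double would be needed to accept
                 forms like 4695.57e2. -->
            <xs:element name="Population" type="xs:nonNegativeInteger"/>
            <!-- Lexical form is yyyy-mm-dd, for example 1890-07-10. -->
            <xs:element name="DateAdmission" type="xs:date"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

</xs:schema>
```

With declarations like these, a schema-aware processor can compare or order Population and DateAdmission by value rather than by the characters that happen to appear in the document.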