Read an Excerpt
This is an extract from Chapter One of ADO 2.0 Programmer's Reference.
Chapter 1 - What Is ADO?
ActiveX Data Objects (ADO) and OLEDB, its underlying technology, are going to play a big part in the future of data access. Microsoft has unequivocally committed their future to it, and quite rightly so. The paperless office has yet to appear, but the amount of data being stored on computer systems is increasing every day. The rate at which the Web is expanding shows this quite clearly, and that''s just the public face of data. There''s much more data that is hidden from general view.In this chapter we are going to look at the terms and technology behind ADO. This isn''t actually required if you need to start coding straight away, but like any form of learning, your understanding will be better if you have a good foundation.
To give you that good grounding, there are several important topics we will discuss in this chapter:
- What we mean by data
- What we mean by a data store
- How ADO fits with existing data access strategies
- Data access in the client/server world
What is Data?
In a spare few minutes open up Explorer and have a look around your hard disk. Make a little note of how many separate pieces of information you've got; databases, documents, spreadsheets, email messages, HTML and ASP documents, and so on. Quite a lot, eh? They are all data, albeit stored in different forms. This might seem obvious, but traditionally data has only been thought of as information that is stored in a database. To build a business application the data had to be in a database. Whilst that might be true for a large proportion of existing data, why should the remaining data be excluded? In fact, as computers get more powerful, is the concept of 'data' is starting to include multimedia items, such as music and video, as well as the more normal document-based data.So, by data we mean any piece of information, whatever its contents. Whether it's your address book, your monthly expenses spreadsheet, or a pleading letter to the IRS, it's all data.
What Are Data Stores?
Knowing what we consider data to be might make a Data Store obvious - it's somewhere where data is kept. However, there is much more to Data Stores than you first might think. Instead of looking at your hard disk, let's look at mine this time, and see what I've got installed:
- Databases: Both Access and SQL Server, which we consider to be traditional data stores. I have everything from accounts and invoicing to sample databases for books.
- Spreadsheets: Financial data with year end figures for my tax returns and bills.
- Mail and News: I use Outlook and Outlook Express to handle my mail and Internet news.
- Documents: This is the largest proportion of data on my machine, containing all of my personal letters and documents, and chapters for books (including this one).
- Graphics: Screen dumps and pictures for books.
- Internet: HTML and ASP pages, containing samples and applications.
- Reference Material: Including MSDN and encyclopedias.
You could even include my CD-ROM drive and tape backup unit. The CD-ROM uses the standard documents and folders format, so this could be considered part of the file system data store, but the tape backup has its own format, and could therefore be considered a data store too.
There are also numerous other data stores, from mainframe file systems to databases and mail. As the enterprise gets bigger we also need to include user account databases, and other machines attached to the network, such as printers and scanners. They might not all be data stores themselves, but as items of data they'll be in a data store somewhere- such as NT5.0's Active Directory Services Interface.
About Universal Data Access
Universal Data Access (UDA) is Microsoft's strategy for dealing with all of this data. It's aimed at providing high performance access to a variety of data sources. The cynical amongst you might think it's another attempt to shoehorn in another Microsoft technology, but let's look at the modern business.To be a successful business you have to be flexible, to adapt to change. How do you know when to change? There's no simple rule here, but most companies make decisions by asking a few questions. How much can we sell? How much are we selling? How much are our competitors selling? What does research show customers want? Those sorts of questions can be answered by figures, and where do the figures come from? That's right, data. But we've already seen that data is stored in many different ways, and there is no single way of accessing them all.
So with UDA we are looking at an easy-to-use methodology that is intended to allow access to multiple sources of data with a single method. Build in high performance and support for existing data access methods, and you're on your way to something that could make a real difference, because you don't have to spend the time and resources you would need to' bundle together all of your data into a single data source. Let's have a look at how UDA streamlines your data access.
When building an application you can make sure it uses ADO for its data access, and ADO will talk to all of the data sources required. This means that programming is made easier, since only one technology needs to be learnt. As ADO will give fast, transparent access to many different data sources, there's no reason to use any other method.
You can clearly see what Microsoft are aiming for when you look at the three main design goals for the Data Access Components:
- Meeting the key customer requirements, such as performance, reliability, and broad industry support.
- Giving access to the widest range of data sources, through a common interface.
- Providing an easy migration path for existing data access technologies.
This method contrasts with Oracle who, naturally, are pushing Universal Server, where all of the data will be stored under one central data store. The ultimate aim is the same - broader access to data, but the Oracle approach initially involves more data conversion and translation as the data is imported to the store. From that point on however, your data is easier to access.
Neither of these two methods is clearly the best, as your choice of data access method depends very much upon your business needs and legacy computer systems.
Existing Technologies
Before we explain why ADO came about, let's take a quick look at some existing technologies and see how they fit into the picture.
- DBLib: This is the underlying technology for connecting to SQL Server, and is primarily designed for C, but is often used in Visual Basic. Because it is specific to SQL Server, it is extremely fast and functional. For this very reason, however, it doesn't allow access to any other source of data.
- ODBC: Open DataBase Connectivity, was the first step on the road to a universal data access strategy. It was designed by Microsoft and other DBMS vendors as a cross-platform, database-independent method, using API calls. Although it was designed for multi-database use, it was very often only used on a single database, and from a programmer's point of view, was complex to use.
- DAO: The Data Access Objects were introduced with Microsoft Access, and provided a strict hierarchical set of objects for manipulating data in Jet and other ISAM databases. These objects were available with Visual Basic and quickly became the most used data access method for Visual Basic programs. DAO also had the advantage of being able to sit on top of ODBC, thus allowing it to communicate with many different databases, although, because the technology is optimized for use with JET, this is quite slow.
- RDO: Intended as the successor to DAO for Visual Basic programmers, Remote Data Objects is a thin layer that sits on top ODBC, allowing better access to server databases, such as SQL Server. This brought the flexibility of ODBC with a much easier programming model, but, like DAO, it has a strictly hierarchical programming model.
- ODBCDirect: An extension to DAO, which combines DAO and RDO. It allows programmers to use the DAO programming model, but access ODBC data sources without having the Jet database loaded.
- JDBC: Java Database Connectivity was designed by Microsoft as another DBMS neutral API, especially for use in Java applications.
They are also more or less constrained by providing access to relation databases, although Microsoft Excel and simple text documents could also be used as data sources when using ODBC. ODBC drivers have also been produced for object oriented and hierarchical databases, which allow the data to be accessed in a way similar to a relational database.
What Is OLEDB?
OLEDB is designed to be the successor to ODBC, but you might be asking why we need a successor? Well, there are three main trends at the moment. The first, fairly obviously, is the Internet. The second is an increasing amount of data being stored in a non-relational form, and the third is Microsoft's desire for a COM world, where all object usage is handled through their Component Object Model.The Internet brings a range of new challenges to standard data access because of its distributed nature. Applications are now being written on a truly global scale, and you can no longer guarantee that the data you are accessing is stored on your local network. This means that the way you access data has to be carefully considered, as well as the type of data you access. The new business opportunity of e-commerce has meant that selling becomes a whole new ball game - you can now have an application that shows pictures of your products, plays music, and even videos, all running over the Web. The Web is also more distributed (and often less reliable) than conventional networks, so your data access method has to take this into account. You can't, for example, assume that your client and server remain connected at all times during an application - in the stateless nature of the Web this doesn't make sense.
So OLEDB is a technology designed to solve some of these problems, and over time it will gradually replace ODBC as the central data access method. Even so, new versions of ODBC are supplied with ADO 2, and ODBC will continue to be developed and supported. 'OLEDB is, and with the new version 3.5 of ODBC, OLEDB is now the guts of the new data access strategy. '
Why ADO?
OLEDB is a COM based set of object oriented interfaces, and thus for a large proportion of the programming community it is too complex to use, or does not map onto their particular programming language, especially scripting languages. ADO is the higher level model that most people will use. It equates fairly well to the DAO level, where you create an object, and call its methods and properties. Being an ActiveX component means it can be used from any language that supports COM - including Visual Basic, VBA, scripting languages, VC++ and VJ++.So now our diagram looks even more enticing:
We have various languages, all with the ability to use a central data access strategy. Some languages can talk directly to OLEDB - as well as talking to the easier ADO - although we'll only be looking at the ADO layer in this book.
ADO also increases the speed and ease of development by introducing an object model that allows data to be retrieved from a data source with as little as one line of code. And this line of code can be switched between languages with minimal changes.
Another pressing need for ADO is the increasing use of the Web as an application medium. Conventional applications are generally connected via a Local Area Network to their data store. They can open a connection to the data, and keep that connection open throughout the life of the program. Consequently, the data store knows who is connecting. Many applications also process data on the client - perhaps a set of records that the user is browsing through or updating. Since the client is connected to the server, or source of the data, there's no trouble updating the data.
On the Web however, the native mechanism is stateless. There's no permanent connection between client and server. If you think about the way the Web works for a second, you'll realize why this is a problem:
1. You request a web page in your browser. 2. The web server receives the request, runs any server-side script, and sends the page back to you.
That's it. As soon as this is over the web server forgets about you. Admittedly with ASP you can store some sort of session state, but it's not very sophisticatedeven with session data each request to a page is essentially a separate program. How then, with this disconnected network, do you provide a system that allows data to be updated on the client and sent back to the server? This is where ADO comes in, with disconnected recordsets. They allow the recordset to be disassociated from the server, and then re-associated at a later date. Any updates can be saved in the client copy, and the server can be updated at a later date. Add this to the fact that you can save these disconnected recordsets locally, you can have an application that lets the user change records, and any time later, update the central set of records.
The use of client-side data manipulation also allows you to sort data, find records, and generally manage recordsets without resorting to a trip back to the server. Although this idea primarily fits with the nature of the Web, it can work just as well for the standard type of applications running on a LAN, and can reduce network traffic.
Data Providers and Data Consumers
OLEDB introduces two new terms that help to explain how OLEDB and ADO fit together a little more clearly. A Data Consumer is something that uses, or consumes, data. Strictly speaking, ADO is actually a consumer, because it uses data provided by OLEDB.A Data Provider is something that, unsurprisingly, provides data. This isn't the physical source of the data, but the OLEDB mechanism that connects us to the physical data store. The provider may get the data directly from the data store, or it may go through a third party product, such as ODBC, to get to the data store. It can even use other data sources and provide added value along the way - the MSDataShape provider is an example of this. This means that you can immediately start using ADO to access existing data stores that support ODBC, even if there isn't an OLEDB provider available for them yet.
The initial set of OLEBD providers supplied with ADO 2 are:
- Jet 3.51, for Microsoft Access databases. This allows access to standard Access databases, including linked tables.
- Directory Services, for resource data stored, such as Active Directory. This will become more important when NT5 is available, as the Directory Service in NT5 will allow access to user information, as well as network devices.
- Index Server, for Microsoft Index Server. This will be particularly useful as web sites grow, as the indexed data will be available.
- Site Server Search, for Microsoft Site Server. Again for use with web sites, especially large complex sites, where Site Server is used to maintain them.
- ODBC Drivers, for existing ODBC Drivers. This ensures that legacy data is not omitted as part of the drive forward.
- Oracle, for Oracle databases. Connection to Oracle has never been particularly easy with Microsoft products in the past, but a native driver will simplify access to existing Oracle data stores.
- SQL Server, for Microsoft SQL Server, to allow access to data stored in SQL Server.
- Data Shape, for hierarchical recordsets This allows creation of master/detail type recordsets, which allow drilling down into detailed data.
- Persisted Recordset, for locally saved recordsets. This isn't strictly a data provider, but acts as one for recordsets that have been saved locally.
OLEDB also provides a few other services, such as a query processor and a cursor engine, so these can be used at the client. There are two reasons for this. The first is that it frees the actual provider from providing the service, so that the service can be smaller, and secondly it makes it available as a client service. This means that cursor handling can be provided locally, which is an important point of disconnected recordsets and Remote Data Services.
Distributed Internet Applications
Distributed interNet Applications, or DNA, is the architectural strategy that Microsoft are defining as the ideal way to write distributed, n-tier, client-server applications. One of the interesting things about this strategy is that it's really just a set of ideas and suggestions, rather than a complex, locked-in solution. It's not a new idea, but there are now some really good tools that make creating this type of application, not only a reality, but also a relatively easy reality.The basic premise is that you partition your application into at least three tiers. The first is the user interface tier - this is what the user sees, and could be a Web page or another type of application, written in any language. The second tier is where the business rules or processes lie - these determine where the data comes from, what rules apply to the data, and how it should be returned to the user interface. The third tier is the data layer - the actual source of the data.
The great thing about DNA is that it aims to be language independent, since it all hinges around COM (and COM+ when that arrives with NT5). COM is the Component Object Model that allows easy object creation and re-use. Any component that supports COM will fit into this picture, so you can write your application in ASP and JavaScript, Visual Basic, VC++, VJ++ , Delphi, or any language that supports COM. Likewise, your business components can be in any language, so you can program in whatever environment you feel happy in. There's no need to learn a new language.
In the Business Processes tier you see that IIS and MTS are mentioned. IIS supplies the connection from Web applications to components, which could be stand-alone components or MTS components. The great advantage of MTS is that it makes your middle tier suddenly very easy to manage - you can just create a component and install it into MTS. Now it suddenly becomes available to all applications that can call COM objects, without having to worry about the nasty registration process that you had to do previously. You can access the components on remote machines using DCOM. You also get the added advantage of transaction processing and easy scalability, without having to program it in yourself. The user interface can then be built in any language that supports COM.
ADO 2 Availability
The Microsoft Data Access Components are available as a separate download from the Microsoft Web site, at www.microsoft.com/data, as well as being supplied with Visual Studio 6. Future versions of ADO are to be supplied with Internet Explorer 5, which ships with ADO 2.1, and Microsoft Office 2000. These future versions will allow greater support for XML, especially with the ability to transfer and save recordsets as pure XML data, and bind HTML elements to XML data.