DB/2 Notes

I spent a lot of time getting DB/2 to work on Linux, and trying to make it work with the Network Surveyor dataset. I've put these notes together for others who have limited experience with DB/2 and would like to see the issues we ran into.

I have a fair amount of experience building parallel database systems: several (2-3) of them, as a matter of fact. Some were research tools, but one was a fully SQL-standard-compliant production system with multi-terabyte capacity.


My Comments on DB/2

Getting DB/2 to work with the Surveyor data set was quite interesting. It was somewhat disappointing to discover that DB/2 wasn't as polished in some regards as the parallel database systems I had built.

One good thing about DB/2 is that David knows lots of people at IBM, so we could ask them how to track down a problem. Had we been stuck with IBM's normal communication channels, we would not have gotten anywhere.

Digging through the DB/2 documentation to track things down was amazingly slow. I'm talking about spending hours to fix one problem, and then moving on only to spend more hours on the next. Sometimes the documentation did not address the issue at all. IBM's technique of "emailing using a web page", instead of real email, is a ridiculous method of communication. And they kept harping on us to use the stupid web page instead of letting us communicate as civilized people.

Admittedly, some problems were of our own doing, such as trying to track down the licenses.

A good thing about DB/2 is that it is more-or-less a production quality database. That means it actually logs info about what it is doing, so you can ask people about it and have a chance of tracking down problems. The bad thing is that the logs often don't provide enough info to let you fix the problem yourself, without contacting IBM to find out what the error really means.

The Surveyor workload and tables are a bit different from what you would find in a typical relational system. For example, our largest table, the dreaded multi-terabyte Times data, has payload entries of only 16 bytes, compared to the 100-200 byte (or larger) rows that you would find in a conventional SQL application. Trying to load this table, we ran up against the physical limits of the DB/2 database, such as records per page and pages per table, which caused a lot of problems.
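
To see why such short rows run into per-page and per-table limits, here is a rough back-of-the-envelope sketch. The cap of 255 records per page, the cap of 2^24 pages, and the 10-byte per-row overhead are all assumptions chosen for illustration, not figures taken from our setup; substitute the real limits for your DB/2 version.

```python
# Illustrative limits only: these values are assumptions, not measured figures.
MAX_ROWS_PER_PAGE = 255   # assumed cap on records stored in one page
MAX_PAGES = 2 ** 24       # assumed cap on the number of pages in one table


def page_usage(page_size, row_bytes, row_overhead=10):
    """Return (rows_per_page, payload_fraction) for a single page.

    row_overhead is a hypothetical per-row bookkeeping cost in bytes.
    """
    fit = page_size // (row_bytes + row_overhead)  # rows that fit by size alone
    rows = min(fit, MAX_ROWS_PER_PAGE)             # the record cap may bite first
    return rows, rows * row_bytes / page_size


def max_rows(page_size, row_bytes, row_overhead=10):
    """Upper bound on rows in one table under the assumed limits."""
    rows, _ = page_usage(page_size, row_bytes, row_overhead)
    return rows * MAX_PAGES


if __name__ == "__main__":
    for page_size in (4096, 32768):
        rows, frac = page_usage(page_size, 16)
        print(page_size, rows, round(frac, 3), max_rows(page_size, 16))
```

Under these assumed limits, a 32 KB page holds only 255 of the 16-byte rows, so roughly 12% of each page is payload and a single table tops out at about 4.3 billion rows. Larger pages do not help once the record cap, rather than the page size, is the binding limit.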

Another problem with DB/2 is that it does not allow tables to be replicated across multiple physical nodes. The Surveyor "meta-information" tables are relatively small, and query performance would improve substantially if they could exist on every node in the system. Without that mirroring, data has to be shipped between nodes for many queries which could otherwise run locally.
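
The cost of not replicating a small table can be illustrated with a toy model. This is only a sketch: the modulo placement of rows on nodes, the key layout, and the function name are all hypothetical, and it says nothing about how DB/2 actually places data.

```python
def remote_lookups(fact_rows, num_nodes, replicated):
    """Count fact rows whose matching meta row lives on a different node.

    fact_rows: iterable of (fact_key, meta_key) pairs.
    Rows are placed on nodes by key modulo num_nodes (an assumed scheme).
    If replicated is True, every node holds a full copy of the meta table.
    """
    remote = 0
    for fact_key, meta_key in fact_rows:
        fact_node = fact_key % num_nodes
        # A replicated meta table is always local to the fact row's node.
        meta_node = fact_node if replicated else meta_key % num_nodes
        if meta_node != fact_node:
            remote += 1
    return remote


if __name__ == "__main__":
    rows = [(i, i % 7) for i in range(1000)]  # made-up keys for illustration
    print("partitioned meta table:", remote_lookups(rows, 4, replicated=False))
    print("replicated meta table: ", remote_lookups(rows, 4, replicated=True))
```

With the meta table replicated, the count of remote lookups is zero by construction; with it partitioned, every mismatch between the two placements is a row that has to cross the network.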

The lack of global command-control capabilities was interesting. So, there is this parallel database, which is really a collection of independent, communicating SQL database systems. The problem is that if you want to change a database configuration option, you need to make that change individually on each node. There is no way to issue one command that configures the database globally.
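
Repeating a change on each node by hand can at least be scripted. The following is a minimal sketch, assuming the nodes are reachable over ssh and that the remote shell is already set up to run DB/2 commands; the host names, the helper names, and the example command string are hypothetical, not something taken from our configuration.

```python
import subprocess


def ssh_run(host, command):
    """Run a shell command on one host over ssh and return its exit status."""
    return subprocess.run(["ssh", host, command]).returncode


def apply_everywhere(hosts, command, run=ssh_run):
    """Issue the same command on every host, one host at a time.

    Returns the list of hosts where the command failed, so a partial
    rollout is visible instead of silently leaving nodes out of sync.
    """
    failed = []
    for host in hosts:
        if run(host, command) != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Hypothetical node names and an assumed form of the per-node command.
    nodes = ["node1", "node2", "node3"]
    cmd = "db2 update db cfg for SURVEYOR using LOGFILSIZ 4096"
    print("failed on:", apply_everywhere(nodes, cmd))
```

Passing the runner in as a parameter keeps the loop testable without a cluster, and returning the failed hosts matters here because a configuration that differs between nodes is exactly the kind of problem that is hard to track down later.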

DB/2 also lacked distributed catalogs. This is a parallel database which relies upon NFS-mounting one node on all the other nodes so that catalog data can be accessed!


Loading Data


Configuring the System


DB/2 Limitations


Last Modified: Fri Dec 5 15:11:37 CST 2003
Bolo (Josef Burger) <bolo@cs.wisc.edu>