I spent a lot of time making DB/2 work on Linux and trying to make it work with the Network Surveyor dataset. I've put these notes together for others who have limited experience with DB/2 and would like to see the issues we/I ran into.
I have a fair amount of experience building parallel database systems, several (2-3) of them as a matter of fact. Some were research tools, but one was a fully SQL-standard-compliant production system with multi-terabyte capacity.
Getting DB/2 to work with the Surveyor data set was quite interesting. It was somewhat disappointing to discover that DB/2 wasn't as polished in some regards as those systems.
One good thing about DB/2 is that David knows lots of people there, and we could ask them how to track down problems. If we had been stuck with the normal communication channels we would not have gotten anywhere.
Digging through the DB/2 documentation to track things down was amazingly slow. I'm talking about spending hours to fix one problem, then needing to move on only to spend more hours on the next. Sometimes the documentation did not address the issue at all. IBM's technique of "emailing using a web page", instead of real email, is a ridiculous method of communication. And they kept harping on using the stupid web page instead of letting people communicate like civilized human beings.
Admittedly some problems were of our own doing, such as trying to track down the licenses.
A good thing about DB/2 is that it is a more-or-less production-quality database. That means it actually logs info about what it is doing, so you can ask people about it and have a chance of tracking down problems. The bad thing is that the logs often don't provide enough info to let you fix the problem yourself without contacting IBM to find out what the error really means.
The Surveyor workload and tables are a bit different from what you would find in a typical relational system. For example, our largest table, the dreaded multi-terabyte Times data, has payload entries of only 16 bytes, compared to the 100-200 byte (or larger) rows you would find in a conventional SQL application. Trying to load this table, we ran up against the physical limitations of the DB/2 database, such as the maximum number of records per page and pages per tablespace, which caused a lot of problems.
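To get a feel for the numbers, here is a rough sketch; the 255-records-per-page figure is from my recollection of the DB/2 limits, so treat it as approximate:

255 records/page x 16 bytes/record = ~4 KB of payload per page, so anything bigger than a 4 KB page is mostly wasted space for this table
1 TB of 16-byte records = ~6.3e10 records = ~2.5e8 pages at 255 records/page

That is the scale at which the per-tablespace page limits discussed further down start to bite.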
Another problem with DB/2 is that it does not allow tables to be replicated across multiple physical nodes. The Surveyor "meta-information" tables are relatively small, and query performance would improve substantially if a copy could exist on every node in the system. Without that mirroring, data has to be shipped between nodes for many queries that could otherwise run locally.
The lack of global command-and-control capabilities was interesting. A parallel DB/2 database is really a collection of independent, communicating SQL database systems. The problem is that if you want to change a database configuration option, you have to perform that change individually on each node; there is no way to issue the change once and have it apply across the whole database.
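The closest thing to a workaround I know of is to script the change across the nodes. A sketch, assuming I remember db2_all correctly (it ships with EEE and runs its argument on every node listed in db2nodes.cfg; the database name "surveyor" here is made up):

db2_all "db2 update db cfg for surveyor using logfilsiz 64000"

Even then you are pushing a command out to every node, not configuring the database as a whole.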
Another thing DB/2 lacks is distributed catalogs. This is a parallel database which relies on NFS-mounting one node's files on all the other nodes so that catalog data can be accessed!
There are a few ways to get data into DB/2, and each has problems. db2 load doesn't work with DB/2 EEE (Enterprise Extended Edition), because the load is per-node, not global: in EEE it expects a split file with hash info, not a file you can load directly.
db2 import works OK, but it is horribly slow and requires a big transaction log. I believe it essentially does insert into _table_ values(...).
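One thing that can ease the log pressure, if I recall the import options correctly, is to have import commit every N rows instead of holding the whole load in a single transaction:

db2 import from _file_ of del commitcount 10000 insert into _tablename_

It is still row-at-a-time underneath, so this doesn't make import fast, just less likely to blow out the log.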
That leaves db2atld, the autoloader. It is quite fast and doesn't require a lot of log space. However, if you have a primary key on a table, it will still need to rebuild the index after every autoload to ensure consistency.
PRIMARY KEY: watch out! The index which implements the primary key needs to be rebuilt every time you complete an insert or bulk load, and if your table is huge this can use more resources than you have. The solution is to create and load the table without the PRIMARY KEY, and then use ALTER TABLE to add the key after the data is loaded.
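A minimal sketch of that workaround; the table and column names here are invented, not the real Surveyor schema:

CREATE TABLE times_load (ts TIMESTAMP NOT NULL, val INTEGER NOT NULL)
-- bulk load the data here (autoload/import), with no index to maintain
ALTER TABLE times_load ADD PRIMARY KEY (ts)

Adding the key at the end builds its index exactly once, instead of after every load.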
Delimited ("del") files use a comma as the column delimiter by default; to change it, add modified by coldelX, where X is the delimiter character. For example, to import a file which uses a column delimiter of "|":
db2 import from _file_ of del modified by coldel| insert into _tablename_
If a load fails and leaves the table stuck in load-pending state, a terminating load against /dev/null clears it:
db2 load from /dev/null of del terminate into _tablename_
To check the state and space usage of your tablespaces:
list tablespaces show detail
When updating DBM CFG or DB CFG parameters there are a few things to keep in mind: in EEE the change has to be made on every node (see above), and most changes don't take effect until the instance is restarted or all connections to the database are dropped. For example, to set the default database path and to enlarge the transaction log that db2 import burns through:
update dbm cfg using DFTDBPATH _path_
update db cfg for _database_ using logfilsiz 64000
update db cfg for _database_ using logprimary 64
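For scale, and assuming I have the units right (logfilsiz is counted in 4 KB pages), those settings work out to roughly:

logfilsiz 64000 = 64000 x 4 KB = ~250 MB per log file
logprimary 64 = 64 x ~250 MB = ~16 GB of primary log space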
To use a page size other than the default, you first need a bufferpool with that page size:
CREATE BUFFERPOOL _name_
ALL NODES
SIZE _in_pages_
PAGESIZE _of_pages_
Then a tablespace in system-managed (SMS) storage:
CREATE TABLESPACE _name_
PAGESIZE _of_pages_
MANAGED BY SYSTEM
USING ('name_of_file_in_db2_file_tree_for_db')
BUFFERPOOL _bufferpool_name_
Or one in database-managed (DMS) storage:
CREATE TABLESPACE _name_
[PAGESIZE _of_pages_]
MANAGED BY DATABASE
USING (file 'name_of_file_in_db2_file_tree_for_db' _size_)
[BUFFERPOOL _bufferpool_name_]
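As a concrete example of how the pieces fit together, a 16 KB bufferpool and a 2 GB database-managed tablespace that uses it would look something like this (the names and sizes are invented, not anything from the Surveyor setup):

CREATE BUFFERPOOL bp16k ALL NODES SIZE 5000 PAGESIZE 16384
CREATE TABLESPACE ts16k
PAGESIZE 16384
MANAGED BY DATABASE
USING (FILE 'ts16k_container' 2G)
BUFFERPOOL bp16k

The tablespace's PAGESIZE has to match a bufferpool with the same page size, which is why the bufferpool gets created first.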
The default is MANAGED BY SYSTEM. Typically this means the database info is put into ordinary OS files. If you would like the database to manage the space itself, performing its own allocation, you can do that with MANAGED BY DATABASE.
The USING (FILE '_filename_' _size_) stanza tells the system what file to use, and what its maximum size should be. _size_ can be expressed in bytes directly, or you can use the appropriate K, M, or G suffix to denote kilobytes, megabytes, or gigabytes.
If you stick with the default MANAGED BY SYSTEM space, something funky happens with record IDs, and you can actually store more than 2^24 pages worth of data. That 2^24-page limitation does apply, however, if you specify the page size of a tablespace, whether in DATABASE or SYSTEM managed storage.
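Taking the 2^24 figure at face value, the per-tablespace capacity (per node, I believe) works out to:

2^24 pages x 4 KB/page = ~64 GB per tablespace
2^24 pages x 32 KB/page = ~512 GB per tablespace

so for a multi-terabyte table you either rely on the funky default space or spread the data over a lot of tablespaces and nodes.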
The USING clause can specify a file or a device. If a relative name is specified, DB/2 puts it in an appropriate location for that database.
Given the 2^24-page quirk above, however, it is possible to create a file in the funky default space that you can't create an index for! Sigh.
db2atld has one serious problem which prevents it from being used efficiently. You can set up the autoloader to split the data across the network and store the split data on the machine it will be loaded on. That is all well and good. However, you can't have the autoloader load the data on each node from the split file already sitting there! Instead, the split file has to be written back to a shared location, then read from disk and passed across the network to its ultimate destination. And of course the shared disk is NFS, so the traffic goes across it again when the split file is stored.
Yes, it works, but with just a little bit of work it could be so much better! If you are performing partially declustered splitting this is less of an issue, since the data needs to be shared anyway. However, when you are running fully declustered it is typically because you have so much data that you need/want to spread it across every node, which is exactly when the extra trip through NFS hurts the most.