Rather than rely on luck, we'd like to be able to guarantee that we can perform the search tree operations in logarithmic time. There are several ways this can be accomplished:
As we read in Main & Savitch, we can use B-trees to store the data in a way that is wide rather than deep while still allowing efficient manipulation. B-trees are not binary search trees; they aren't even binary trees. To implement them, we need to start from the ground up with a fresh data structure.
We'd like instead to be able to reuse binary search trees, but in such a way that their height remains logarithmic. There are several ways to do this:
Since the same set of keys can be stored in binary search trees of varying height, there must be some way to transform an imbalanced tree into a balanced one.
Our goal is to reuse the same algorithms for manipulating binary search trees. Whenever the tree is changed (i.e., by insert or remove), we follow the change with a rebalancing phase, as in:
Modified-insert:
  0. usual BST insert
  1. rebalance the tree
Of course, this will only be useful if the rebalancing is itself efficient (we'll get back to that).
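To make the two-step recipe concrete, here is a minimal sketch that uses a deliberately naive rebalancing step. It is not the approach the course develops. The sketch assumes a hypothetical int-keyed Node type in place of the document's BinaryTree class. The rebalance step flattens the tree in order and rebuilds a perfectly balanced tree from the middle element outward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal node type (not the document's BinaryTree class).
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Step 0: the usual BST insert (a duplicate key is ignored).
Node* bstInsert(Node* t, int k) {
    if (t == nullptr) return new Node(k);
    if (k < t->key) t->left = bstInsert(t->left, k);
    else if (k > t->key) t->right = bstInsert(t->right, k);
    return t;
}

// Collect the nodes in sorted (in-order) order.
void flatten(Node* t, std::vector<Node*>& out) {
    if (t == nullptr) return;
    flatten(t->left, out);
    out.push_back(t);
    flatten(t->right, out);
}

// Relink v[lo, hi) into a balanced tree rooted at the middle element.
Node* build(const std::vector<Node*>& v, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    Node* root = v[mid];
    root->left = build(v, lo, mid);
    root->right = build(v, mid + 1, hi);
    return root;
}

// Step 1: a naive rebalance that costs O(n) on every call.
Node* rebalance(Node* t) {
    std::vector<Node*> nodes;
    flatten(t, nodes);
    return build(nodes, 0, nodes.size());
}

Node* modifiedInsert(Node* t, int k) {
    return rebalance(bstInsert(t, k));
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int height(const Node* t) {
    if (t == nullptr) return 0;
    int l = height(t->left), r = height(t->right);
    return 1 + (l > r ? l : r);
}
```

Inserting the keys 1 through 15 in sorted order would give a plain BST of height 15. With this rebuild, the height stays at 4. However, every insert now costs O(n), which is exactly why the rebalancing step itself has to be cheap.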
Before going further into how to keep binary search trees balanced, let's explore more thoroughly how we can build new but similar data structures (for example, red-black trees) from existing ones (here, binary search trees). This is our motivation for introducing inheritance.
        (2)
       /   \
    (1)     (5)
    / \     / \
   x   x  (3)  x
          / \
         x   x

(the x's are external nodes)
It can be proven that the height of a red-black tree is never more than 2*lg(n+1) where n is the total number of internal nodes in the tree. Thus, the height is O(lg(n)).
The intuition: by property 2, no two red nodes can be parent and child. This means that on any path from the root to an external leaf, there are at least as many black nodes as red ones. Property 3 says that, regardless of which path is taken, the same number of black nodes is encountered. So, in the worst case, the longest path is at most twice as long as the shortest path, and the height of the tree is no more than twice the length of the shortest path. That's fairly balanced.
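The argument above can be checked mechanically. This is a minimal sketch that assumes the usual textbook rules:
- the root is black;
- property 2 means a red node has no red child;
- property 3 means every root-to-external path has the same number of black nodes.

The property list itself is not reproduced in this section, so those readings are assumptions. RBNode and the helper functions are hypothetical, and nullptr plays the role of a (black) external node. Height here counts the internal nodes on the longest root-to-external path.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

enum Color { RED, BLACK };

// Hypothetical node type; nullptr stands for an external node.
struct RBNode {
    int key;
    Color color;
    RBNode* left;
    RBNode* right;
};

// Black-height of t (the external node counts as black), or -1 if t breaks
// property 2 (red parent with red child) or property 3 (unequal black counts).
int blackHeight(const RBNode* t) {
    if (t == nullptr) return 1;
    if (t->color == RED &&
        ((t->left && t->left->color == RED) || (t->right && t->right->color == RED)))
        return -1;
    int lh = blackHeight(t->left);
    int rh = blackHeight(t->right);
    if (lh < 0 || rh < 0 || lh != rh) return -1;
    return lh + (t->color == BLACK ? 1 : 0);
}

bool isRedBlack(const RBNode* root) {
    return (root == nullptr || root->color == BLACK) && blackHeight(root) > 0;
}

std::size_t countNodes(const RBNode* t) {
    return t ? 1 + countNodes(t->left) + countNodes(t->right) : 0;
}

// Internal nodes on the longest root-to-external path.
int height(const RBNode* t) {
    if (t == nullptr) return 0;
    int l = height(t->left), r = height(t->right);
    return 1 + (l > r ? l : r);
}

// The bound stated above: height <= 2 * lg(n + 1).
bool heightWithinBound(const RBNode* root) {
    double n = static_cast<double>(countNodes(root));
    return height(root) <= 2.0 * std::log2(n + 1.0);
}
```

For example, give the example tree drawn earlier the coloring 2, 1 and 5 black, 3 red (a coloring chosen here for illustration). That tree passes the check with black-height 3 and height 3. Recoloring 5 red creates a red parent with a red child, and the check fails.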
Full code for binary search trees may be found here. Full code for red-black trees may be found here.
template <class Item, class Key>
class BST {
public:
    // Constructors
    BST();
    BST(const BST<Item,Key>& source);   // copy constructor

    // Destructor
    ~BST();

    // Constant member functions
    bool isEmpty() const;
    size_t size() const;
    size_t height() const;
    bool search(const Key& k, Item& returnVal) const;

    // Modification member functions
    void operator=(const BST& source);
    bool insert(const Item& v, const Key& k);
    bool remove(const Key& v);

protected:
    typedef BinaryTree<pair<Item,Key> > BT;
    BT *root;

    // protected member functions
    Item itemOf(const pair<Item,Key>& ikp) const {return ikp.first;}
    Key keyOf(const pair<Item,Key>& ikp) const {return ikp.second;}
    bool isRightChild(BT *node) const;
    BT *removeNode(BT *node);
    BT *successor(BT *node) const;
    BT *_search(BT *t, const Key& k) const;
    BT *_insert(const pair<Item,Key>& ikp);
    void _remove(BT *node);
};
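The derived red-black class relies on the protected helper _insert returning a pointer to the node it just created, since RBT::insert tests that pointer against NULL. BinaryTree is not shown in this section, so here is a minimal standalone sketch of that contract. It assumes a node type with a parent link and assumes that a NULL result means the key was already present. TreeNode, bstInsert and bstSearch are hypothetical names, not the course code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a BinaryTree<pair<Item,Key>> node:
// an (item, key) pair plus child and parent links.
template <class Item, class Key>
struct TreeNode {
    std::pair<Item, Key> data;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    TreeNode* parent = nullptr;
    explicit TreeNode(const std::pair<Item, Key>& p) : data(p) {}
};

// _insert-style helper: returns the new node, or nullptr if the key exists
// (assumed semantics, matching how RBT::insert checks the result).
template <class Item, class Key>
TreeNode<Item, Key>* bstInsert(TreeNode<Item, Key>*& root,
                               const std::pair<Item, Key>& ikp) {
    const Key& k = ikp.second;
    TreeNode<Item, Key>* parent = nullptr;
    TreeNode<Item, Key>* cur = root;
    while (cur != nullptr) {
        if (k == cur->data.second) return nullptr;   // duplicate key: no insert
        parent = cur;
        cur = (k < cur->data.second) ? cur->left : cur->right;
    }
    auto* node = new TreeNode<Item, Key>(ikp);
    node->parent = parent;                           // lets later fix-ups walk upward
    if (parent == nullptr) root = node;
    else if (k < parent->data.second) parent->left = node;
    else parent->right = node;
    return node;
}

// _search-style helper: the node holding key k, or nullptr.
template <class Item, class Key>
TreeNode<Item, Key>* bstSearch(TreeNode<Item, Key>* t, const Key& k) {
    while (t != nullptr && !(k == t->data.second))
        t = (k < t->data.second) ? t->left : t->right;
    return t;
}
```

Returning the node itself, rather than just a bool, is what allows a subclass to start a rebalancing pass exactly where the tree changed.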
enum color {RED, BLACK};

template <class Item, class Key>
class RBT : public BST<pair<Item, color>, Key> {
public:
    ...
    bool search(const Key& k, Item& returnVal) const;
    bool insert(const Item& v, const Key& k);
    bool remove(const Key& v);

private:
    typedef BinaryTree<pair<pair<Item, color>,Key> > BT;

protected:
    // Protected member functions
    Item itemOf(const pair<pair<Item,color>,Key>& ick) const;
    color colorOf(BT *node) const;
    void setColor(BT *node, color c);
    BT *rotateLeft(BT *node);
    BT *rotateRight(BT *node);
    void fixInsert(BT *node);
    void fixRemove(BT *x, BT *p);
};
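RBT keeps each node's color by instantiating the base class with pair<Item, color> as its item type. Each record in the underlying BinaryTree is therefore a pair<pair<Item, color>, Key>. The small sketch below shows how such a record is built and unpacked. RBRecord, makeRecord and these free-function accessors are hypothetical. Operating on a record, rather than on a BT node as the declared colorOf and setColor do, is a simplification.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum color { RED, BLACK };

// Hypothetical alias for the record type stored by BST<pair<Item, color>, Key>.
template <class Item, class Key>
using RBRecord = std::pair<std::pair<Item, color>, Key>;

template <class Item, class Key>
RBRecord<Item, Key> makeRecord(const Item& v, const Key& k, color c) {
    return RBRecord<Item, Key>(std::pair<Item, color>(v, c), k);
}

// The item now sits one level deeper than in a plain BST record.
template <class Item, class Key>
Item itemOf(const RBRecord<Item, Key>& r) { return r.first.first; }

template <class Item, class Key>
color colorOf(const RBRecord<Item, Key>& r) { return r.first.second; }

template <class Item, class Key>
void setColor(RBRecord<Item, Key>& r, color c) { r.first.second = c; }

// The key is in the same place as in a plain BST record.
template <class Item, class Key>
Key keyOf(const RBRecord<Item, Key>& r) { return r.second; }
```

Because the key's position is unchanged, the base class's keyOf still applies. That is presumably why RBT redeclares itemOf but not keyOf.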
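template <class Item, class Key>
bool RBT<Item,Key>::insert(const Item& v, const Key& k)
{
    // Ordinary BST insert of the new item, colored RED; a NULL result means nothing was inserted.
    // this-> is needed because _insert is a member of the dependent base class BST<...>.
    BT *node = this->_insert(pair<pair<Item,color>,Key>(pair<Item,color>(v, RED), k));
    if (node != NULL)
        fixInsert(node);   // restore the red-black properties
    return (node != NULL);
}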
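The body of fixInsert is not shown in this section. Here is a minimal sketch of what such a step can look like, assuming the standard textbook (CLRS-style) insertion fix-up on nodes with parent pointers; it is not the course's implementation. Node, RBTree and the helper functions are hypothetical, and plain int keys stand in for the (item, key) records.

```cpp
#include <cassert>

enum Color { RED, BLACK };

// Hypothetical node with a parent link; nullptr stands for an external (black) node.
struct Node {
    int key;
    Color color = RED;                  // new nodes start red, as in RBT::insert
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

struct RBTree {
    Node* root = nullptr;

    static Color colorOf(const Node* n) { return n ? n->color : BLACK; }

    // Hang v where u used to hang (under u's parent, or at the root).
    void replaceChild(Node* u, Node* v) {
        v->parent = u->parent;
        if (!u->parent) root = v;
        else if (u == u->parent->left) u->parent->left = v;
        else u->parent->right = v;
    }
    void rotateLeft(Node* x) {          // x's right child y moves up
        Node* y = x->right;
        x->right = y->left;
        if (y->left) y->left->parent = x;
        replaceChild(x, y);
        y->left = x;
        x->parent = y;
    }
    void rotateRight(Node* y) {         // y's left child x moves up
        Node* x = y->left;
        y->left = x->right;
        if (x->right) x->right->parent = y;
        replaceChild(y, x);
        x->right = y;
        y->parent = x;
    }

    // Plain BST insert; returns the new red node, or nullptr for a duplicate key.
    Node* bstInsert(int k) {
        Node* parent = nullptr;
        for (Node* cur = root; cur; cur = (k < cur->key) ? cur->left : cur->right) {
            if (k == cur->key) return nullptr;
            parent = cur;
        }
        Node* n = new Node(k);
        n->parent = parent;
        if (!parent) root = n;
        else if (k < parent->key) parent->left = n;
        else parent->right = n;
        return n;
    }

    // Remove a red-red violation by recoloring and rotating, moving upward.
    void fixInsert(Node* z) {
        while (colorOf(z->parent) == RED) {
            Node* p = z->parent;
            Node* g = p->parent;        // non-null: the root is black, so p is not it
            bool pIsLeft = (p == g->left);
            Node* u = pIsLeft ? g->right : g->left;     // z's uncle
            if (colorOf(u) == RED) {    // case 1: push the redness up to g
                p->color = BLACK; u->color = BLACK;
                g->color = RED;
                z = g;
            } else {
                if (z == (pIsLeft ? p->right : p->left)) {  // case 2: straighten zig-zag
                    z = p;
                    if (pIsLeft) rotateLeft(z); else rotateRight(z);
                    p = z->parent;
                }
                p->color = BLACK;       // case 3: recolor, then rotate at g
                g->color = RED;
                if (pIsLeft) rotateRight(g); else rotateLeft(g);
            }
        }
        root->color = BLACK;
    }

    bool insert(int k) {                // same shape as RBT::insert above
        Node* n = bstInsert(k);
        if (n) fixInsert(n);
        return n != nullptr;
    }

    // Checking helpers: internal nodes on the longest path, and black-height.
    static int height(const Node* t) {
        if (!t) return 0;
        int l = height(t->left), r = height(t->right);
        return 1 + (l > r ? l : r);
    }
    static int blackHeight(const Node* t) {   // -1 if a red-black rule is broken
        if (!t) return 1;
        if (t->color == RED && (colorOf(t->left) == RED || colorOf(t->right) == RED)) return -1;
        int l = blackHeight(t->left), r = blackHeight(t->right);
        if (l < 0 || l != r) return -1;
        return l + (t->color == BLACK ? 1 : 0);
    }
};
```

Case 1 may repeat up the tree. Cases 2 and 3 finish after at most two rotations, so the fix-up does O(lg n) recoloring work and O(1) rotations per insert.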
rotateLeft(x):

     (x)                  (y)
    /   \                /   \
  'a    (y)     ===>   (x)    'c
       /   \          /   \
     'b    'c       'a    'b

rotateRight(y):

       (y)                (x)
      /   \              /   \
    (x)    'c   ===>   'a    (y)
   /   \                    /   \
 'a    'b                 'b    'c
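The two diagrams correspond to a few pointer assignments each. Here is a minimal sketch assuming a bare, hypothetical Node type without parent pointers. Each function returns the new root of the rotated subtree, mirroring the BT *rotateLeft(BT *node) shape declared in RBT; the caller must link the returned node back under the old parent. The sketch checks that the in-order sequence 'a x 'b y 'c is unchanged, which is why a rotation preserves the binary-search-tree ordering.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type; the subtrees 'a, 'b, 'c are ordinary child pointers.
struct Node {
    std::string label;
    Node* left;
    Node* right;
};

// rotateLeft(x): x's right child y becomes the root of this subtree.
Node* rotateLeft(Node* x) {
    Node* y = x->right;
    x->right = y->left;   // 'b moves from y's left to x's right
    y->left = x;
    return y;             // caller re-links y where x used to be
}

// rotateRight(y): y's left child x becomes the root of this subtree.
Node* rotateRight(Node* y) {
    Node* x = y->left;
    y->left = x->right;   // 'b moves from x's right to y's left
    x->right = y;
    return x;
}

// Concatenate labels in in-order order.
std::string inorder(const Node* t) {
    if (!t) return "";
    return inorder(t->left) + t->label + inorder(t->right);
}
```

Each rotation changes a constant number of pointers, so it runs in O(1) time. That is what makes rotation-based rebalancing cheap.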