Notes on Binary Search Trees


What is a binary search tree?

A binary search tree is a binary tree with the following properties:

The data stored at each node has a distinguished key which is unique in the tree and belongs to a total order. (That is, for any two distinct keys x and y, either x < y or y < x.)
The key of any node is greater than all keys occurring in its left subtree and less than all keys occurring in its right subtree.

We have already seen several examples of binary search trees. For example:

       (8)
      /   \
   (2)     (21)
  /   \     /
(1)   (5) (13)
      /
    (3)

Here the values stored at each node are themselves the keys (in this case integers). One property of a binary search tree is that an in-order traversal visits the nodes in increasing order of their keys (hence the name in-order). In other words, data maintained in a binary search tree is kept sorted by key.
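The ordering property can be checked with a small sketch. It assumes a hypothetical `Node` type holding only an `int` key (not the notes' `BinaryTree` class) and builds the example tree above by hand; `inorder` and `exampleTree` are names made up for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node type for this sketch (not the notes' BinaryTree class):
// an int key plus owning pointers to the left and right subtrees.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// In-order traversal: left subtree, then the node itself, then the right subtree.
// Each visited key is appended to out.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left.get(), out);
    out.push_back(t->key);
    inorder(t->right.get(), out);
}

// Builds the example tree by hand: 8 at the root, 2 and 21 as its children,
// 1 and 5 under 2, 3 as the left child of 5, and 13 as the left child of 21.
std::unique_ptr<Node> exampleTree() {
    auto root = std::make_unique<Node>(8);
    root->left = std::make_unique<Node>(2);
    root->right = std::make_unique<Node>(21);
    root->left->left = std::make_unique<Node>(1);
    root->left->right = std::make_unique<Node>(5);
    root->left->right->left = std::make_unique<Node>(3);
    root->right->left = std::make_unique<Node>(13);
    return root;
}
```

Collecting the keys of `exampleTree()` with `inorder` should yield 1, 2, 3, 5, 8, 13, 21, even though they were never inserted in that order.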

We can emulate a priority queue, using the priorities as keys, as long as the priorities are unique:

        (homework, 2)
         /         \
(cs367, 0)         (Badgers, 3)
         \
     (clean room, 1)

Also notice how we can store more than just a key at each node.

Unlike a binary tree, which is a data structure, a binary search tree is an ADT (abstract data type). That is, to use it, we don't need to know how it is represented.

The following are different binary trees:

   (7)                  (5)
   / \                  / \
(4)   (9)      vs.   (4)   (9)
  \                        /
  (5)                    (7)

but they represent the same binary search tree: both hold exactly the keys 4, 5, 7, and 9.

Operations we can perform on a BST include:

insert --- add an item and its key to the BST
search --- look up an item in the BST by its key
remove --- delete an item/key from the BST by its key

We can also test whether the tree is empty, count how many values are stored in the tree, and ask for the height of the tree. Other functions may also be useful (like finding the smallest or largest elements in the tree), but they are not essential to what a BST is.
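These extra queries are short walks over the tree. The sketch below assumes a hypothetical `int`-keyed `Node` type; `isEmpty`, `countNodes`, `minKey`, and `maxKey` are illustrative names, not part of the notes' `BST` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical node type with int keys (an assumption for this sketch).
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// An empty tree is represented by a null root pointer.
bool isEmpty(const Node* root) { return root == nullptr; }

// Number of nodes: one for this node plus the counts of both subtrees.
std::size_t countNodes(const Node* t) {
    return t == nullptr ? 0 : 1 + countNodes(t->left.get()) + countNodes(t->right.get());
}

// Smallest key: every smaller key lies to the left, so follow left children.
int minKey(const Node* t) {
    assert(t != nullptr && "minKey of an empty tree");
    while (t->left) t = t->left.get();
    return t->key;
}

// Largest key: symmetric, following right children.
int maxKey(const Node* t) {
    assert(t != nullptr && "maxKey of an empty tree");
    while (t->right) t = t->right.get();
    return t->key;
}
```

Finding the minimum or maximum never needs to compare keys: the BST property alone says where they must be.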

For example, we'd like to be able to build the priority-queue tree above with something like this:

BST<std::string, int> bst;  // bst is initially empty

bst.insert("homework", 2);
//        (homework, 2)

bst.insert("cs367", 0);
//         (homework, 2)
//          /
// (cs367, 0)

bst.insert("Badgers", 3);
//         (homework, 2)
//          /         \
// (cs367, 0)         (Badgers, 3)

bst.insert("clean room", 1);
//         (homework, 2)
//          /         \
// (cs367, 0)         (Badgers, 3)
//         \
//     (clean room, 1)

Implementing a BST

Of course, a BST's name comes from somewhere --- the intuitive implementation uses a binary tree data structure.

template <class Item, class Key>
class BST {
public:
  ...
  bool search(const Key& k, Item& returnVal) const;
  bool insert(const Item& v, const Key& k);
  bool remove(const Key& k);
  ...
private:
  BinaryTree<IKPair> *root;
  ...
};

So the BST class is really a wrapper around a binary tree. Before looking at the details, let's consider what search does: it uses the key to navigate the tree until it either finds the key or hits a NULL pointer. In the latter case we just want to signal that no match was found (here, by returning false). In the former case, we want not only to signal the match but also to return the item stored with the search key. This can be done by passing in an argument that is a non-constant reference to an Item:

  bool search(const Key& k, Item& returnVal) const;

The navigation can be done recursively roughly as:

if this node is NULL then return false
else if this node's key matches the search key then copy the item at this node and return true
else if this node's key is less than the search key then recursively search the right subtree
else recursively search the left subtree
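A minimal C++ sketch of that recursion follows. To keep it short it uses a free function and assumed concrete types (string items and int keys, as in the priority-queue example) instead of the `BST<Item, Key>` class template; `Node` and `searchTree` are hypothetical names, while the `returnVal` out-parameter mirrors the signature above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Assumed concrete types for this sketch; the notes' BST is a template over Item and Key.
using Item = std::string;
using Key = int;

// Hypothetical node type: an item stored together with its key,
// plus owning pointers to the left and right subtrees.
struct Node {
    Item item;
    Key key;
    std::unique_ptr<Node> left, right;
    Node(const Item& i, const Key& k) : item(i), key(k) {}
};

// Recursive search, one branch per line of the outline above.
// On a match, the stored item is copied into returnVal and true is returned.
bool searchTree(const Node* t, const Key& k, Item& returnVal) {
    if (t == nullptr) return false;                  // hit NULL: no match
    if (t->key == k) {                               // keys match: copy the item out
        returnVal = t->item;
        return true;
    }
    if (t->key < k)                                  // node's key too small: go right
        return searchTree(t->right.get(), k, returnVal);
    return searchTree(t->left.get(), k, returnVal);  // otherwise go left
}
```

On the priority-queue tree, searching for key 1 should copy "clean room" into `returnVal`; searching for a missing key returns false and leaves `returnVal` untouched.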

Insert uses navigation through the tree very similar to search. However, for variety, let's see how insert can be implemented iteratively:

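special case: if the tree is empty --- allocate a leaf node and make root point to it
otherwise, make a pointer, t, to the root, and a pointer, parent, that trails one step behind t
while t is not NULL and the key of t does not match the insert key:
1. set parent to t
2. if t's key is less than the insert key then let t point to its right child
3. if t's key is greater than the insert key then let t point to its left child
if a match was found, return false (keys must be unique); otherwise allocate a new leaf and attach it as parent's right child if parent's key is less than the insert key, or as parent's left child if it is greater.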
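An iterative insert can be sketched in C++ as follows, again with assumed concrete types (string items, int keys) and the hypothetical names `Node` and `insertIter`. Rather than keeping a separate pointer to t's parent, this sketch walks a pointer to the link (the root, or some node's left or right field) that leads to t; when t becomes NULL, that link is exactly where the new leaf belongs, which also covers the empty-tree special case.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Assumed concrete types for this sketch; the notes' BST is a template over Item and Key.
using Item = std::string;
using Key = int;

// Hypothetical node type: an item stored with its key, owning its two subtrees.
struct Node {
    Item item;
    Key key;
    std::unique_ptr<Node> left, right;
    Node(const Item& i, const Key& k) : item(i), key(k) {}
};

// Iterative insert. Returns false, changing nothing, if k is already present.
bool insertIter(std::unique_ptr<Node>& root, const Item& v, const Key& k) {
    std::unique_ptr<Node>* link = &root;          // the link that currently leads to t
    while (*link) {                               // while t is not NULL
        Node* t = link->get();
        if (t->key == k) return false;            // match: keys must be unique
        link = (t->key < k) ? &t->right           // t's key too small: go right
                            : &t->left;           // t's key too large: go left
    }
    *link = std::make_unique<Node>(v, k);         // attach the new leaf where the walk ended
    return true;
}
```

Running the four insertions from the earlier example in the same order should produce the same shape: 2 at the root, 0 and 3 as its children, and 1 as the right child of 0.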

Deletion

Remove uses the same navigation through the tree as search, but then must adjust the tree to perform the deletion and to maintain the BST invariant.

There are three cases we need to consider for deletion:

Deleting a leaf --- simply remove it:

       (8)                      (8)     
      /   \                    /   \    
   (2)     (21)             (2)     (21)
  /   \     /     ===>     /   \   
(1)   (5) (13)           (1)   (5) 
      /                        /        
    (3)                      (3)

Deleting a node with one child --- remove it and move its child (the subtree rooted at its child) up:

       (8)                      (8)     
      /   \                    /   \    
   (2)     (21)             (2)     (13)
  /   \     /     ===>     /   \   
(1)   (5) (13)           (1)   (5) 
      /                        /        
    (3)                      (3)

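Deleting a node with two children --- replace its key (and item) with those of the node holding the smallest key in its right subtree (its in-order successor), then remove that node. The successor has no left child, so removing it falls under one of the two simpler cases above:

       (8)                      (8)
      /   \                    /   \
   (2)     (21)             (3)     (21)
  /   \     /     ===>     /   \     /
(1)   (5) (13)           (1)   (5) (13)
      /
    (3)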

or replace it with the node holding the largest key in its left subtree (its in-order predecessor), which has no right child, then remove that node:

       (8)                      (8)
      /   \                    /   \
   (2)     (21)             (1)     (21)
  /   \     /     ===>         \     /
(1)   (5) (13)                 (5) (13)
      /                        /
    (3)                      (3)
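All three deletion cases fit in one recursive C++ function. This is a sketch under stated assumptions: a hypothetical `Node` type with `int` keys and no items (in a full BST the item would be copied along with the key), and the hypothetical names `removeKey` and `inorder`. It uses the successor (smallest key in the right subtree); the predecessor variant is symmetric.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Assumed key type for this sketch; items are omitted to keep it short.
using Key = int;

// Hypothetical node type: a key plus owning pointers to the two subtrees.
struct Node {
    Key key;
    std::unique_ptr<Node> left, right;
    explicit Node(Key k) : key(k) {}
};

// Removes k from the subtree reached through link t (the root or a child pointer),
// so the tree can be re-linked in place. Returns false if k is not present.
bool removeKey(std::unique_ptr<Node>& t, Key k) {
    if (!t) return false;                            // not found
    if (k < t->key) return removeKey(t->left, k);    // navigate as in search
    if (t->key < k) return removeKey(t->right, k);

    // t holds k. Leaf or one child: move the (possibly empty) child up.
    // The child is moved into a local first so it outlives the node being freed.
    if (!t->left || !t->right) {
        std::unique_ptr<Node> child = std::move(t->left ? t->left : t->right);
        t = std::move(child);                        // frees the old node
        return true;
    }

    // Two children: copy up the smallest key in the right subtree (the in-order
    // successor), then delete the successor, which has no left child.
    const Node* s = t->right.get();
    while (s->left) s = s->left.get();
    const Key succ = s->key;
    t->key = succ;               // a full BST would copy the item here as well
    return removeKey(t->right, succ);
}

// In-order traversal, handy for checking the result.
void inorder(const Node* t, std::vector<Key>& out) {
    if (!t) return;
    inorder(t->left.get(), out);
    out.push_back(t->key);
    inorder(t->right.get(), out);
}
```

Passing the link by reference is what lets the function splice a child into its parent's slot without storing parent pointers. On the example tree, removing 2 should leave 3 in its place with 5 as its right child, and then removing 21 should move 13 up.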

Running-time complexity of BST operations

search --- the worst case is an unsuccessful search that follows the longest path from the root, ending just below the deepest leaf in the tree. Each step takes only a constant amount of work, so the algorithm is O(h), where h is the height of the tree.

insert --- in the worst case the new leaf is attached below the deepest leaf in the tree, so the algorithm is also O(h).

delete --- the worst case is either the same as for search, or occurs when the delete key is found but its node has two children and the key's predecessor or successor is located at the deepest leaf. In either case, the amount of work is bounded by O(h).

So all the tree operations take time proportional to the height of the tree in the worst case. But what is the height in relation to n, the total number of nodes in the tree? It depends on the shape of the tree, which in turn depends on the order in which nodes are inserted and deleted.

We might have a completely full binary tree:

      (D)
     /   \
  (B)     (F)
  / \     / \
(A) (C) (E) (G)

Or we might have a linear binary tree (for example, if we insert keys into a binary search tree in sorted order):

(A)
  \
  (B)
    \
    (C)
      \
      (D)
        \
        (E)
          \
          (F)
            \
            (G)

(or any of the many cases in between.)
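The effect of insertion order on height can be seen in a short sketch. It assumes a hypothetical `Node` type with `char` keys (so the letters from the diagrams can be used directly), the hypothetical helpers `insertKey`, `height`, and `build`, and one convention the notes leave open: height counts the nodes on the longest root-to-leaf path.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node type with char keys, matching the lettered diagrams.
struct Node {
    char key;
    std::unique_ptr<Node> left, right;
    explicit Node(char k) : key(k) {}
};

// Plain recursive BST insert; a key that is already present is ignored.
void insertKey(std::unique_ptr<Node>& t, char k) {
    if (!t) t = std::make_unique<Node>(k);
    else if (k < t->key) insertKey(t->left, k);
    else if (t->key < k) insertKey(t->right, k);
}

// Height as the number of nodes on the longest root-to-leaf path
// (assumed convention: empty tree 0, single node 1).
int height(const Node* t) {
    if (!t) return 0;
    return 1 + std::max(height(t->left.get()), height(t->right.get()));
}

// Inserts the keys, in the given order, into an initially empty BST.
std::unique_ptr<Node> build(const std::vector<char>& keys) {
    std::unique_ptr<Node> root;
    for (char k : keys) insertKey(root, k);
    return root;
}
```

Inserting D, B, F, A, C, E, G in that order gives the full tree, with height 3; inserting A through G in sorted order gives the linear tree, with height 7. One caution about this design: each function recurses once per level, and so does the chain of `std::unique_ptr` destructors, so a very deep degenerate tree could exhaust the call stack.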

Balanced binary trees

A binary tree is balanced if each node has (roughly) the same number of descendants in its left subtree as it has in its right subtree.

Important fact: For balanced binary trees, the height is proportional to the base-two logarithm of the number of nodes in the tree: h = O(lg(n)).
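A short derivation sketch of this fact follows. It assumes height counts the nodes on the longest root-to-leaf path, and it makes "roughly balanced" precise with an assumed constant c < 1 (a symbol introduced here, not in the notes): whenever a node's subtree has m nodes, each child's subtree has at most cm nodes. Level i of a tree holds at most 2^i nodes, which gives the first line. A node at depth d then roots a nonempty subtree of at most c^d n nodes, which gives the second line, and h is one more than the largest depth:

```latex
\begin{align*}
n &\le \sum_{i=0}^{h-1} 2^{i} = 2^{h} - 1
  &&\Longrightarrow\quad h \ge \lg(n+1) \\
1 &\le c^{d}\, n
  &&\Longrightarrow\quad d \le \log_{1/c} n = \frac{\lg n}{\lg(1/c)} \\
h &\le 1 + \frac{\lg n}{\lg(1/c)} = O(\lg n)
\end{align*}
```

The first line holds for every binary tree, and a full tree meets it exactly (n = 7 gives h = 3, as for the tree of A through G). With perfectly even splits, c = 1/2 and the bound becomes h ≤ 1 + lg n. A linear tree, by contrast, has h = n.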

Application: representing sets

Using linear data structures to represent sets: insert, isMember, remove, all O(n)
Using binary search trees to represent sets: insert, isMember (search), remove, all O(h), which is O(lg(n)) if the tree happens to be balanced.

So using binary search trees to represent sets is asymptotically no worse than using lists (even a linear tree, where h = n, gives O(n)), and often better. If we can find an efficient way to ensure that our BSTs remain balanced, then we can do asymptotically better than with lists. We will pick this theme up when we study B-trees. (You may also wish to investigate it further as a final project.)