Notes on Binary Search Trees

(related reading: Main & Savitch, pp. 470-483)

Code for binary search trees may be found here.

What is a binary search tree?

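A binary search tree is a binary tree with the following properties:

  1. Each node stores a key (possibly along with other data), and no two nodes have the same key.
  2. Every key in a node's left subtree is less than the node's key.
  3. Every key in a node's right subtree is greater than the node's key.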

We have already seen several examples of binary search trees. For example:

           (8)
          /   \
       (2)     (21)
      /   \     /
    (1)   (5) (13)
          /
        (3)

Here the values stored at each node are themselves the keys (in this case integers). One property of a binary search tree is that an in-order traversal visits the nodes in increasing order of their keys (thus the name in-order). In other words, data maintained in a binary search tree is kept sorted by key.
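
As a rough illustration of the in-order property, the sketch below walks the example tree above and collects its keys. The `Node` struct and the function names `inOrder` and `exampleKeysInOrder` are hypothetical stand-ins for illustration, not the course's own classes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal node type (an assumption; not the course's BinaryTree).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Append the keys of the subtree rooted at t to out using an in-order
// traversal: left subtree first, then the node itself, then the right subtree.
void inOrder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inOrder(t->left, out);
    out.push_back(t->key);
    inOrder(t->right, out);
}

// Build the seven-node example tree by hand and return its keys in in-order.
std::vector<int> exampleKeysInOrder() {
    Node n1{1, nullptr, nullptr};
    Node n3{3, nullptr, nullptr};
    Node n5{5, &n3, nullptr};
    Node n2{2, &n1, &n5};
    Node n13{13, nullptr, nullptr};
    Node n21{21, &n13, nullptr};
    Node n8{8, &n2, &n21};

    std::vector<int> keys;
    inOrder(&n8, keys);
    // The BST property guarantees the keys come out in ascending order.
    assert(std::is_sorted(keys.begin(), keys.end()));
    return keys;
}
```

Calling `exampleKeysInOrder()` yields the keys 1, 2, 3, 5, 8, 13, 21.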

We can emulate a priority queue as long as the priorities are unique:

        (homework, 2)
         /         \
(cs367, 0)         (Badgers, 3)
         \
     (clean room, 1)
Also notice how we can store more than just a key at each node.

Unlike a binary tree, which is a data structure, a binary search tree is an abstract data type (ADT). That is, to use it, we don't need to know how it is represented.

The following are different binary trees:

   (7)                  (5)
   / \                  / \
(4)   (9)      vs.   (4)   (9)
  \                        /
  (5)                    (7)
but they represent the same binary search tree.

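Operations we can perform on a BST include:

  1. insert an item with a given key
  2. search for the item stored with a given key
  3. remove the item stored with a given key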

We can also test whether the tree is empty, count how many values are stored in the tree, and ask for the height of the tree. Other functions may also be useful (like finding the smallest or largest element in the tree), but they are not essential to what a BST is.

For example, we'd like to be able to build the priority queue tree above with something like this:
BST<String, int> bst;  // bst is initially empty

bst.insert("homework", 2);
//        (homework, 2)

bst.insert("cs367",0);
//         (homework, 2)
//          /
// (cs367, 0)

bst.insert("Badgers", 3);
//         (homework, 2)
//          /         \
// (cs367, 0)         (Badgers, 3)

bst.insert("clean room", 1);
//         (homework, 2)
//          /         \
// (cs367, 0)         (Badgers, 3)
//         \
//     (clean room, 1)

Implementing a BST

Of course, a BST's name comes from somewhere --- the intuitive implementation uses a binary tree data structure.
template <class Item, class Key>
class BST {
public:
  ...
  bool search(const Key& k, Item& returnVal) const;
  bool insert(const Item& v, const Key& k);
  bool remove(const Key& k);
  ...
private:
  BinaryTree<IKPair> *root;
  ...
};
So the BST class is really a wrapper around a binary tree.
Before looking at the details, let's consider what search does: use the key to navigate the tree until we either find the key or hit a NULL pointer. In the latter case we just wish to signal that no match was found (in this case by returning false). In the former case, we wish not only to signal the match, but also to return the item stored with the search key. This can be done by passing in an argument that is a non-constant reference to an Item:
  bool search(const Key& k, Item& returnVal) const;
The navigation can be done recursively roughly as:
  1. if this node is NULL then return false
  2. else if this node's key matches the search key then copy the item at this node and return true
  3. else if this node's key is less than the search key then recursively search the right subtree
  4. else recursively search the left subtree
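
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the course's implementation: it assumes a hypothetical `Node` struct with an `int` key and a `std::string` item in place of `BinaryTree<IKPair>`, and it writes `search` as a free function rather than a member of `BST`.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type (an assumption): integer key plus a string item.
struct Node {
    int key;
    std::string item;
    Node* left;
    Node* right;
};

// Recursive search following the four steps in the notes.
// On success, copies the matching item into returnVal and returns true.
bool search(const Node* t, int k, std::string& returnVal) {
    if (t == nullptr) return false;           // step 1: ran off the tree
    if (t->key == k) {                        // step 2: match found
        returnVal = t->item;
        return true;
    }
    if (t->key < k) {                         // step 3: k can only be to the right
        return search(t->right, k, returnVal);
    }
    return search(t->left, k, returnVal);     // step 4: k can only be to the left
}

// Usage on the priority queue tree from earlier in the notes.
void searchDemo() {
    Node clean{1, "clean room", nullptr, nullptr};
    Node cs{0, "cs367", nullptr, &clean};
    Node badgers{3, "Badgers", nullptr, nullptr};
    Node homework{2, "homework", &cs, &badgers};

    std::string found;
    bool hit = search(&homework, 1, found);
    assert(hit && found == "clean room");

    bool miss = search(&homework, 7, found);  // no node has key 7
    assert(!miss);
}
```
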
Insert uses a very similar navigation through the tree as search. However, for variety we can see how insert can be implemented iteratively:
  1. special case if the tree is empty --- allocate a leaf node and make root point to it
  2. otherwise, make a pointer, t, to the root, and a pointer, parent, that is initially NULL
  3. While t is not NULL and the key of t is not a match with the insert key:
    1. let parent point to the same node as t
    2. if t's key is less than the insert key then let t point to its right child
    3. otherwise let t point to its left child
  4. if a match was not found (t is NULL) then allocate a new leaf for the insertion and attach it to parent: as the right child if parent's key is less than the insert key, as the left child otherwise.
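
One way the iterative version might look is sketched below. The sketch keeps a trailing `parent` pointer so that the new leaf can be attached once `t` runs off the tree. The `Node` struct, the free-function form, the `destroy` helper, and the choice to return false when the key is already present are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type (an assumption): integer key plus a string item.
struct Node {
    int key;
    std::string item;
    Node* left;
    Node* right;
};

// Iterative insert. Returns true if a new leaf was added, false if the key
// was already in the tree (this meaning of the bool is an assumption).
bool insert(Node*& root, const std::string& v, int k) {
    if (root == nullptr) {                    // step 1: empty tree
        root = new Node{k, v, nullptr, nullptr};
        return true;
    }
    Node* parent = nullptr;                   // last node visited before t
    Node* t = root;                           // step 2
    while (t != nullptr && t->key != k) {     // step 3
        parent = t;
        t = (t->key < k) ? t->right : t->left;
    }
    if (t != nullptr) return false;           // key already present

    Node* leaf = new Node{k, v, nullptr, nullptr};   // step 4
    if (parent->key < k) parent->right = leaf;
    else parent->left = leaf;
    return true;
}

// Free every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Usage: rebuild the priority queue tree from the notes.
void insertDemo() {
    Node* root = nullptr;
    insert(root, "homework", 2);
    insert(root, "cs367", 0);
    insert(root, "Badgers", 3);
    insert(root, "clean room", 1);
    bool addedDuplicate = insert(root, "again", 2);

    assert(!addedDuplicate);
    assert(root->key == 2 && root->left->key == 0 && root->right->key == 3);
    assert(root->left->right->item == "clean room");
    destroy(root);
}
```
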

Deletion

Remove uses the same navigation through the tree as search, but then must adjust the tree to perform the deletion and to maintain the BST invariant.

There are three cases we need to consider for deletion:
  1. Deleting a leaf (13 in this example) --- simply remove it:
           (8)                      (8)     
          /   \                    /   \    
       (2)     (21)             (2)     (21)
      /   \     /     ===>     /   \   
    (1)   (5) (13)           (1)   (5) 
          /                        /        
        (3)                      (3)        
    


  2. Deleting a node with one child (21 in this example) --- remove it and move its child (the subtree rooted at its child) up:
           (8)                      (8)     
          /   \                    /   \    
       (2)     (21)             (2)     (13)
      /   \     /     ===>     /   \   
    (1)   (5) (13)           (1)   (5) 
          /                        /        
        (3)                      (3)        
    


  3. Deleting a node with two children (2 in this example) --- swap it with the node that has the smallest key in its right subtree, then remove it:
           (8)                      (8)
          /   \                    /   \
       (2)     (21)             (3)     (21)
      /   \     /     ===>     /   \     /
    (1)   (5) (13)           (1)   (5) (13)
          /
        (3)

    or swap it with the node that has the largest key in its left subtree, then remove it:
           (8)                      (8)
          /   \                    /   \
       (2)     (21)             (1)     (21)
      /   \     /     ===>         \     /
    (1)   (5) (13)                 (5) (13)
          /                        /
        (3)                      (3)
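
The three cases can be sketched in code roughly as follows. This is a sketch under stated assumptions rather than the course's implementation: nodes hold only an `int` key, the function is named `removeKey` (a hypothetical name), the two-children case uses the smallest key in the right subtree, and `insert`, `destroy`, and `buildExample` are helpers added only to make the example self-contained.

```cpp
#include <cassert>

// Hypothetical key-only node type (an assumption).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Recursive insert, used here only to build the example tree.
void insert(Node*& t, int k) {
    if (t == nullptr) t = new Node{k, nullptr, nullptr};
    else if (t->key < k) insert(t->right, k);
    else if (k < t->key) insert(t->left, k);
}

// Remove key k from the subtree rooted at t; returns false if k is absent.
bool removeKey(Node*& t, int k) {
    if (t == nullptr) return false;
    if (t->key < k) return removeKey(t->right, k);
    if (k < t->key) return removeKey(t->left, k);

    if (t->left != nullptr && t->right != nullptr) {
        // Case 3: copy up the smallest key of the right subtree, then
        // remove that node, which has at most one child.
        Node* s = t->right;
        while (s->left != nullptr) s = s->left;
        t->key = s->key;
        return removeKey(t->right, s->key);
    }
    // Cases 1 and 2: splice in the only child (or NULL for a leaf).
    Node* old = t;
    t = (t->left != nullptr) ? t->left : t->right;
    delete old;
    return true;
}

// Free every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Build the seven-node example tree used in the diagrams.
Node* buildExample() {
    const int keys[] = {8, 2, 21, 1, 5, 13, 3};
    Node* root = nullptr;
    for (int k : keys) insert(root, k);
    return root;
}

void removeDemo() {
    Node* root = buildExample();
    bool ok = removeKey(root, 21);            // one child: 13 moves up
    assert(ok && root->right->key == 13);
    ok = removeKey(root, 13);                 // 13 is now a leaf
    assert(ok && root->right == nullptr);
    ok = removeKey(root, 2);                  // two children: 3 replaces 2
    assert(ok && root->left->key == 3 && root->left->right->key == 5);
    destroy(root);
}
```

Because `t` is passed as a reference to the parent's child pointer, assigning to `t` relinks the parent directly, so no separate parent pointer is needed in this recursive form.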
    

Running-time complexity of BST operations

Each of these operations follows a single path down from the root, so its running time is proportional to the height of the tree. But what is the height in relation to n --- the total number of nodes in the tree? It depends on the shape of the tree (which depends on the order in which nodes are inserted and deleted).

We might have a completely full binary tree:
      (D)
     /   \
  (B)     (F)
  / \     / \
(A) (C) (E) (G)
Or we might have a linear binary tree (for example, if we insert keys into a binary search tree in sorted order):
(A)
  \
  (B)
    \
    (C)
      \
      (D)
        \
        (E)
          \
          (F)
            \
            (G)
(or any of the many cases in between.)
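
To make the effect of insertion order concrete, the sketch below builds both trees above from the same seven keys and measures their heights. It assumes height is counted as the number of edges on the longest root-to-leaf path (so an empty tree has height -1), a convention these notes do not fix. The `Node` type and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical key-only node type (an assumption).
struct Node {
    char key;
    Node* left;
    Node* right;
    explicit Node(char k) : key(k), left(nullptr), right(nullptr) {}
};

// Ordinary BST insert; duplicate keys are ignored.
void insert(Node*& t, char k) {
    if (t == nullptr) t = new Node(k);
    else if (t->key < k) insert(t->right, k);
    else if (k < t->key) insert(t->left, k);
}

// Height in edges; the empty tree has height -1 by this convention.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Free every node of the tree (post-order).
void destroy(Node* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Insert the characters of keys, in order, into an empty tree and
// return the height of the result.
int heightAfterInserting(const std::string& keys) {
    Node* root = nullptr;
    for (char k : keys) insert(root, k);
    int h = height(root);
    destroy(root);
    return h;
}

void heightDemo() {
    assert(heightAfterInserting("DBFACEG") == 2);  // the full tree above
    assert(heightAfterInserting("ABCDEFG") == 6);  // the linear tree above
}
```

The same seven keys give a height of 2 in one insertion order and 6 in the other, so the cost of a search differs by a factor of three even at this small size.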

Balanced binary trees

A binary tree is balanced if each node has (roughly) the same number of descendants in its left subtree as it has in its right subtree.

Important fact: For balanced binary trees, the height is proportional to the base-two logarithm of the number of nodes in the tree: h = O(lg(n)).
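
To see where the logarithm comes from, consider the completely full tree as the best case. Assuming height h is counted as the number of edges on the longest root-to-leaf path (a convention these notes do not fix), level i of a full tree holds 2^i nodes, which gives:

```latex
n = \sum_{i=0}^{h} 2^{i} = 2^{h+1} - 1
\quad\Longrightarrow\quad
h = \lg(n + 1) - 1 = O(\lg n)
```

For the seven-node full tree shown earlier, h = lg(8) - 1 = 2, while the linear tree on the same seven keys has h = n - 1 = 6.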

Application: representing sets

So using binary search trees to represent sets is asymptotically no worse than using lists, and often better: in the worst case (a linear tree) a membership test takes O(n) time, as it does when scanning a list, but on a balanced tree it takes only O(lg(n)). If we can find an efficient way to ensure that our BSTs remain balanced then we can do asymptotically better than with lists. We will pick this theme up when we study B-trees. (You may also wish to investigate it further as a final project.)