Notes on Binary Search Trees
(related reading: Main & Savitch, pp. 470-483)
Code for binary search trees may be found here.
What is a binary search tree?
A binary search tree is a binary tree with the following properties:
- The data stored at each node has a distinguished key which is unique in the
tree and belongs to a total order. (That is, for any two non-equal keys x and
y, either x < y or y < x.)
- The key of any node is greater than all keys occurring in its
left subtree and less than all keys occurring in its right subtree.
We have already seen several examples of binary search trees. For example:
        (8)
       /   \
     (2)   (21)
     / \    /
   (1) (5) (13)
       /
     (3)
Here the values stored at each node are themselves the keys (in this case
integers). One property of a binary search tree is that an in-order
traversal walks over the nodes in order of their keys (thus the name
in-order). Data maintained in a binary search tree is sorted by the key.
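The in-order property can be sketched in a few lines. The Node struct and the function name inorder below are assumptions made for illustration; they are not taken from the course code.

```cpp
#include <vector>

// Hypothetical minimal node type, used only for this illustration.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Appends the keys of the subtree rooted at t to out in in-order:
// left subtree first, then the node itself, then the right subtree.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}
```

Applied to the example tree above, this visits the keys in the order 1, 2, 3, 5, 8, 13, 21.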
We can emulate a priority queue as long as the priorities are
unique:
         (homework, 2)
          /         \
  (cs367, 0)       (Badgers, 3)
          \
       (clean room, 1)
Also notice how we can store more than just a key at each node.
Unlike a binary tree, which is a data structure, a binary search tree is an
ADT. That is, to use it, we don't need to know how it is represented.
The following are different binary trees:
     (7)              (5)
     / \              / \
   (4) (9)    vs.   (4) (9)
     \                  /
     (5)              (7)
but they represent the same binary search tree.
Operations we can perform on a BST include:
- insert --- add an item and its key to the BST
- search --- look up an item in the BST by its key
- remove --- delete an item/key from the BST by its key
We can also test if the tree is empty, count how many values are
stored in the tree, and inquire as to the height of the tree. Other functions
may also be useful (like finding the smallest or largest elements in the
tree) but they are not essential to what a BST is.
For example, we'd like to be able to build the above tree as something like
this:
BST<String, int> bst;  // bst is initially empty
bst.insert("homework", 2);
//         (homework, 2)
bst.insert("cs367", 0);
//         (homework, 2)
//          /
//  (cs367, 0)
bst.insert("Badgers", 3);
//         (homework, 2)
//          /         \
//  (cs367, 0)       (Badgers, 3)
bst.insert("clean room", 1);
//         (homework, 2)
//          /         \
//  (cs367, 0)       (Badgers, 3)
//          \
//       (clean room, 1)
Implementing a BST
Of course, a BST's name comes from somewhere --- the intuitive
implementation uses a binary tree data structure.
template <class Item, class Key>
class BST {
public:
    ...
    bool search(const Key& k, Item& returnVal) const;
    bool insert(const Item& v, const Key& k);
    bool remove(const Key& k);
    ...
private:
    BinaryTree<IKPair> *root;
    ...
};
So the BST class is really a wrapper around a binary tree.
Before looking at the details, let's consider what search does:
Use the key to navigate in the tree until we either find the key or hit a
pointer to NULL. In the latter case we just wish to signal that no match
was found (in this case by returning false). In the former case, we wish to
not only signal the match, but also return the value of the item stored
with the search key. This can be done by passing in an argument that is a
non-constant reference to an Item:
bool search(const Key& k, Item& returnVal) const;
The navigation can be done recursively roughly as:
- if this node is NULL then return false
- else if this node's key matches the search key then copy the item at this node
and return true
- else if this node's key is less than the search key then recursively search
the right subtree
- else recursively search the left subtree
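The recursive navigation above can be sketched as a free function. The TreeNode struct is an assumption made for illustration, since the notes do not show how BinaryTree<IKPair> is laid out; only the search signature follows the class declaration.

```cpp
// Hypothetical node layout (an assumption; the notes' BinaryTree<IKPair>
// is not shown).
template <class Item, class Key>
struct TreeNode {
    Item item;
    Key key;
    TreeNode* left;
    TreeNode* right;
};

// Returns true and copies the matching item into returnVal if k occurs in
// the subtree rooted at t; returns false otherwise.
template <class Item, class Key>
bool search(const TreeNode<Item, Key>* t, const Key& k, Item& returnVal) {
    if (t == nullptr) return false;          // fell off the tree: no match
    if (t->key == k) {                       // match: copy the item out
        returnVal = t->item;
        return true;
    }
    if (t->key < k)                          // search key is larger: go right
        return search(t->right, k, returnVal);
    return search(t->left, k, returnVal);    // otherwise go left
}
```

Each call does a constant amount of work and then descends one level, so the number of calls is bounded by the length of the path followed from the root.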
Insert uses a very similar navigation through the tree as search. However,
for variety we can see how insert can be implemented iteratively:
- special case if the tree is empty --- allocate a leaf node and make
root point to it
- otherwise, make a pointer, t, to the root, and a second pointer, parent,
that trails one step behind t (initially NULL)
- While t is not NULL and the key of t is not a match with the insert
key:
  - let parent point to the same node as t
  - if t's key is less than the insert key then let t point to its
  right child
  - if t's key is greater than the insert key then let t point to its
  left child
- if a match was not found (t is NULL) then allocate a new leaf for the
insertion and attach it as the appropriate child of parent.
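A minimal sketch of the iterative insert follows. The TreeNode struct is a hypothetical layout, not the course's BinaryTree class, and the sketch keeps a trailing parent pointer so that the new leaf can be linked in once t runs off the tree. It reports a duplicate key by returning false, which is one plausible reading of the bool return type in the class declaration.

```cpp
// Hypothetical node layout (an assumption; the notes do not show one).
template <class Item, class Key>
struct TreeNode {
    Item item;
    Key key;
    TreeNode* left;
    TreeNode* right;
};

// Inserts (v, k) into the tree whose root pointer is root.
// Returns false, leaving the tree unchanged, if k is already present.
template <class Item, class Key>
bool insert(TreeNode<Item, Key>*& root, const Item& v, const Key& k) {
    TreeNode<Item, Key>* parent = nullptr;   // trails one step behind t
    TreeNode<Item, Key>* t = root;
    while (t != nullptr) {
        if (t->key == k) return false;       // match found: nothing to insert
        parent = t;
        t = (t->key < k) ? t->right : t->left;
    }
    TreeNode<Item, Key>* leaf = new TreeNode<Item, Key>{v, k, nullptr, nullptr};
    if (parent == nullptr) root = leaf;      // the tree was empty
    else if (parent->key < k) parent->right = leaf;
    else parent->left = leaf;
    return true;
}
```

Handling the empty tree through the parent == nullptr check folds the special case into the general one; the caller owns the allocated nodes and is responsible for freeing them.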
Deletion
Remove uses the same navigation through the tree as search, but then must
adjust the tree to perform the deletion and to maintain the BST invariant.
There are three cases we need to consider for deletion:
- Deleting a leaf --- simply remove it:
        (8)                  (8)
       /   \                /   \
     (2)   (21)           (2)   (21)
     / \    /     ===>    / \
   (1) (5) (13)         (1) (5)
       /                    /
     (3)                  (3)
- Deleting a node with one child --- remove it and move its child (the
subtree rooted at its child) up:
        (8)                  (8)
       /   \                /   \
     (2)   (21)           (2)   (13)
     / \    /     ===>    / \
   (1) (5) (13)         (1) (5)
       /                    /
     (3)                  (3)
- Deleting a node with two children --- swap with the node that has the
smallest key in its right subtree, then remove:
        (8)                  (8)
       /   \                /   \
     (2)   (21)           (3)   (21)
     / \    /     ===>    / \    /
   (1) (5) (13)         (1) (5) (13)
       /
     (3)
or swap with the node that has the largest key in its left subtree, then
remove:
        (8)                  (8)
       /   \                /   \
     (2)   (21)           (1)   (21)
     / \    /     ===>      \    /
   (1) (5) (13)             (5) (13)
       /                    /
     (3)                  (3)
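The three cases can be sketched together in one recursive function. To keep the sketch short it assumes a hypothetical Node that holds only an integer key (the notes store an item with each key), and for the two-child case it uses the smallest key in the right subtree, the first of the two options described above. The name removeKey is likewise an assumption.

```cpp
// Hypothetical node type holding only an integer key (an assumption made
// to keep the sketch short).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Removes key k from the subtree whose root pointer is t.
// Returns true if a node was removed, false if k was not present.
bool removeKey(Node*& t, int k) {
    if (t == nullptr) return false;                 // key not found
    if (k < t->key) return removeKey(t->left, k);
    if (t->key < k) return removeKey(t->right, k);

    if (t->left != nullptr && t->right != nullptr) {
        // Two children: copy in the smallest key of the right subtree,
        // then remove that key from the right subtree instead.
        Node* succ = t->right;
        while (succ->left != nullptr) succ = succ->left;
        t->key = succ->key;
        return removeKey(t->right, succ->key);
    }
    // Leaf or one child: splice the node out, moving its child (if any) up.
    Node* old = t;
    t = (t->left != nullptr) ? t->left : t->right;
    delete old;
    return true;
}
```

Passing the pointer by reference lets the function update the parent's link directly, so no separate parent pointer is needed. The node that ends up being unlinked in the two-child case has no left child, so the second call always lands in the leaf or one-child case.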
Running-time complexity of BST operations
- search --- in the worst case the search key is not found, but would
belong just below the deepest leaf in the tree. Each step takes only a
constant amount of work so the algorithm is O(h), where h is the height
of the tree.
- insert --- in the worst case the insertion takes place at the deepest
leaf in the tree --- the algorithm is also O(h).
- delete --- the worst case is either the same as for search or occurs
when the delete key is found, but that node has two children and
either the predecessor or successor of that key is located at the
deepest leaf. In either case, the amount of work is bounded by O(h).
So the running time of every tree operation is at most proportional to the
height of the tree. But what is the height in relation to n --- the total
number of nodes in the tree? It depends on the shape of the tree (which
depends on the order in which nodes are inserted and deleted).
We might have a completely full binary tree:
        (D)
       /   \
    (B)     (F)
    / \     / \
  (A) (C) (E) (G)
Or we might have a linear binary tree (for example, we insert into a
binary search tree in sorted order):
  (A)
    \
    (B)
      \
      (C)
        \
        (D)
          \
          (E)
            \
            (F)
              \
              (G)
(or any of the many cases in between.)
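The effect of insertion order on height can be checked with a short experiment. The sketch below assumes a hypothetical Node with a char key, a plain unbalanced insert, and a height measured in edges (a single node has height 0); all names are illustrative.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node type with a single-character key (an assumption).
struct Node {
    char key;
    Node* left;
    Node* right;
};

// Plain (unbalanced) BST insert; duplicate keys are ignored.
void insert(Node*& t, char k) {
    if (t == nullptr) t = new Node{k, nullptr, nullptr};
    else if (k < t->key) insert(t->left, k);
    else if (t->key < k) insert(t->right, k);
}

// Height counted in edges: a single node has height 0, an empty tree -1.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Builds a tree by inserting keys in the given order and returns its
// height. Nodes are not freed, for brevity.
int heightAfterInserting(const std::vector<char>& keys) {
    Node* root = nullptr;
    for (char k : keys) insert(root, k);
    return height(root);
}
```

Under these assumptions, inserting D, B, F, A, C, E, G produces the full tree with height 2, while inserting A through G in sorted order produces the linear tree with height 6.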
Balanced binary trees
A binary tree is balanced if each node has (roughly) the same number
of descendants in its left subtree as it has in its right subtree.
Important fact: For balanced binary trees, the height is proportional to
the base-two logarithm of the number of nodes in the tree: h = O(lg(n)).
For example, a completely full tree with k levels holds 2^k - 1 nodes, so
the 7 nodes A through G above fit in 3 levels.
Application: representing sets
- Using linear data structures to represent sets: insert, isMember,
remove, all O(n)
- Using binary search trees to represent sets: insert, isMember
(search), remove, all O(h) --- O(lg(n)) if we are lucky.
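As an illustration of the set interface, here is a small wrapper with the three operations named above. It is built on std::set from the C++ standard library rather than on the BST class from these notes; std::set guarantees logarithmic insertion, lookup, and removal, and is typically implemented as a balanced binary search tree. The class name IntSet and its member names are assumptions.

```cpp
#include <set>

// A set of ints offering insert, isMember, and remove.
// Backed by std::set (an assumption; not the notes' BST class).
class IntSet {
public:
    // Returns true if x was added, false if it was already a member.
    bool insert(int x) { return items.insert(x).second; }
    // Returns true if x is in the set.
    bool isMember(int x) const { return items.count(x) != 0; }
    // Returns true if x was removed, false if it was not a member.
    bool remove(int x) { return items.erase(x) != 0; }
private:
    std::set<int> items;
};
```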
So using binary search trees to represent sets is asymptotically no worse
than lists, and often better. If we can find an efficient way to ensure
that our BSTs remain balanced then we can do asymptotically better than with
lists. We will pick this theme up when we study B-trees. (And also you may
wish to investigate it further as a final project.)