An important special kind of binary tree is the binary search tree
(BST).
In a BST, each node stores some information including a unique key
value, and perhaps some associated data.
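A binary tree is a BST iff, for every node n in the tree, all keys in n's
left subtree are less than the key in n, and all keys in n's right subtree
are greater than the key in n.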
Here are some BSTs in which each node just stores an integer key:
These are not BSTs:
In the left one 5 is not greater than 6.
In the right one 6 is not greater than 7.
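
The BST property (every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger) can also be checked mechanically. The following is a minimal sketch and is not part of the class definitions used in these notes: the names BstCheckSketch, Node, and isBst are hypothetical, and keys are assumed to be ints with no duplicates. Each recursive call carries the open interval (lo, hi) that every key in the current subtree must fall in.

```java
// Hypothetical sketch: checks the BST ordering property for integer keys.
class BstCheckSketch {
    // Minimal node type assumed for this example only.
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns true iff every node's key is greater than all keys in its
    // left subtree and less than all keys in its right subtree.
    static boolean isBst(Node root) {
        return isBst(root, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every key in the subtree rooted at n must lie strictly between lo and hi.
    private static boolean isBst(Node n, long lo, long hi) {
        if (n == null) return true;                 // an empty tree is a BST
        if (n.key <= lo || n.key >= hi) return false;
        return isBst(n.left, lo, n.key)             // left keys must be < n.key
            && isBst(n.right, n.key, hi);           // right keys must be > n.key
    }

    public static void main(String[] args) {
        // 15 with children 5 and 22; 5 has right child 10: a valid BST.
        Node good = new Node(15,
                new Node(5, null, new Node(10, null, null)),
                new Node(22, null, null));
        // Same shape, but 20 sits in the left subtree of 15: not a BST.
        Node bad = new Node(15,
                new Node(5, null, new Node(20, null, null)),
                new Node(22, null, null));
        System.out.println(isBst(good));   // prints true
        System.out.println(isBst(bad));    // prints false
    }
}
```

Comparing each node only with its own children is not enough: in the second tree, 20 is correctly placed relative to its parent 5, but it still violates the property at the root 15.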
Note that more than one BST can be used to store the same set of
key values.
For example, both of the following are BSTs that store the same set of
integer keys:
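The reason binary search trees are important is that the following operations
can be implemented efficiently using a BST: inserting a key value,
determining whether a key value is in the tree, removing a key value from
the tree, and printing all of the key values in sorted order.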
Question 1:
Which of the following binary trees are BSTs?
If a tree is not a BST, say why.
Question 2:
Using which kind of traversal (preorder, postorder, inorder, or level-order)
visits the nodes of a BST in sorted order?
To implement a binary search tree, we will use two classes: one for the
individual tree nodes, and one for the BST itself.
The following class definitions assume that the BST will store only
key values, no associated data.
Because most of the BST operations require comparing key values,
the type used for the key is Comparable (not Object).
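To implement a BST that stores some data with each key, we would use the
following class definitions (the changes are the new data field with its
getData and setData methods, the extra data parameter to the constructor
and to insert, and the Object return type of lookup):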
From now on, we will assume that BSTs only store key values, not
associated data.
We will also assume that null is not a valid key value (i.e., if someone
tries to insert or look up a null value, that should cause an exception).
In general, to determine whether a given value is in the BST,
we will start at the root of the tree and determine whether
the value we are looking for is in the root, might be in the root's left
subtree, or might be in the root's right subtree.
The code for the lookup method uses an auxiliary, recursive method with
the same name (i.e., the lookup method is overloaded):
Let's illustrate what happens using the following BST:
and searching for 12:
What if we search for 15:
How much time does it take to search for a value in a BST?
Note that lookup always follows a path from the root down towards a leaf.
In the worst case, it goes all the way to a leaf.
Therefore, the worst-case time is proportional to the length of the longest
path from the root to a leaf (the height of the tree).
In general, we'd like to know how much time is required for lookup as a
function of the number of values stored in the tree.
In other words, what is the relationship between the number of nodes in a
BST and the height of the tree?
This depends on the "shape" of the tree.
In the worst case, all nodes have just one child, and the tree is essentially
a linked list.
For example:
In the best case, all non-leaf nodes have 2 children, and all leaves are at the
same depth; for example:
This tree has 7 nodes, and height = 3.
In general, a tree like this (a "full" tree) will have height
approximately log2(N), where N is the number of nodes in the tree.
The value log2(N) is (roughly) the number of times you can divide
N by two, before you get to zero.
For example:
The reason we use log2 (rather than, say, log3) is
because every non-leaf node in a full BST has two children.
The number of nodes in each of the root's subtrees is (approximately) 1/2 of
the nodes in the whole tree, so the length of a path from the root to a leaf
will be the same as the number of times we can divide N (the total number
of nodes) by 2.
However, when we use big-O notation, we just say that the height of a full
tree with N nodes is O(log N) -- we drop the "2" subscript, because
log2(N) is proportional to logk(N) for any constant k;
i.e., for any constants B and k, and any value N:
To summarize:
The worst-case time required to do a lookup in a BST is O(height of tree).
In the worst case (a "linear" tree), this is O(N), where N is the
number of nodes in the tree.
In the best case (a "full" tree), this is O(log N).
Where should a new item go in a BST?
The answer is easy: it needs to go where you would have found it using lookup!
If you don't put it there then you won't find it later.
The code for insert is given below.
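Note that the public insert method handles the empty tree itself and
otherwise calls a private, recursive auxiliary method; that an attempt to
insert a key that is already in the tree causes a DuplicateException; and
that a new key is always added as a leaf.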
It is easy to see that the complexity for insert is the same as for lookup:
in the worst case, a path is followed all the way to a leaf.
As mentioned above, the order in which values are inserted determines
what BST is built (inserting the same values in different orders can
result in different final BSTs).
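
To make the effect of insertion order concrete, here is a minimal sketch that builds two BSTs from the same seven keys and compares their heights. It is an illustration only: the class InsertionOrderSketch and its methods are hypothetical, keys are plain ints, the key values 10 to 70 are arbitrary, and duplicates are silently ignored rather than raising DuplicateException. Height is counted as in these notes, i.e., the number of nodes on the longest path from the root to a leaf.

```java
// Hypothetical sketch: the same keys inserted in two different orders.
class InsertionOrderSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Inserts k into the subtree rooted at n and returns that subtree's root.
    // A key that is already present is ignored (a simplification).
    static Node insert(Node n, int k) {
        if (n == null) return new Node(k);      // new keys always become leaves
        if (k < n.key) n.left = insert(n.left, k);
        else if (k > n.key) n.right = insert(n.right, k);
        return n;
    }

    // Number of nodes on the longest root-to-leaf path (0 for an empty tree).
    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // Builds a BST by inserting the keys in the given order; returns its height.
    static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        // Sorted order: every node has only a right child (a "linear" tree).
        System.out.println(heightAfterInserting(10, 20, 30, 40, 50, 60, 70)); // prints 7
        // Middle key first, then the middles of each half: a "full" tree.
        System.out.println(heightAfterInserting(40, 20, 60, 10, 30, 50, 70)); // prints 3
    }
}
```

Both trees store the same set of keys, but a lookup in the first may visit up to 7 nodes, while a lookup in the second visits at most 3.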
Draw the BST that results from inserting the values 1 to 7 in each of the
following orders (reading from left to right):
Introduction
Note: if duplicate keys are allowed, then nodes with values that are
equal to the key in node n can be either in n's left subtree or in its
right subtree (but not both). In these notes, we will assume that
duplicates are not allowed.
   A        10        cat          15
  / \      /         /   \        /  \
 B   C    5        bat   rat     5    22
         /         /              \     \
       -3        ant              20    30
Implementing BSTs
class BinaryTreenode {
    // *** fields ***
    private Comparable key;
    private BinaryTreenode left, right;

    // *** methods ***

    // constructor
    public BinaryTreenode(Comparable k, BinaryTreenode l, BinaryTreenode r) {
        key = k;
        left = l;
        right = r;
    }

    // access to fields
    public Comparable getKey() {return key;}
    public BinaryTreenode getLeft() {return left;}
    public BinaryTreenode getRight() {return right;}

    // change fields
    public void setKey(Comparable k) {key = k;}
    public void setLeft(BinaryTreenode l) {left = l;}
    public void setRight(BinaryTreenode r) {right = r;}
}

class BST {
    // *** fields ***
    private BinaryTreenode root; // ptr to the root of the BST

    // *** methods ***
    public BST() { root = null; } // constructor

    public void insert(Comparable key) throws DuplicateException {...}
      // add key to this BST; error if it is already there

    public void delete(Comparable key) {...}
      // remove the node containing key from this BST if it is there;
      // otherwise, do nothing

    public boolean lookup(Comparable key) {...}
      // if key is in this BST, return true; otherwise, return false

    public void print(PrintWriter p) {...}
      // print the values in this BST in sorted order (to p)
}
class BinaryTreenode {
    // *** fields ***
    private Comparable key;
    private Object data;
    private BinaryTreenode left, right;

    // *** methods ***

    // constructor
    public BinaryTreenode(Comparable k, Object d,
                          BinaryTreenode l, BinaryTreenode r) {
        key = k;
        data = d;
        left = l;
        right = r;
    }
    ...
    public Object getData() {return data;}
    public void setData(Object ob) { data = ob; }
    ...
}

class BST {
    // *** fields ***
    private BinaryTreenode root; // ptr to the root of the BST

    // *** methods ***
    public BST() { root = null; } // constructor

    public void insert(Comparable key, Object data) throws DuplicateException {...}
      // add key and associated data to this BST;
      // error if key is already there

    public void delete(Comparable key) {...}
      // remove the node containing key from this BST if it is there;
      // otherwise, do nothing

    public Object lookup(Comparable key) {...}
      // if key is in this BST, return its associated data; otherwise, return null

    public void print(PrintWriter p) {...}
      // print the values in this BST in sorted order (to p)
}
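
To show how the key-plus-data variant behaves from a caller's point of view, here is a small self-contained sketch. It is an assumption-laden illustration rather than the implementation used in these notes: it uses Java generics instead of Comparable and Object, the class name KeyedBstSketch is hypothetical, insert is written as a loop, and IllegalArgumentException stands in for DuplicateException (which is not defined in this listing).

```java
// Hypothetical sketch of a BST that stores a data value with each key.
class KeyedBstSketch<K extends Comparable<K>, V> {
    // Node holding a key, its associated data, and two child links.
    private static class Node<K, V> {
        K key;
        V data;
        Node<K, V> left, right;
        Node(K key, V data) { this.key = key; this.data = data; }
    }

    private Node<K, V> root;   // null when the tree is empty

    // Adds key with its data; rejects null keys and duplicate keys.
    public void insert(K key, V data) {
        if (key == null) throw new IllegalArgumentException("null key");
        if (root == null) { root = new Node<>(key, data); return; }
        Node<K, V> cur = root;
        while (true) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) throw new IllegalArgumentException("duplicate key");
            if (cmp < 0) {
                if (cur.left == null) { cur.left = new Node<>(key, data); return; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node<>(key, data); return; }
                cur = cur.right;
            }
        }
    }

    // Returns the data stored with key, or null if key is not in the tree.
    public V lookup(K key) {
        if (key == null) throw new IllegalArgumentException("null key");
        Node<K, V> cur = root;
        while (cur != null) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) return cur.data;
            cur = (cmp < 0) ? cur.left : cur.right;
        }
        return null;
    }

    public static void main(String[] args) {
        KeyedBstSketch<String, Integer> t = new KeyedBstSketch<>();
        t.insert("cat", 3);
        t.insert("bat", 7);
        t.insert("rat", 1);
        System.out.println(t.lookup("bat"));   // prints 7
        System.out.println(t.lookup("dog"));   // prints null
    }
}
```

One consequence of returning null for a missing key is that a caller cannot tell an absent key apart from a key whose stored data is null.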
The lookup method
There are actually two base cases: the tree is empty (return false), and
the value is in the root node (return true).
If neither base case holds, a recursive lookup is done on the appropriate
subtree.
Since all values less than the root's value are in the left subtree, and
all values greater than the root's value are in the right subtree, there
is no point in looking in both subtrees:
if the value we're looking for is less than the value in the root, it
can only be in the left subtree (and if it is greater than the value
in the root, it can only be in the right subtree).
public boolean lookup(Comparable k) {
    // null is not a valid key value
    if (k == null) throw new IllegalArgumentException("null key");
    return lookup(root, k);
}

private static boolean lookup(BinaryTreenode T, Comparable k) {
    // base case 1: empty tree, so k is not here
    if (T == null) return false;
    // base case 2: k is in the root of this subtree
    if (T.getKey().equals(k)) return true;
    if (k.compareTo(T.getKey()) < 0) {
        // k < this node's key; look in left subtree
        return lookup(T.getLeft(), k);
    }
    else {
        // k > this node's key; look in right subtree
        return lookup(T.getRight(), k);
    }
}
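
Because each step of lookup moves to exactly one child and never comes back, the recursion can also be written as a loop. The following is a sketch of that alternative, not the version used in these notes: the class IterativeLookupSketch is hypothetical, keys are plain ints, and the small tree built in main is made up for the example.

```java
// Hypothetical sketch: lookup written as a loop instead of a recursion.
class IterativeLookupSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Walks a single path from the root; returns true iff k is found on it.
    static boolean lookup(Node root, int k) {
        Node cur = root;
        while (cur != null) {
            if (k == cur.key) return true;                // found k
            cur = (k < cur.key) ? cur.left : cur.right;   // descend one level
        }
        return false;                                     // fell off the tree
    }

    // Example tree (an assumption for this sketch):
    //         13
    //        /  \
    //       9    16
    //      / \   /
    //     4  12 14
    static Node exampleTree() {
        Node left = new Node(9, new Node(4, null, null), new Node(12, null, null));
        Node right = new Node(16, new Node(14, null, null), null);
        return new Node(13, left, right);
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(lookup(root, 12));   // prints true  (13 -> 9 -> 12)
        System.out.println(lookup(root, 15));   // prints false (13 -> 16 -> 14 -> null)
    }
}
```

The loop and the recursive version visit the same nodes; the loop simply avoids one method call per level.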
      50
     /
   10
     \
      15
        \
         30
        /
      20
This tree has 5 nodes, and also has height = 5.
Searching for a value in the range 16-19 or 21-29 will require
following the path from the root down to the leaf (the node containing the
value 20); i.e., will require time proportional to the number of nodes
in the tree.
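
The claim about this linear tree can be checked with a short sketch that builds the same five-node tree by hand and counts how many nodes a search examines. The class LinearTreeSketch and the method nodesVisited are hypothetical names introduced only for this illustration, and keys are plain ints.

```java
// Hypothetical sketch: counting the nodes examined by a search in the
// linear tree 50 -> 10 -> 15 -> 30 -> 20.
class LinearTreeSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Builds the linear tree: 10 is the left child of 50, 15 the right child
    // of 10, 30 the right child of 15, and 20 the left child of 30.
    static Node linearTree() {
        Node n20 = new Node(20, null, null);
        Node n30 = new Node(30, n20, null);
        Node n15 = new Node(15, null, n30);
        Node n10 = new Node(10, null, n15);
        return new Node(50, n10, null);
    }

    // Returns the number of nodes examined while searching for k.
    static int nodesVisited(Node root, int k) {
        int visited = 0;
        Node cur = root;
        while (cur != null) {
            visited++;
            if (k == cur.key) break;                      // found k; stop
            cur = (k < cur.key) ? cur.left : cur.right;   // descend one level
        }
        return visited;
    }

    public static void main(String[] args) {
        Node root = linearTree();
        System.out.println(nodesVisited(root, 17));   // prints 5
        System.out.println(nodesVisited(root, 25));   // prints 5
        System.out.println(nodesVisited(root, 60));   // prints 1
    }
}
```

Searching for 17 or 25 examines all five nodes before failing, whereas searching for 60 fails after examining only the root, because 50 has no right child.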
7/2 = 3 // divide by 2 once
3/2 = 1 // divide by 2 a second time
1/2 = 0 // divide by 2 a third time, the result is zero so quit
So log2(7) is approximately equal to 3.
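
The repeated-halving count above is easy to compute in code. The sketch below is only an illustration (the class HalvingSketch and the method halvings are hypothetical names); it uses integer division, exactly as in the worked example.

```java
// Hypothetical sketch: counting how many times N can be halved.
class HalvingSketch {
    // Returns the number of integer divisions by 2 needed to take n to 0.
    static int halvings(int n) {
        int count = 0;
        while (n > 0) {
            n /= 2;      // integer division, as in 7/2 = 3
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(halvings(7));    // prints 3
        System.out.println(halvings(15));   // prints 4
    }
}
```

For a full tree the count matches the height: 7 nodes give 3 levels and 15 nodes give 4.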
logB(N) = logk(N) / logk(B)
and with big-O notation we always ignore constant factors.
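
As a worked instance of this identity, taking B = 2 shows that log2(N) differs from logk(N) only by a factor that depends on k and not on N. The numeric value is given for the arbitrarily chosen base k = 10:

```latex
\log_2 N \;=\; \frac{\log_k N}{\log_k 2} \;=\; c \cdot \log_k N,
\qquad c = \frac{1}{\log_k 2}
\quad \text{(for } k = 10,\ c = 1/\log_{10} 2 \approx 3.32 \text{)}
```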
The insert method
public void insert(Comparable k) throws DuplicateException {
    // null is not a valid key value
    if (k == null) throw new IllegalArgumentException("null key");
    if (root == null) {
        root = new BinaryTreenode(k, null, null);
    }
    else insert(root, k);
}

private static void insert(BinaryTreenode T, Comparable k) throws DuplicateException {
    // precondition: T != null
    if (T.getKey().equals(k)) throw new DuplicateException();
    if (k.compareTo(T.getKey()) < 0) {
        // add k as left child of T if it doesn't already have one
        // else insert into T's left subtree
        if (T.getLeft() == null) T.setLeft( new BinaryTreenode(k, null, null) );
        else insert(T.getLeft(), k);
    }
    else {
        // here when k > T's key
        // insert k as right child of T if it doesn't already have one
        // else insert into T's right subtree
        if (T.getRight() == null) T.setRight( new BinaryTreenode(k, null, null) );
        else insert(T.getRight(), k);
    }
}
Here are pictures illustrating what happens when we insert the value 15
into the example tree used above.