Implementing BSTs
To implement a binary search tree, we need two classes: one for the individual
tree nodes, and one for the BST itself:
class treeNode {
    Object data;
    treeNode left, right;
}
class BST {
    // fields
    treeNode root; // ptr to the root of the BST
    // methods
    public void insert(Object ob) {...} // add ob to this BST
    public void delete(Object ob) {...} // remove ob from this BST if it is there
    public boolean lookup(Object ob) {...} // return true iff ob is in this BST
    public void print(PrintWriter p) {...} // print values in order to p
}
Let's think about the lookup method first.
1. Lookup
bool Lookup (Tree T, int k)
{
    // look for k in T (Tree is a pointer to a treeNode)
    // base cases
    // (1) T is empty - return false
    // (2) k is in the node pointed to by T - return true
    if (NULL == T) return (false);
    if (k == T -> data) return (true);
    // recursive cases: look in left or right subtree depending on relationship
    // of k to value in node pointed to by T
    if (k < T -> data) return (Lookup (T -> left, k));
    else return (Lookup (T -> right, k));
}
time for Lookup
. always follows a path from root down; worst-case, goes all the way to a leaf
. time depends on "shape" of tree:
worst case: all nodes have one child
(tree really just a linked list)
time is O(n), n = # nodes in tree
best case: tree is as balanced as possible
(leaf depths differ by at most 1,
only parents of leaves have just 1 child)
time = O(log (n))
average case: considering all possible lookups in all
possible trees w/ n nodes: O(log n)
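To see how much the "shape" matters, here is a minimal sketch that builds two trees from the same 7 keys and compares heights. It assumes int data, a typedef of Tree as treeNode *, and plain recursive insertion; the helpers Height and Build are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node type mirroring the notes' treeNode, with int data.
struct treeNode {
    int data;
    treeNode *left;
    treeNode *right;
};
typedef treeNode *Tree;

// Plain (unbalanced) BST insertion; duplicates are ignored.
void Insert(Tree &T, int k) {
    if (T == nullptr) T = new treeNode{k, nullptr, nullptr};
    else if (k < T->data) Insert(T->left, k);
    else if (k > T->data) Insert(T->right, k);
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int Height(Tree T) {
    if (T == nullptr) return 0;
    return 1 + std::max(Height(T->left), Height(T->right));
}

// Build a tree by inserting the keys in the order given.
// (Nodes are never freed; acceptable for a short demo only.)
Tree Build(const std::vector<int> &keys) {
    Tree T = nullptr;
    for (int k : keys) Insert(T, k);
    return T;
}
```

Inserting 1..7 in sorted order gives the "linked list" shape (height 7), while the order 4, 2, 6, 1, 3, 5, 7 gives a perfectly balanced tree (height 3), so a lookup visits at most 7 vs. at most 3 nodes.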
2. Insert
void Insert (Tree & T, int k)
{
    // insert k into T (as a new leaf) maintaining BST properties
    // Note: T itself may change, so is passed by reference
    if (NULL == T) {
        // here's where T itself gets changed
        T = new treeNode;
        T -> data = k;
        T -> left = T -> right = NULL;
    }
    else if (k < T -> data) Insert (T -> left, k);
    else if (k > T -> data) Insert (T -> right, k);
}
Q: what to do if k is already in the tree?
A: nothing, or error, or insert it a second time
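One way to leave the choice between "nothing" and "error" to the caller is to have the insert report whether it changed the tree. The sketch below is an assumption, not the notes' Insert: the name InsertUnique and the bool return value are made up for illustration, and data is taken to be an int.

```cpp
#include <cassert>

// Hypothetical node type mirroring the notes' treeNode, with int data.
struct treeNode {
    int data;
    treeNode *left;
    treeNode *right;
};
typedef treeNode *Tree;

// Returns true if k was added as a new leaf, false if k was already present.
// The caller decides whether "already present" is an error or a no-op.
bool InsertUnique(Tree &T, int k) {
    if (T == nullptr) {
        T = new treeNode{k, nullptr, nullptr};
        return true;
    }
    if (k < T->data) return InsertUnique(T->left, k);
    if (k > T->data) return InsertUnique(T->right, k);
    return false;  // k == T->data: leave the tree unchanged
}
```

Allowing duplicates instead would mean picking one side for equal keys (e.g. always go right), and then Lookup and Remove must follow the same rule.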
time for Insert
. like Lookup, in worst case, must follow path from root to leaf
so: for tree w/ n nodes
. worst-case time is "linear" tree: O(n)
. in a balanced tree, worst-case time is O(log n)
. average time is O(log n)
3. Remove
(the following code just LOCATES the node to be removed; more code coming up)
void Remove (Tree & T, int k)
{
    // find the node to be removed - the (first) one that contains k
    // (error if no such node)
    // remove it from the tree; return storage
    assert (T != NULL);
    if (T -> data == k) {
        // this is the node to be removed
        .
        .
        .
    }
    else if (k < T -> data) Remove (T -> left, k);
    else Remove (T -> right, k);
}
Note: could also decide that Remove of non-existent value is just a no-op:
. replace the assert with: if (T == NULL) return;
. add condition (k > T -> data) to second "else"
Remove continued: What to do once T -> data == k?
case 1: T is a leaf
. free the storage
. set T to NULL
if ((T -> left == NULL) && (T -> right == NULL)) {
delete T;
T = NULL;
}
case 2: T has just one child
"replace" T w/ its child
. don't lose the child (use a tmp ptr)
. free the storage of the removed node
. set T to point to the child
else if (T -> left == NULL) {
    treeNode * tmp = T -> right;
    delete T;
    T = tmp;
}
else if (T -> right == NULL) {
    ... similar code...
}
case 3: T has two kids
. we can't just remove the node leaving a "hole" in the tree
. we can't replace it with a child, because what would we do with the
other child?
* solution: replace the value at the node w/ the value from some other node
lower down in the tree, then (recursively) remove that other node
must choose that "other" value so that we retain BST properties;
i.e., it still must be true that all values in the left subtree are
less than the "other" value, and all values in the right subtree
are greater than the "other" value
Q: what value can we use so that these properties are maintained??
A: either the largest value from the left subtree, or the smallest
value from the right subtree
. we'll arbitrarily choose the largest value from the left subtree, so:
(1) find the largest value from T's left subtree
(2) replace T -> data w/that value
(3) remove that value from T's left subtree
else {
    int tmp = Max (T -> left);
    T -> data = tmp;
    Remove (T -> left, tmp);
}
} // end function Remove
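Putting the pieces together, a compilable sketch of the whole Remove might look like the following. Assumptions: int data, Tree is a typedef for treeNode *, the "no-op if k is absent" variant is used instead of the assert, and Max is written as a loop that follows right pointers (the notes call Max but do not show it). InOrder is a hypothetical helper used only to check results.

```cpp
#include <cassert>
#include <vector>

struct treeNode {
    int data;
    treeNode *left;
    treeNode *right;
};
typedef treeNode *Tree;

void Insert(Tree &T, int k) {
    if (T == nullptr) T = new treeNode{k, nullptr, nullptr};
    else if (k < T->data) Insert(T->left, k);
    else if (k > T->data) Insert(T->right, k);
}

// Largest value in a non-empty tree: keep following right pointers.
int Max(Tree T) {
    assert(T != nullptr);
    while (T->right != nullptr) T = T->right;
    return T->data;
}

void Remove(Tree &T, int k) {
    if (T == nullptr) return;                      // k not present: no-op
    if (k < T->data) Remove(T->left, k);
    else if (k > T->data) Remove(T->right, k);
    else if (T->left == nullptr) {                 // case 1 (leaf) or case 2
        treeNode *tmp = T->right;                  // NULL when T is a leaf
        delete T;
        T = tmp;
    } else if (T->right == nullptr) {              // case 2: only a left child
        treeNode *tmp = T->left;
        delete T;
        T = tmp;
    } else {                                       // case 3: two kids
        int tmp = Max(T->left);
        T->data = tmp;
        Remove(T->left, tmp);
    }
}

// Collect the values in sorted order (for checking only).
void InOrder(Tree T, std::vector<int> &out) {
    if (T == nullptr) return;
    InOrder(T->left, out);
    out.push_back(T->data);
    InOrder(T->right, out);
}
```

Note that the "left is NULL" branch also handles a leaf, since a leaf's right pointer is NULL too; the separate case 1 in the notes is kept there for clarity.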
Summary of Remove operation:
step 1: find node to be removed
step 2: case 1 node is leaf - remove it
case 2 node has one child - replace node w/ child
case 3 node has two children
. replace value in node w/ max of left subtree
. recursively remove that value from left subtree
time for Remove
. cases 1 and 2:
find node to be removed (follow path down from root); do O(1) work at
that node
time = length of path (same as Lookup, Insert)
. case 3
(a) find node to be removed (follow path down from root)
(b) get max value in left subtree (finish following path down)
(c) recursive call on Remove starting w/ root of left subtree
note: recursive call must be case 1 or case 2
(you should be able to say why!)
so its time is proportional to height of left subtree
So all of case 3 is, in the worst case, proportional to height of tree
(same as Insert and Lookup).
4. PrintInOrder
recall: if node n holds value k:
(1) all values in n's left subtree are < k
i.e., should be printed first (before printing k)
(2) all values in n's right subtree are > k
i.e., should be printed after printing k
So, to print all values in tree T in order:
(1) (recursively) print all values in left subtree
(2) print value @ root of T
(3) (recursively) print all values in right subtree
This is called an IN ORDER traversal of T
void PrintInOrder (Tree T)
{
    if (T != NULL) {
        PrintInOrder (T -> left);   // time = size of left tree
        cout << T -> data << " ";   // O(1)
        PrintInOrder (T -> right);  // time = size of right tree
    }
}
total time = O(n), n = # nodes in tree, regardless of tree shape
Other traversal orders:
PreOrder
PostOrder
code similar to PrintInOrder:
Preorder: print the root
print the left subtree in preorder
print the right subtree in preorder
Postorder: print the left subtree in postorder
print the right subtree in postorder
print the root
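A minimal sketch of the other two traversals, written to append values to a vector rather than print them so the order is easy to check. The int data field and the Tree typedef are assumptions carried over from the C++ code earlier in these notes.

```cpp
#include <cassert>
#include <vector>

struct treeNode {
    int data;
    treeNode *left;
    treeNode *right;
};
typedef treeNode *Tree;

// Root first, then left subtree, then right subtree.
void PreOrder(Tree T, std::vector<int> &out) {
    if (T == nullptr) return;
    out.push_back(T->data);
    PreOrder(T->left, out);
    PreOrder(T->right, out);
}

// Left subtree, then right subtree, then the root last.
void PostOrder(Tree T, std::vector<int> &out) {
    if (T == nullptr) return;
    PostOrder(T->left, out);
    PostOrder(T->right, out);
    out.push_back(T->data);
}
```

For the 3-node tree with root 2 and kids 1 and 3, preorder gives 2 1 3 and postorder gives 1 3 2. Both visit every node once, so both are O(n) like the in-order traversal.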
-------------------- END BSTs --------------------
------------------------
| |
| NEW TOPIC: 2-3 TREES |
| |
------------------------
Problem: worst-case time for Lookup, Insert, Remove in BST: O(n)
(when tree is unbalanced)
Solution: BALANCED TREES height ALWAYS O(log n) n = # of nodes in tree
We will look at 1 kind of Balanced Tree: 2-3 Tree
Others are in book (not on exam).
2-3 Tree: . Every non-leaf has either 2 or 3 children
. All leaves are at the same depth
. Information (keys) in a 2-3 tree is stored ONLY at leaves
(internal nodes are for organization only)
. Info at leaves is ordered left to right
. Each internal node has child ptrs. and
(1) value of max key in LEFT subtree (leftMax)
(2) " " " " " MIDDLE subtree (middleMax)
Note: if only 2 kids, they are Left, Middle (not left, right)
Example
-------
                     ------------
                     |  4 | 12  |
                     ------------
                    /     |      \
                   /      |       \
      ------------   ------------   ------------
      |  2 |  4  |   |  7 | 10  |   | 15 | 20  |
      ------------   ------------   ------------
        /    |        /   |   \      /   |   \
       2     4       7   10   12   15   20   30
Operations on a 2-3 Tree
------------------------
1. Lookup: look up value k in tree T
Base cases:
(1) T is empty (NULL): return false
(2) T is just a leaf node: return true iff value @ node == k
Recursive cases:
. k <= leftMax: look up k in left subtree
. leftMax < k <= middleMax: look up k in middle subtree
. middleMax < k: look up k in right subtree (not found if there are only 2 kids)
time for Lookup:
. # calls = height of tree
. height of tree is O(log n) for n = # NODES in tree
. actual values only at leaves
but # leaves > n/2 (i.e., more than 1/2 the nodes in the tree are leaves)
so time is O(log m)
for m = # key VALUES in tree
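The lookup cases can be sketched directly in C++. This assumes the "struct w/ all fields" node layout (isLeaf, key, leftMax, middleMax, three child pointers) with int keys and a NULL right pointer when a node has only 2 kids; Leaf, MaxKey, and Internal are hypothetical helpers for building a small test tree by hand.

```cpp
#include <cassert>

struct TreeNode {
    bool isLeaf;
    int key;                          // meaningful only at leaves
    int leftMax, middleMax;           // meaningful only at internal nodes
    TreeNode *left, *middle, *right;  // right is NULL when only 2 kids
};

TreeNode *Leaf(int k) {
    return new TreeNode{true, k, 0, 0, nullptr, nullptr, nullptr};
}

// Largest key below T: follow the rightmost existing child down to a leaf.
int MaxKey(const TreeNode *T) {
    if (T->isLeaf) return T->key;
    return MaxKey(T->right != nullptr ? T->right : T->middle);
}

// Build an internal node and fill in leftMax / middleMax from its kids.
TreeNode *Internal(TreeNode *l, TreeNode *m, TreeNode *r = nullptr) {
    return new TreeNode{false, 0, MaxKey(l), MaxKey(m), l, m, r};
}

bool Lookup(const TreeNode *T, int k) {
    if (T == nullptr) return false;          // empty tree or missing right kid
    if (T->isLeaf) return T->key == k;
    if (k <= T->leftMax) return Lookup(T->left, k);
    if (k <= T->middleMax) return Lookup(T->middle, k);
    return Lookup(T->right, k);
}
```

Each call descends one level, so the number of calls is bounded by the height of the tree.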
2. Insert: insert value v into tree T, maintaining 2-3 tree properties
Step 1: Find the node n that will be the parent of the new node
i.e. do not search all the way down to a leaf; stop @ a parent of
(2 or 3) leaves
note: This requires special-case code for empty trees and for trees
w/ a single node
so form of Insert will be:
if tree is empty ...
else if tree is just 1 node ...
else call Insert1 (T, v)
where Insert1 is the recursive fn that handles all but the 2 special
cases
To find n, parent of new node:
. base case: T's kids are all leaves - found! (n is T)
. recursive cases:
v < leftMax: insert v into left child
v < middleMax or only 2 kids: insert v into middle child
v > middleMax and 3 kids: insert v into right child
Once n is found:
Case 1: n has only 2 children
Insert v as appropriate child of n:
(1) v < LeftMax(n)
make v n's leftchild (move others over)
fix values LeftMax(n) and MiddleMax(n)
no possibility of change to an ancestor's LeftMax or MiddleMax
(because new value not max child)
(2) v between LeftMax(n) and MiddleMax(n)
make v n's middle child
fix MiddleMax(n)
(3) v > MiddleMax(n)
make v n's right child
fix LeftMax / MiddleMax fields of n's ancestors as needed
Case 2: n already has 3 kids
(1) make v the appropriate new child of n anyway; now n has 4 kids
(2) create new internal node m - give m n's two rightmost kids
(fix n's, m's leftMax, middleMax)
(3) add m as appropriate new child of n's parent
if n's parent had only 2 kids - quit
else keep creating new nodes recursively up the tree
if the root is given 4 kids
create new node m as above
create new root w/ kids n and m
(4) fix leftMax and middleMax of ancestors as needed
time for Insert:
step 1: (find node n) involves following a path from root to parent of
leaves: O(height of tree) = O(log n)
step 2: worst case involves adding new nodes all the way back up from leaf
to root, also O(log n)
So total time is O(log n).
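A compact sketch of this insert is below. To keep it short it deliberately uses a different node layout from the one in these notes: each node keeps a vector of 2 or 3 kids plus a single max field (the key at a leaf, the largest key below at an internal node), so leftMax and middleMax correspond to kids[0]->max and kids[1]->max. That layout, the choice to ignore duplicates, and the helper names Leaf and Leaves are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    int max;                   // leaf: the key; internal: largest key below
    std::vector<Node *> kids;  // empty at a leaf, else 2 or 3 kids in order
    bool isLeaf() const { return kids.empty(); }
};

Node *Leaf(int k) { return new Node{k, {}}; }

// Insert v below internal node n. If n ends up with 4 kids it is split and
// the new right-hand sibling m is returned; otherwise returns NULL.
Node *Insert1(Node *n, int v) {
    std::vector<Node *> &kids = n->kids;
    std::size_t i = 0;
    if (kids[0]->isLeaf()) {              // base case: n is a parent of leaves
        while (i < kids.size() && kids[i]->max < v) ++i;
        if (i < kids.size() && kids[i]->max == v) return nullptr;  // duplicate
        kids.insert(kids.begin() + i, Leaf(v));
    } else {                              // recursive case: pick a child
        while (i + 1 < kids.size() && kids[i]->max < v) ++i;
        Node *split = Insert1(kids[i], v);
        if (split != nullptr) kids.insert(kids.begin() + i + 1, split);
    }
    Node *m = nullptr;
    if (kids.size() == 4) {               // case 2: give m the 2 rightmost kids
        m = new Node{kids[3]->max, {kids[2], kids[3]}};
        kids.resize(2);
    }
    n->max = kids.back()->max;            // fix the max on the way back up
    return m;
}

void Insert(Node *&T, int v) {
    if (T == nullptr) { T = Leaf(v); return; }       // empty tree
    if (T->isLeaf()) {                                // tree is just 1 node
        if (T->max == v) return;
        Node *other = Leaf(v);
        Node *lo = (v < T->max) ? other : T;
        Node *hi = (v < T->max) ? T : other;
        T = new Node{hi->max, {lo, hi}};
        return;
    }
    Node *m = Insert1(T, v);
    if (m != nullptr) T = new Node{m->max, {T, m}};  // root split: new root
}

// Collect leaf keys left to right, with the depth of each leaf (for checking).
void Leaves(const Node *T, int depth, std::vector<int> &keys,
            std::vector<int> &depths) {
    if (T == nullptr) return;
    if (T->isLeaf()) { keys.push_back(T->max); depths.push_back(depth); return; }
    for (const Node *kid : T->kids) Leaves(kid, depth + 1, keys, depths);
}
```

The splits happen as the recursive calls return, which is the "keep creating new nodes recursively up the tree" step; the tree only gets taller when the root itself splits, which is why all leaves stay at the same depth.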
3. Remove: remove value k from tree T
step 1: Find n, parent of node to be removed
(special case first for T just one node containing k - delete it,
make T NULL)
step 2:
case 1: n has 3 kids
remove kid w/ value k
fix leftMax, middleMax at n and n's ancestors
case 2: n has only 2 kids
2a: n is the root of the tree
remove node w/ k and the root, leaving the other kid as the entire tree
2b: n has a left or right sibling w/ 3 kids
. remove node w/ k
. "steal" one of sibling's kids
. fix leftMax, middleMax of n, sibling, ancestors
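2c: sibling(s) have only 2 kids
. remove node w/ k
. make remaining kid a child of n's sibling
. fix leftMax, middleMax of sibling, ancestors
. n now has no kids, so remove n from its parent (recursively: the
parent is handled as in step 2, w/ n as the kid being removed)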
time for Remove: (similar to Insert)
worst case involves 1 traversal down to find n + another "traversal" up
removing nodes along the way (traversal up is really actions that happen
after the recursive call has finished)
So total time is 2 * height = O(log n)
DISCUSSION: How to define a 2-3 tree node?
Leaf and non-leaf nodes store different things:
leaf: key value
non-leaf: leftMax, middleMax, 3 child ptrs
Also, we need to be able to tell when a node is a leaf.
. easiest: use struct w/ all fields:
struct TreeNode {
    bool isLeaf;
    int key;
    int leftMax, middleMax;
    TreeNode *left, *middle, *right;
};
. could save some space by
using one field for both key and leftMax
using left child == NULL to test for "isLeaf"
in this case, probably want to define functions as follows:
(good idea anyway so that actual representation can change!)
bool IsLeaf (TreeNode *T)
{
    return (T -> left == NULL);
}
int Key (TreeNode *T)
{
    assert (IsLeaf (T));
    return (T -> leftMax);   // key shares the leftMax field
}
int LeftMax (TreeNode *T)
{
    assert (! IsLeaf (T));
    return (T -> leftMax);
}
etc.
2-3 TREE SUMMARY
================
o info is stored only at leaves, ordered left-to-right
o non-leaf nodes have 2 or 3 kids (not 1)
o non-leaf nodes also have leftMax, middleMax values (as well as
pointers to children)
o all leaves are at same depth
o height of tree is O(log n) n = # nodes in tree
o at least half the nodes are leaves, so height of tree is
also O(log n) for n = # values stored in tree
SUMMARY: TREE DICTIONARIES
===========================
                          BST              2-3 Tree
                          ---              --------
where are values          every node       leaves only
stored

extra info @ nodes        2 child ptrs.    LeftMax, MiddleMax,
                                           3 child ptrs.

worst-case time for       O(n)             O(log n)
Lookup, Insert, Remove
(n = # values stored
in tree)
Representing Binary Trees Using Arrays
======================================
Method 1:
use 3 arrays to hold: values, left child "ptrs", right child "ptrs"
(a pointer is really the INDEX in which information about the
child is stored in the array)
Example
-------
      H                value  left  right
     / \            -------------------
    B   K      [0]  |  H  |  1  |  2  |
     \              -------------------
      D        [1]  |  B  | -1  |  3  |     -1 means no child
     / \            -------------------
    C   F      [2]  |  K  | -1  | -1  |
                    -------------------
               [3]  |  D  |  5  |  4  |
                    -------------------
               [4]  |  F  | -1  | -1  |
                    -------------------
               [5]  |  C  | -1  | -1  |
                    -------------------
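As a sketch of how "pointers" become indexes, here is a BST lookup over the three arrays for the example tree. The array names, the NONE constant, and the use of char values are assumptions for illustration; K is a leaf in the drawing, so both of its entries are -1.

```cpp
#include <cassert>

const int NONE = -1;  // "no child"

// Index:                 [0]  [1]  [2]  [3]  [4]  [5]
const char value[]     = {'H', 'B', 'K', 'D', 'F', 'C'};
const int leftChild[]  = {1,   NONE, NONE, 5,  NONE, NONE};
const int rightChild[] = {2,   3,    NONE, 4,  NONE, NONE};

// Same logic as the pointer-based Lookup, but "following a pointer"
// means reading the child's index out of leftChild / rightChild.
bool Lookup(int node, char k) {
    while (node != NONE) {
        if (k == value[node]) return true;
        node = (k < value[node]) ? leftChild[node] : rightChild[node];
    }
    return false;
}
```

Calling Lookup(0, 'F') follows the indexes 0 (H), 1 (B), 3 (D), 4 (F).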
. if nodes can be REMOVED, must maintain free list (linking via "right child"
array)
Example
-------
before removing anything; firstFree is the index of the first free space
in the array:
      H                value  left  right
     / \            -------------------
    B   K      [0]  |  H  |  1  |  2  |
     \              -------------------
      D        [1]  |  B  | -1  |  3  |
     / \            -------------------
    C   F      [2]  |  K  | -1  | -1  |
                    -------------------
               [3]  |  D  |  5  |  4  |
                    -------------------
               [4]  |  F  | -1  | -1  |
                    -------------------
               [5]  |  C  | -1  | -1  |
                    -------------------
               [6]  |  ?  |  ?  |  7  |   <-- next free space is array[7]
                    -------------------
               [7]  |  ?  |  ?  |  8  |   <-- next free space is array[8]
                    -------------------
firstFree: 6   [8]  |  ?  |  ?  | -1  |   <-- no more free spaces
                    -------------------
after removing F:
      H                value  left  right
     / \            -------------------
    B   K      [0]  |  H  |  1  |  2  |
     \              -------------------
      D        [1]  |  B  | -1  |  3  |
     /              -------------------
    C          [2]  |  K  | -1  | -1  |
                    -------------------
               [3]  |  D  |  5  | -1  |
                    -------------------
               [4]  |  ?  |  ?  |  6  |   <-- next free space is array[6]
                    -------------------
               [5]  |  C  | -1  | -1  |
                    -------------------
               [6]  |  ?  |  ?  |  7  |   <-- next free space is array[7]
                    -------------------
               [7]  |  ?  |  ?  |  8  |   <-- next free space is array[8]
                    -------------------
firstFree: 4   [8]  |  ?  |  ?  | -1  |   <-- no more free spaces
                    -------------------
. Note: when a node is "removed", that space is added to the front of the free list
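The free-list bookkeeping can be sketched as follows. The struct ArrayTree and the functions InitFreeList, AllocSlot, and FreeSlot are hypothetical names; the capacity of 9 matches the example, and free slots are chained through the "right child" array as described above.

```cpp
#include <cassert>

const int CAP = 9;    // array capacity, as in the example
const int NONE = -1;  // "no child" / "no more free spaces"

struct ArrayTree {
    char value[CAP];
    int leftChild[CAP];
    int rightChild[CAP];  // doubles as the "next free" link for unused slots
    int firstFree;
};

// Chain slots firstUnused..CAP-1 into the free list.
void InitFreeList(ArrayTree &t, int firstUnused) {
    for (int i = firstUnused; i < CAP; ++i)
        t.rightChild[i] = (i + 1 < CAP) ? i + 1 : NONE;
    t.firstFree = (firstUnused < CAP) ? firstUnused : NONE;
}

// Take a slot from the front of the free list; NONE if the array is full.
int AllocSlot(ArrayTree &t) {
    int i = t.firstFree;
    if (i != NONE) t.firstFree = t.rightChild[i];
    return i;
}

// Put slot i back on the front of the free list.
void FreeSlot(ArrayTree &t, int i) {
    t.rightChild[i] = t.firstFree;
    t.firstFree = i;
}
```

With firstFree starting at 6, FreeSlot(t, 4) reproduces the "after removing F" picture: firstFree becomes 4 and slot 4 links to 6. The sketch only manages the free list; the remove operation itself must still set the parent's child index (D's right child) to -1.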
Method 2: single array of values if there is a special "empty" value,
else 2 arrays: values & booleans
. root's value is stored in A[1]
. if node's value is in A[n]
left child is in A[n*2]
right child is in A[n*2+1]
. if a node has NO left child, A[n*2] contains the special "empty" value
(similarly for no right child)
if there is no special "empty" value, then the 2nd array contains "false"
for every "empty" position in the 1st array
Example (use "" as the special "empty" value)
-------
      H              value
     / \            -------
    B   K     [1]   |  H  |
     \              -------
      D       [2]   |  B  |
     / \            -------
    C   F     [3]   |  K  |
                    -------
              [4]   |     |
                    -------
              [5]   |  D  |
                    -------
              [6]   |     |
                    -------
              [7]   |     |
                    -------
              [8]   |     |
                    -------
              [9]   |     |
                    -------
              [10]  |  C  |
                    -------
              [11]  |  F  |
                    -------
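With Method 2 there are no child "ptrs" at all; the children of position n are found by arithmetic. A minimal sketch of an in-order traversal over such an array, assuming a std::vector of strings with "" as the special "empty" value and position 0 left unused (the function name InOrder is made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// In-order traversal of the subtree rooted at position n.
// Left child of n is at 2*n, right child at 2*n + 1; a position past the
// end of the array, or one holding "", means "no node here".
void InOrder(const std::vector<std::string> &A, std::size_t n,
             std::vector<std::string> &out) {
    if (n >= A.size() || A[n].empty()) return;
    InOrder(A, 2 * n, out);
    out.push_back(A[n]);
    InOrder(A, 2 * n + 1, out);
}
```

For the example array, starting at position 1 yields B C D F H K. The trade-off is space: the example needs 11 positions for 6 values, and a tree in which every node has only a right child needs position 2^d - 1 for its d-th node.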