Recall that, for binary-search trees, although the average-case times
for the lookup, insert, and delete methods are all O(log N), where N is the
number of nodes in the tree, the worst-case time is O(N).
We can guarantee O(log N) time for all three methods by using
a balanced tree (a tree whose height is always O(log N)) instead
of a plain binary-search tree.
A number of different balanced trees have been defined, including
AVL trees, red-black trees, and B-trees.
You might learn about the first two in an algorithms class, and the
third in a database class.
Here we will look at yet another kind of balanced tree, called
a 2-3 tree.
The important idea behind all of these trees is that the insert and
delete operations may restructure the tree to keep it balanced.
So lookup, insert, and delete will always be logarithmic in the number
of nodes, but insert and delete may be more complicated than for
binary-search trees.
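
To make the O(N) worst case concrete, here is a minimal sketch, assuming a plain unbalanced binary-search tree. The names BSTNode, bst_insert, and height are hypothetical and do not come from these notes. Inserting keys in sorted order produces a chain whose height equals the number of keys.

```python
class BSTNode:
    """Hypothetical node of a plain (unbalanced) binary-search tree."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def bst_insert(root, key):
    """Standard unbalanced BST insert; returns the (possibly new) root."""
    if root is None:
        return BSTNode(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = BSTNode(key)
                break
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = BSTNode(key)
                break
            cur = cur.right
        else:
            break  # assumption: duplicate keys are ignored
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best


chain = None
for key in range(1, 201):      # sorted input: every key goes to the right
    chain = bst_insert(chain, key)
print(height(chain))           # 200
```

A lookup for the largest key in this tree visits every node, which is the behavior a balanced tree is designed to rule out.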
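The important facts about a 2-3 tree are:
Every non-leaf node has either 2 or 3 children.
All leaves are at the same depth.
Key values are stored only at the leaves; non-leaf nodes are there only to organize the tree.
The keys at the leaves are in sorted order from left to right.
In addition to its child pointers, each non-leaf node stores two values: leftMax, the largest key in its left subtree, and middleMax, the largest key in its middle subtree.
A node with only 2 children has a left child and a middle child (not a left child and a right child).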
As with binary-search trees, the same set of values can usually be represented by more
than one 2-3 tree.
Here are three different 2-3 trees that all store the values
2, 4, 7, 10, 12, 15, 20, 30:
Draw two different 2-3 trees, both containing the letters
A through G as key values.
The auxiliary insert method performs the following steps to find node n, the
parent of the new node:
Case 1: n has only 2 children
Case 2: n already has 3 children
What is the time for insert?
Finding node n (the parent of the new node) involves following a path from
the root to a parent of leaves.
The length of that path is O(height of tree) = O(log N), where N is the number of nodes
in the tree (recall that it is also O(log M), where M is the number of key
values stored in the tree).
Once node n is found, finishing the insert, in the worst case,
involves adding new nodes and/or fixing fields all the way back up
from the leaf to the root, which is also O(log N).
So the total time is O(log N), which is also O(log M).
Question 1:
Draw the 2-3 tree that results from inserting the value "C" into the following
2-3 tree:
Question 2:
Now draw the tree that results from adding the value "F" to the tree you drew
for question 1.
Once node n (the parent of the node to be deleted) is found, there are
two cases, depending on how many children n has:
Case 1: n has 3 children
Case 2: n has only 2 children
The time for delete is similar to the time for insert:
the worst case involves one traversal down the tree to find n, and another
"traversal" up the tree, fixing leftMax and middleMax fields along the way
(the traversal up really consists of actions that happen after the recursive call
to delete has finished).
So the total time is proportional to 2 * height-of-tree, which is O(log N).
Question 1:
Draw the 2-3 tree that results from deleting the value "X" from the following
2-3 tree:
Question 2:
Now draw the tree that results from deleting the value "H" from the tree
you drew for question 1.
Introduction
                  ------------
                  |  4 | 12  |
                  ------------
                 /     |      \
                /      |       \
   ------------   ------------   -------------
   |  2 |  4  |   |  7 | 10  |   |  15 | 20  |
   ------------   ------------   -------------
     /    |         /   |   \       /   |    \
    2     4        7    10   12    15   20    30

                  ------------
                  |  7 | 15  |
                  ------------
                 /     |      \
                /      |       \
   ------------  --------------  ---------------
   |  2 |  4  |  |  10 | 12   |  |  20  |  30  |
   ------------  --------------  ---------------
    /   |   \      /   |   \         |     |
   2    4    7    10   12   15      20    30

                          ---------------
                          |  10  |  30  |
                          ---------------
                            /        |
           -----------------          \
          /                            \
     ------------                  ---------------
     |  4 | 10  |                  |  15  |  30  |
     ------------                  ---------------
       /     |                        /       |
      /      |                       /        |
------------ -------------  --------------- ---------------
|  2 |  4  | |  7 | 10   |  |  12  |  15  | |  20  |  30  |
------------ -------------  --------------- ---------------
   /   |        /   |           /    |          /    |
  2    4       7    10         12    15        20    30
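
As an illustration of how such a tree might be represented, here is a minimal sketch that builds the first of the trees storing 2, 4, 7, 10, 12, 15, 20, 30 and checks its shape. The class and function names (Node, leaf, internal, check, leaves) are hypothetical, and the properties checked (2 or 3 children per non-leaf node, all leaves at the same depth, leftMax and middleMax equal to the largest keys of the left and middle subtrees) are the usual 2-3 tree properties, stated here as assumptions.

```python
class Node:
    """Hypothetical 2-3 tree node. A leaf holds `key`; a non-leaf node holds
    2 or 3 children plus the values left_max and middle_max."""
    def __init__(self, key=None, children=(), left_max=None, middle_max=None):
        self.key = key
        self.children = list(children)
        self.left_max = left_max
        self.middle_max = middle_max


def subtree_max(n):
    """Largest key below n: follow the rightmost child down to a leaf."""
    while n.children:
        n = n.children[-1]
    return n.key


def leaf(k):
    return Node(key=k)


def internal(*children):
    """Build a non-leaf node, filling in left_max and middle_max."""
    return Node(children=children,
                left_max=subtree_max(children[0]),
                middle_max=subtree_max(children[1]))


def check(n):
    """Return the distance from n down to its leaves; raise AssertionError
    if n's subtree breaks one of the assumed 2-3 tree properties."""
    if not n.children:
        return 0
    assert len(n.children) in (2, 3)
    assert n.left_max == subtree_max(n.children[0])
    assert n.middle_max == subtree_max(n.children[1])
    depths = {check(c) for c in n.children}
    assert len(depths) == 1          # all leaves at the same depth
    return depths.pop() + 1


def leaves(n):
    """Keys at the leaves, left to right."""
    if not n.children:
        return [n.key]
    return [k for c in n.children for k in leaves(c)]


tree1 = internal(
    internal(leaf(2), leaf(4)),
    internal(leaf(7), leaf(10), leaf(12)),
    internal(leaf(15), leaf(20), leaf(30)),
)
print(tree1.left_max, tree1.middle_max)  # 4 12
print(check(tree1))                      # 2
print(leaves(tree1))                     # [2, 4, 7, 10, 12, 15, 20, 30]
```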
Operations on a 2-3 Tree
The lookup operation
Recall that the lookup operation needs to determine whether key value k is
in a 2-3 tree T.
The lookup operation for a 2-3 tree is very similar to the lookup operation
for a binary-search tree.
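There are 2 base cases:
T is empty: return false.
T is a leaf node: return true if and only if the key value in T is k.
And there are 3 recursive cases:
k <= T.leftMax: look up k in T's left subtree.
T.leftMax < k <= T.middleMax: look up k in T's middle subtree.
T.middleMax < k: look up k in T's right subtree.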
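
The lookup cases can be sketched in code as follows. This is only an illustration, assuming that the base cases are an empty tree and a leaf, and that the recursive cases compare k with leftMax and middleMax. The Node class and its field names are hypothetical.

```python
class Node:
    """Hypothetical 2-3 tree node: a leaf has no children and holds `key`;
    a non-leaf node has 2 or 3 children plus left_max and middle_max."""
    def __init__(self, key=None, children=()):
        self.key = key
        self.children = list(children)
        if self.children:
            self.left_max = rightmost_key(self.children[0])
            self.middle_max = rightmost_key(self.children[1])


def rightmost_key(n):
    """Largest key below n (used only while building the example tree)."""
    while n.children:
        n = n.children[-1]
    return n.key


def lookup(t, k):
    """Return True if key k is stored in the 2-3 tree rooted at t."""
    if t is None:                       # base case: empty tree
        return False
    if not t.children:                  # base case: leaf
        return t.key == k
    if k <= t.left_max:                 # recursive case: left subtree
        return lookup(t.children[0], k)
    if k <= t.middle_max:               # recursive case: middle subtree
        return lookup(t.children[1], k)
    # recursive case: right subtree (empty if t has only 2 children)
    right = t.children[2] if len(t.children) == 3 else None
    return lookup(right, k)


L = Node  # shorthand for building leaves
tree = Node(children=[
    Node(children=[L(2), L(4)]),
    Node(children=[L(7), L(10), L(12)]),
    Node(children=[L(15), L(20), L(30)]),
])
print(lookup(tree, 12), lookup(tree, 13))  # True False
```

Each call does a constant amount of work and then moves down one level, so the number of calls is at most the height of the tree plus one.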
It should be clear that the time for lookup is proportional to the height
of the tree.
The height of the tree is O(log N) for N = the number of nodes in the
tree.
You may think this is a problem, since the actual values are only at the
leaves.
However, the number of leaves is always greater than N/2
(i.e., more than half the nodes in the tree are leaves).
So the time for lookup is also O(log M), where M is the number of key values
stored in the tree.
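
The claim about leaves can be checked with a short count. In this sketch, I (the number of non-leaf nodes) and h (the height, counted in edges from the root to a leaf) are symbols introduced here; the argument assumes that every non-leaf node has 2 or 3 children and that all leaves are at the same depth.

```latex
\begin{align*}
N &= M + I, \\
N - 1 &= \text{(number of edges)} \;\ge\; 2I
  \quad\Longrightarrow\quad I \le M - 1, \\
N &\le 2M - 1
  \quad\Longrightarrow\quad M \ge \frac{N + 1}{2} > \frac{N}{2}, \\
2^{h} &\le M \le 3^{h}
  \quad\Longrightarrow\quad \log_3 M \le h \le \log_2 M .
\end{align*}
```

So N and M differ by less than a factor of 2, and O(log N) and O(log M) describe the same bound.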
The insert operation
The goal of the insert operation is to insert key k into tree T,
maintaining T's 2-3 tree properties.
Special cases are required for empty trees and for trees with just
a single (leaf) node.
So the form of insert will be:
   if T is empty replace it with a single node containing k
   else if T is just 1 node m:
      (a) create a new leaf node n containing k
      (b) create a new internal node with m and n as its children,
          and with the appropriate values for leftMax and middleMax
   else call auxiliary method insert(T, k)
The auxiliary insert method is the recursive method that handles all
but the 2 special cases.
As for binary-search trees, the first task of the auxiliary method
is to find the (non-leaf) node n that will be the parent of the newly
inserted node.
Once n is found, there are two cases, depending on whether n has room for
a new child:
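
Here is one way the whole insert operation could be sketched, under some stated assumptions: a node with room simply gains a child, while a node that would end up with 4 children is split into two nodes with 2 children each, and the new node is handed to the parent. Instead of separate leftMax and middleMax fields, this sketch caches one max value per node, so a parent's leftMax is kids[0].max and its middleMax is kids[1].max. All names are hypothetical, and duplicate keys are ignored.

```python
class Node:
    """Leaf if `kids` is empty. `max` caches the largest key in the subtree."""
    def __init__(self, key=None, kids=None):
        self.key = key
        self.kids = kids or []
        self.max = self.kids[-1].max if self.kids else key

    def is_leaf(self):
        return not self.kids


def _insert(n, k):
    """Insert k below non-leaf node n. Return a new right-hand sibling
    for n if n had to split, else None."""
    if n.kids[0].is_leaf():
        # n is the parent of the new leaf.
        if any(c.key == k for c in n.kids):
            return None                       # assumption: ignore duplicates
        n.kids.append(Node(k))
        n.kids.sort(key=lambda c: c.key)
    else:
        # Descend into the first subtree whose max is >= k, else the last one.
        i = next((j for j, c in enumerate(n.kids) if k <= c.max),
                 len(n.kids) - 1)
        extra = _insert(n.kids[i], k)
        if extra is not None:
            n.kids.insert(i + 1, extra)
    sibling = None
    if len(n.kids) == 4:                      # no room: split 2 and 2
        sibling = Node(kids=n.kids[2:])
        n.kids = n.kids[:2]
    n.max = n.kids[-1].max                    # fix the cached max on the way up
    return sibling


def insert(t, k):
    """Insert k into the tree rooted at t (None if empty); return the new root."""
    if t is None:
        return Node(k)
    if t.is_leaf():
        if t.key == k:
            return t
        return Node(kids=sorted([t, Node(k)], key=lambda c: c.key))
    sibling = _insert(t, k)
    return Node(kids=[t, sibling]) if sibling is not None else t


def leaves(n):
    """Keys at the leaves, left to right."""
    return [n.key] if n.is_leaf() else [k for c in n.kids for k in leaves(c)]


root = None
for key in (2, 4, 7, 10, 12, 15, 20, 30):
    root = insert(root, key)
print(leaves(root))                        # [2, 4, 7, 10, 12, 15, 20, 30]
print(root.kids[0].max, root.kids[1].max)  # 10 30
```

The split is detected after the recursive call returns, which corresponds to fixing the tree on the way back up from the leaf toward the root.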
                  ------------
                  |  B | H   |
                  ------------
                 /     |      \
                /      |       \
   ------------   ------------   ------------
   |  A | B   |   |  D | E   |   |  K | X   |
   ------------   ------------   ------------
     /    |         /   |   \       /    |
    A     B        D    E    H     K     X
The delete operation
Deleting key k is similar to inserting: there is a special case when
T is just a single (leaf) node containing k (T is made empty);
otherwise, the parent of the node
to be deleted is found, then the tree is fixed up if necessary so that it
is still a 2-3 tree.
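
The following is a minimal sketch of one common way to do the fixing up, not necessarily the exact procedure intended here. When a node is left with a single child, that child is pooled with the children of an adjacent sibling: 4 pooled children are regrouped into two nodes of 2 (the sibling gives up a child), and 3 pooled children are merged into one node, which may leave the parent short and repeat the process one level up. As a simplification, only one adjacent sibling is examined. The names are hypothetical, and each node caches a single max value, so a parent's leftMax is kids[0].max and its middleMax is kids[1].max.

```python
class Node:
    """Leaf if `kids` is empty. `max` caches the largest key in the subtree."""
    def __init__(self, key=None, kids=None):
        self.key = key
        self.kids = kids or []
        self.max = self.kids[-1].max if self.kids else key

    def is_leaf(self):
        return not self.kids


def _fix(parent, i):
    """Child i of parent has only 1 child: regroup it with an adjacent sibling."""
    j = i - 1 if i > 0 else i + 1
    lo, hi = min(i, j), max(i, j)
    pooled = parent.kids[lo].kids + parent.kids[hi].kids   # 3 or 4 nodes, in order
    if len(pooled) == 4:
        groups = [pooled[:2], pooled[2:]]    # sibling had 3 children: share them
    else:
        groups = [pooled]                    # sibling had 2 children: merge
    parent.kids[lo:hi + 1] = [Node(kids=g) for g in groups]


def _delete(n, k):
    """Delete k below non-leaf node n; return True if n is left with 1 child."""
    if n.kids[0].is_leaf():
        n.kids = [c for c in n.kids if c.key != k]
    else:
        # Descend into the first subtree whose max is >= k, else the last one.
        i = next((j for j, c in enumerate(n.kids) if k <= c.max),
                 len(n.kids) - 1)
        if _delete(n.kids[i], k):
            _fix(n, i)
    n.max = n.kids[-1].max                   # fix the cached max on the way up
    return len(n.kids) < 2


def delete(t, k):
    """Delete k from the tree rooted at t; return the new root (None if empty)."""
    if t is None:
        return None
    if t.is_leaf():
        return None if t.key == k else t     # special case: single leaf
    if _delete(t, k):
        return t.kids[0]                     # root has 1 child left: drop the root
    return t


def leaves(n):
    """Keys at the leaves, left to right."""
    if n is None:
        return []
    return [n.key] if n.is_leaf() else [k for c in n.kids for k in leaves(c)]


L = Node  # shorthand for building leaves
root = Node(kids=[
    Node(kids=[L(2), L(4)]),
    Node(kids=[L(7), L(10), L(12)]),
    Node(kids=[L(15), L(20), L(30)]),
])
root = delete(root, 30)    # parent still has 2 children: nothing to fix
root = delete(root, 2)     # parent left with 1 child: sibling shares a child
print(leaves(root))                        # [4, 7, 10, 12, 15, 20]
print(root.kids[0].max, root.kids[1].max)  # 7 12
```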
                  ------------
                  |  B | H   |
                  ------------
                 /     |      \
                /      |       \
   ------------   ------------   ------------
   |  A | B   |   |  D | E   |   |  K | X   |
   ------------   ------------   ------------
     /    |         /   |   \       /    |
    A     B        D    E    H     K     X
|                                                                                | BST          | 2-3 Tree                         |
| where are values stored                                                        | every node   | leaves only                      |
| extra info in non-leaf nodes                                                   | 2 child ptrs | leftMax, middleMax, 3 child ptrs |
| worst-case time for lookup, insert, and delete (N = # values stored in tree)   | O(N)         | O(log N)                         |
| average-case time for lookup, insert, and delete (N = # values stored in tree) | O(log N)     | O(log N)                         |