2-3 Trees


Introduction

Recall that, for binary-search trees, although the average-case times for the lookup, insert, and delete methods are all O(log N), where N is the number of nodes in the tree, the worst-case time is O(N). We can guarantee O(log N) time for all three methods by using a balanced tree -- a tree that always has height O(log N) -- instead of a binary-search tree.

A number of different balanced trees have been defined, including AVL trees, red-black trees, and B trees. You might learn about the first two in an algorithms class, and the third in a database class. Here we will look at yet another kind of balanced tree called a 2-3 Tree.

The important idea behind all of these trees is that the insert and delete operations may restructure the tree to keep it balanced. So lookup, insert, and delete will always be logarithmic in the number of nodes, but insert and delete may be more complicated than for binary-search trees.

The important facts about a 2-3 tree are (see also the code sketch after this list):

  - all of the (key) values are stored only at the leaves, in sorted order left to right;
  - every internal node has either 2 or 3 children (never just 1);
  - every internal node stores two values: leftMax (the largest key in its left subtree) and middleMax (the largest key in its middle subtree);
  - all leaves are at the same depth, so the tree is balanced and its height is O(log N), where N is the number of nodes in the tree.
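
One way the nodes of such a tree might be declared is sketched below in Java. The field names leftMax and middleMax come from these notes; the class name, the child-pointer names, and the use of int keys are just illustrative assumptions, not part of any particular implementation.

    // Sketch of a 2-3 tree node.  A leaf stores a key; an internal node
    // stores only the two "guide" values plus 2 or 3 child pointers.
    class TwoThreeNode {
        int key;                          // used only if this node is a leaf
        int leftMax;                      // largest key in the left subtree
        int middleMax;                    // largest key in the middle subtree
        TwoThreeNode left, middle, right; // right == null means only 2 children

        boolean isLeaf() { return left == null; }
    }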

As for binary-search trees, the same set of values can usually be represented by more than one 2-3 tree. Here are three different 2-3 trees that all store the values 2, 4, 7, 10, 12, 15, 20, and 30:


                      ------------
                      | 4  |  12 |
                      ------------
                     /     |       \
                    /      |        \
      ------------    ------------    -------------
      | 2  |     |    | 7  | 10  |    | 15  | 20  |
      ------------    ------------    -------------
      /    |          /    |     \    /      |     \
     2     4         7    10     12  15      20     30



                      ------------
                      | 7  |  15  |
                      ------------
                    /      |        \
                   /       |         \
      ------------   --------------   ---------------
      | 2  |  4  |   |  10  | 12  |   |  20  |      |
      ------------   --------------   ---------------
      /    |     \    /     |    \      |    |
     2     4      7  10     12   15    20    30


                          ---------------
                          |  10  |      |
                          ---------------
                       /          \
                      /            +---------+
                     /                        \
               ------------                     ---------------
               | 4  |     |                     |  15  |      |
               ------------                     ---------------
              /      |                          /      |
             /       |                         /       |
  ------------   -------------   ---------------  ---------------
  | 2  |     |   |  7  |     |   |  12  |      |  |  20  |      |
  ------------   -------------   ---------------  ---------------
  /    |         /    |           /     |          /     |
 2     4        7     10         12    15         20    30


Test Yourself #1

Draw two different 2-3 trees, both containing the letters A through G as key values.

solution


Operations on a 2-3 Tree

The lookup operation

Recall that the lookup operation needs to determine whether key value k is in a 2-3 tree T. The lookup operation for a 2-3 tree is very similar to the lookup operation for a binary-search tree. There are 2 base cases:

  1. T is empty: return false
  2. T is a leaf node: return true iff the key value in T is k

And there are 3 recursive cases, tried in order (a code sketch follows the list):

  1. k <= T.leftMax: look up k in T's left subtree
  2. otherwise, if T's right subtree is empty OR k <= T.middleMax: look up k in T's middle subtree
  3. otherwise: look up k in T's right subtree
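
In code, these cases might look like the sketch below (using the TwoThreeNode class from the introduction, and representing an empty tree as null):

    // Sketch of lookup: returns true iff key k occurs in the tree rooted at T.
    static boolean lookup(TwoThreeNode T, int k) {
        if (T == null) return false;              // base case 1: empty tree
        if (T.isLeaf()) return T.key == k;        // base case 2: leaf node
        if (k <= T.leftMax) return lookup(T.left, k);                  // recursive case 1
        if (T.right == null || k <= T.middleMax) return lookup(T.middle, k);  // case 2
        return lookup(T.right, k);                                     // case 3
    }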

It should be clear that the time for lookup is proportional to the height of the tree, because in the worst case we follow one path from the root to a leaf. The height of the tree is O(log N) for N = the number of nodes in the tree. You may think this is a problem, since the actual values are only at the leaves. However, the number of leaves is always greater than N/2 (i.e., more than half the nodes in the tree are leaves). So the time for lookup is also O(log M), where M is the number of key values stored in the tree.

The insert operation

The goal of the insert operation is to insert key k into tree T, maintaining T's 2-3 tree properties. We will assume that duplicate insertions (inserting a value that is already in the tree) are not allowed, but we won't include code to check for that case. Special cases are required for insertion into an empty tree and into a tree with just a single (leaf) node. So the form of insert will be:

           if T is empty, replace it with a single leaf node containing k
           else if T is just a leaf node m:
                (a) create a new leaf node n containing k
                (b) create a new root node that is an internal node
                    with m and n as its children (in sorted order), and with
                    the appropriate value for leftMax
           else call auxiliary method insert(T, k)
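
To make the special cases concrete, here is one way the top-level insert might look in Java, using the TwoThreeNode class sketched in the introduction. This is only a sketch: it assumes a wrapper class with a single field root of type TwoThreeNode, the name InsertResult is made up for the pair that the auxiliary method returns, and insertAux and splitRoot stand for the auxiliary method and the root-splitting step described later in this section.

    // A small pair type for what the auxiliary insert method returns
    // (null means no restructuring was needed).
    class InsertResult {
        TwoThreeNode newTree;   // root of the new subtree that was split off
        int max;                // largest key stored in that subtree
    }

    // Sketch of the top-level insert of key k (duplicates assumed absent).
    void insert(int k) {
        if (root == null) {                      // special case: empty tree
            root = makeLeaf(k);
        } else if (root.isLeaf()) {              // special case: a single leaf m
            TwoThreeNode m = root;
            TwoThreeNode n = makeLeaf(k);
            TwoThreeNode r = new TwoThreeNode();
            r.left   = (k < m.key) ? n : m;      // children in sorted order
            r.middle = (k < m.key) ? m : n;
            r.leftMax = r.left.key;              // largest key in the left subtree
            root = r;
        } else {
            InsertResult res = insertAux(root, k);   // general case
            if (res != null) splitRoot(res);         // root split, sketched later
        }
    }

    static TwoThreeNode makeLeaf(int k) {
        TwoThreeNode leaf = new TwoThreeNode();
        leaf.key = k;
        return leaf;
    }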

The auxiliary insert method is a recursive method that handles all but the 2 special cases. It may have to restructure the tree to keep it balanced. In that case, it returns the root of a new tree that must be inserted just to the left of the node T that was its first parameter (i.e., as a new child of T's parent), along with the largest value in that new tree. Otherwise, it returns null.

As for binary-search trees, the first task of the auxiliary method is to find the (internal) node that will be the parent of the newly inserted leaf node.

The auxiliary insert method performs the following steps to find node n, the parent of the new node (a code sketch of this descent follows):

  1. If T's children are leaves, then T itself is node n.
  2. Otherwise, compare k with T.leftMax and T.middleMax (just as lookup does) to choose T's left, middle, or right subtree, and continue the search in that subtree.
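
The descent might be coded as shown below. It is written as a standalone helper only for illustration; in the actual recursive insert method the same comparisons decide which child to recurse into.

    // Sketch: descend from internal node T to the internal node whose
    // children are leaves; that node will be the parent of the new leaf.
    static TwoThreeNode findParent(TwoThreeNode T, int k) {
        if (T.left.isLeaf()) return T;        // T's children are leaves: T is n
        if (k <= T.leftMax) return findParent(T.left, k);
        if (T.right == null || k <= T.middleMax) return findParent(T.middle, k);
        return findParent(T.right, k);
    }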

Once n is found, there are two cases, depending on whether n has room for a new child:

Case 1: n has only 2 children

In this case, a new leaf containing k is added as a child of n in the appropriate position (so that the leaves remain in sorted order), the leftMax and middleMax values of n are fixed as needed, and null is returned.

Case 2: n already has 3 children

In this case n cannot be given a fourth child, so it is split: of the four leaves (n's three children plus the new leaf containing k, taken in sorted order), the two smallest become the children of a new internal node m, and the two largest remain the children of n. The pair (m, the largest key in m's subtree) is returned so that m can be added just to the left of n as a child of n's parent.

When a call to insert finishes, the caller must check to see whether a non-null value was returned. If the pair (m, max) was returned, we must try to add the tree rooted at m as the appropriate child of the current node T (where appropriate means just to the left of the child that was passed to insert).

There are two cases, just as there were when we tried to add a new leaf as a child of n: if T only has two children, we can add m as the third child, fix the values of leftMax and middleMax, and return null. Otherwise, we must again create a new subtree to pass up to be inserted as a child of T's parent.
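
One possible way to code this step, continuing the earlier sketches, is a hypothetical helper addChild that tries to place the returned subtree m (whose largest key is max) just to the left of child c of node T, absorbing it if T has room and splitting T otherwise. This is only one of several reasonable designs.

    // Sketch: add subtree m (largest key = max) as a child of T, immediately
    // to the left of T's existing child c.  Returns null if T had room, or a
    // new (subtree, max) pair that the caller must pass further up the tree.
    static InsertResult addChild(TwoThreeNode T, TwoThreeNode c,
                                 TwoThreeNode m, int max) {
        if (T.right == null) {
            // T has only 2 children: shift them over and absorb m.
            if (c == T.left) {
                T.right = T.middle; T.middle = T.left; T.left = m;
                T.middleMax = T.leftMax; T.leftMax = max;
            } else {  // c == T.middle
                T.right = T.middle; T.middle = m;
                T.middleMax = max;
            }
            return null;
        }
        // T already has 3 children: of the four subtrees (in sorted order),
        // the two smallest go to a new node s, which is passed up; T keeps
        // the two largest.
        TwoThreeNode s = new TwoThreeNode();
        InsertResult up = new InsertResult();
        up.newTree = s;
        if (c == T.left) {                        // order: m, left, middle, right
            s.left = m;          s.leftMax = max;
            s.middle = T.left;   up.max = T.leftMax;
            T.left = T.middle;   T.leftMax = T.middleMax;
        } else if (c == T.middle) {               // order: left, m, middle, right
            s.left = T.left;     s.leftMax = T.leftMax;
            s.middle = m;        up.max = max;
            T.left = T.middle;   T.leftMax = T.middleMax;
        } else {                                  // order: left, middle, m, right
            s.left = T.left;     s.leftMax = T.leftMax;
            s.middle = T.middle; up.max = T.middleMax;
            T.left = m;          T.leftMax = max;
        }
        T.middle = T.right;
        T.right = null;
        return up;
    }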

If the original call insert(root, key) returns the pair (m, max) (because the tree restructuring has propagated all the way up, and the root already had 3 children), then create a new root node r with two children: m (on the left) and the original root, with r's leftMax set to max.
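
Continuing the sketch, this final step might look like the following (splitRoot being the hypothetical helper called from the top-level insert above):

    // Sketch: the original root was split, so a new root is created whose left
    // child is the passed-up subtree and whose middle child is the old root.
    void splitRoot(InsertResult r) {
        TwoThreeNode newRoot = new TwoThreeNode();
        newRoot.left = r.newTree;     // holds the smaller values
        newRoot.middle = root;        // the original root holds the larger values
        newRoot.leftMax = r.max;      // largest key in the passed-up subtree
        root = newRoot;
    }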

What is the time for insert? Finding node n (the parent of the new node) involves following a path from the root down to a parent of leaves. The length of that path is O(height of tree) = O(log N), where N is the number of nodes in the tree (recall that this is also O(log M), where M is the number of key values stored in the tree).

Once node n is found, finishing the insert, in the worst case, involves adding new nodes and/or fixing fields all the way back up from the leaf to the root, which is also O(log N).

So the total time is O(log N), which is also O(log M).


Test Yourself #2

Question 1: Draw the 2-3 tree that results from inserting the value "C" into the following 2-3 tree:


                      ------------
                      | B  |  H  |
                      ------------
                     /     |       \
                    /      |        \
      ------------    ------------     ------------
      | A  |     |    | D  | E   |     | K  |     |
      ------------    ------------     ------------
      /    |          /    |     \    /      |   
     A     B         D     E      H  K       X   

Question 2: Now draw the tree that results from adding the value "F" to the tree you drew for question 1.

solution


The delete operation

Deleting key k is similar to inserting: there is a special case when T is just a single (leaf) node containing k (T is made empty); otherwise, the parent of the node to be deleted is found, then the tree is fixed up if necessary so that it is still a 2-3 tree.

Once node n (the parent of the node to be deleted) is found, there are two cases, depending on how many children n has:

case 1: n has 3 children

In this case, the leaf containing k is simply removed, leaving n with 2 children, and the leftMax and middleMax values of n (and possibly of nodes above n) are fixed as needed (see the sketch below).
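
A minimal sketch of that removal step, using the node class assumed earlier (the fix-up of leftMax/middleMax values in n's ancestors is not shown):

    // Sketch: n has 3 children, and the leaf child containing k is removed.
    static void removeLeaf(TwoThreeNode n, TwoThreeNode leaf) {
        if (leaf == n.left) {
            n.left = n.middle;
            n.leftMax = n.middleMax;
            n.middle = n.right;
        } else if (leaf == n.middle) {
            n.middle = n.right;
        }
        // if leaf == n.right, nothing needs to shift
        n.right = null;              // n now has just 2 children
    }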

case 2: n has only 2 children

In this case, removing the leaf containing k would leave n with only 1 child, which is not allowed. If an adjacent sibling of n has 3 children, n can "adopt" the nearest of them, so that both nodes end up with 2 children. Otherwise, n's one remaining child is given to an adjacent sibling (which then has 3 children) and n itself is removed from its parent; since the parent has now lost a child, the same two cases are applied to it, and this may propagate all the way up to the root (if the root is left with only 1 child, that child becomes the new root). In all cases, leftMax and middleMax values are fixed as needed along the way.

The time for delete is similar to the time for insert: the worst case involves one traversal down the tree to find n, and another "traversal" back up the tree, fixing leftMax and middleMax fields along the way (the traversal up really consists of actions that happen after the recursive call to delete has finished).

So the total time is 2 * height-of-tree = O(log N).


Test Yourself #3

Question 1: Draw the 2-3 tree that results from deleting the value "X" from the following 2-3 tree:


                      ------------
                      | B  |  H  |
                      ------------
                     /     |       \
                    /      |        \
      ------------    ------------     ------------
      | A  |     |    | D  | E   |     | K  |     |
      ------------    ------------     ------------
      /    |          /    |     \    /      |   
     A     B         D     E      H  K       X   

Question 2: Now draw the tree that results from deleting the value "H" from the tree you drew for question 1.

solution


2-3 Tree Summary

In a 2-3 tree:

  - all of the values are stored at the leaves, in sorted order left to right;
  - internal nodes store no values, just leftMax and middleMax fields and 2 or 3 child pointers;
  - all leaves are at the same depth, so the height of the tree is O(log N), where N is the number of nodes;
  - lookup, insert, and delete all take time O(log N) in the worst case.

Summary of Binary-Search Trees vs 2-3 Trees

                                                    BST            2-3 tree
  where are values stored?                          every node     leaves only
  extra info in non-leaf nodes                      2 child ptrs   leftMax, middleMax,
                                                                   3 child ptrs
  worst-case time for lookup, insert, and delete    O(N)           O(log N)
  (N = # values stored in tree)
  average-case time for lookup, insert, and delete  O(log N)       O(log N)
  (N = # values stored in tree)