CS 540 - Introduction to Artificial Intelligence, Spring 2000

Homework #1 - Decision Tree Learning

Part I - Information Gain

Consider the criteria for accepting candidates to the PhD program of the mythical University of St. Nordaf. Each candidate is evaluated according to four attributes: the grade point average (GPA), the quality of the undergraduate university attended, the publication record, and the strength of the recommendation letters. To simplify our example, let us discretize and limit the possible values of each attribute: possible GPA scores are 4.0, 3.7, and 3.5; universities are categorized as top_10, top_20, and top_30 (by top_20 we mean places 11-20 and by top_30 we mean places 21-30); the publication record is binary, since an applicant either has published previously or has not; and the recommendation letters are similarly binary, being either good or normal. Finally, each candidate is classified into one of two classes: accepted, or P (for 'positive'), and rejected, or N (for 'negative'). Here is an example of one possible decision tree determining acceptance.

Pat, an applicant, doesn't know this decision tree, but does have the data in this table regarding twelve of last year's applicants:

1. Does the tree given above correctly categorize the given examples?
2. Pat uses the decision tree algorithm shown in class (with information gain computations for selecting split variables) to induce, from this data, the decision tree employed by St. Nordaf's officials. What tree will the algorithm come up with? Show all the computations involved and draw the resulting tree.
3. Given the example {GPA = 4.0; university = top_10; published = yes; recommendation = normal}, would both trees classify it the same way? In either case (yes or no), explain why.
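As a hedged aid for the information gain computations asked for above, the sketch below computes entropy (in bits) and the gain of a split from raw class counts. The class name `InfoGain` and the counts used in `main` are illustrative only; they are not taken from the applicant table, so you still have to tally the counts yourself.

```java
// Hypothetical helper: entropy and information gain computed from class counts.
class InfoGain {

    // Entropy in bits of a node whose examples fall into classes with the given counts.
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;               // 0 * log(0) is taken to be 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Gain = entropy(parent) - sum over branches of (branch size / parent size) * entropy(branch).
    // children[i] holds the class counts of the examples sent down the i-th branch.
    static double gain(int[] parent, int[][] children) {
        int total = 0;
        for (int c : parent) total += c;
        double remainder = 0.0;
        for (int[] child : children) {
            int size = 0;
            for (int c : child) size += c;
            remainder += (double) size / total * entropy(child);
        }
        return entropy(parent) - remainder;
    }

    public static void main(String[] args) {
        // Illustrative counts only: {P, N} = {6, 6}, split by a binary attribute into {4, 1} and {2, 5}.
        int[] parent = {6, 6};
        int[][] children = {{4, 1}, {2, 5}};
        System.out.printf("H(parent) = %.3f%n", entropy(parent));
        System.out.printf("Gain      = %.3f%n", gain(parent, children));
    }
}
```

For a three-valued attribute such as GPA, `children` simply gets three rows, one per value.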

Part II - Implementing A Decision Tree

In this part you will construct a Java implementation of the ID3 algorithm. You will build a tree that determines which digit is shown on an LED display consisting of seven LEDs. An LED (light-emitting diode) is a diode that emits light when current passes through it. LEDs have many uses; visible LEDs serve as indicator lights on all sorts of electronic devices and in moving-message panels. More information is available via the UC-Irvine archive of machine learning datasets (you should take a look at this site). The seven LEDs in our panel can each be on or off and are arranged to form the digits 0 to 9 as shown here:

The number three, for example, would be formed with:

{led_1 = on; led_2 = off; led_3 = on; led_4 = on; led_5 = off; led_6 = on; led_7 = on}

Your attributes will be the seven LEDs, their values will be on and off, and the classes will be the digits 0 through 9. Thus, you will have seven binary attributes and ten classes. There are two supplied data files. The led_Attributes data file contains the attribute names, the number of values for each attribute (2, since they are boolean), and the possible values for each attribute (1 representing on and 0 representing off). The led_Examples data file contains 1500 examples of LED configurations. Each example lists the values (1 or 0) for led_1 through led_7, separated by commas, followed by the digit it represents. The example above (the number three) would appear as the following entry in the file: 1,0,1,1,0,1,1,3. The examples file contains some noisy data, so don't expect every example to be faithful to the rule.
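The provided code may already read these files for you; if you write your own reader, the sketch below shows one way to turn a single led_Examples line into seven attribute values plus a class label. It assumes each line holds exactly eight comma-separated integers with no header line; `LedExample` and `parse` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical parser for one line of led_Examples, e.g. "1,0,1,1,0,1,1,3".
class LedExample {
    final int[] leds;   // leds[0] is led_1 ... leds[6] is led_7; 1 = on, 0 = off
    final int digit;    // class label, 0 through 9

    LedExample(int[] leds, int digit) {
        this.leds = leds;
        this.digit = digit;
    }

    static LedExample parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 8) {
            throw new IllegalArgumentException("expected 7 LED values and a label: " + line);
        }
        int[] leds = new int[7];
        for (int i = 0; i < 7; i++) {
            leds[i] = Integer.parseInt(fields[i].trim());
        }
        int digit = Integer.parseInt(fields[7].trim());   // the last field is the digit shown
        return new LedExample(leds, digit);
    }

    public static void main(String[] args) {
        LedExample three = parse("1,0,1,1,0,1,1,3");
        System.out.println(Arrays.toString(three.leds) + " -> " + three.digit);
        // prints "[1, 0, 1, 1, 0, 1, 1] -> 3"
    }
}
```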

Running ID3
Your program will read the above files and separate the examples into testing, training, and tuning sets; it will then carry out a 5-fold cross-validation, running the ID3 algorithm and thus inducing a decision tree at each fold. At the end of each fold, the program should report the tree size (number of interior nodes, number of leaf nodes, and total nodes), then run the testing set examples through the induced tree and report the tree's accuracy on those examples. After finishing the 5-fold cross-validation, the program should report the average tree size (total nodes only this time) and the average accuracy over all 5 folds. This first experiment comprises the non-pruning part of your program.
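A minimal sketch of what one fold of this experiment needs (inducing a tree, counting its nodes, and measuring accuracy) might look like the following. It assumes each example is an `int[]` whose last entry is the class label, binary attribute values, and ten classes; `Id3`, `Node`, `train`, `size`, and `accuracy` are hypothetical names, not the provided code's API. Splitting continues until a node is pure or no attributes remain, which is plain ID3 with no early stopping.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal ID3 sketch: binary attributes (0 = off, 1 = on) and integer class labels 0..9.
// Assumption: each example is an int[] whose last entry is the class label, as in led_Examples.
class Id3 {
    static final int NUM_CLASSES = 10;
    static final int NUM_VALUES = 2;

    static class Node {
        int attribute = -1;   // split attribute of an interior node; -1 marks a leaf
        int label;            // majority TRAINING class of the examples that reached this node
        Node[] children;      // children[v] handles examples whose split attribute equals v
        boolean isLeaf() { return attribute < 0; }
    }

    static Node train(List<int[]> examples, int numAttributes) {
        List<Integer> attributes = new ArrayList<>();
        for (int a = 0; a < numAttributes; a++) attributes.add(a);
        return build(examples, attributes, 0);
    }

    static Node build(List<int[]> examples, List<Integer> attributes, int parentMajority) {
        Node node = new Node();
        if (examples.isEmpty()) { node.label = parentMajority; return node; }   // nothing reached here
        int[] counts = classCounts(examples);
        node.label = argMax(counts);
        if (counts[node.label] == examples.size() || attributes.isEmpty()) return node;  // pure or exhausted
        int best = attributes.get(0);
        double bestGain = Double.NEGATIVE_INFINITY;
        for (int a : attributes) {            // choose the attribute with the highest information gain
            double g = gain(examples, a, counts);
            if (g > bestGain) { bestGain = g; best = a; }
        }
        node.attribute = best;
        node.children = new Node[NUM_VALUES];
        List<Integer> remaining = new ArrayList<>(attributes);
        remaining.remove(Integer.valueOf(best));   // remove by value, not by index
        for (int v = 0; v < NUM_VALUES; v++) {
            node.children[v] = build(subset(examples, best, v), remaining, node.label);
        }
        return node;
    }

    static double gain(List<int[]> examples, int attribute, int[] parentCounts) {
        double remainder = 0.0;
        for (int v = 0; v < NUM_VALUES; v++) {
            List<int[]> s = subset(examples, attribute, v);
            remainder += (double) s.size() / examples.size() * entropy(classCounts(s));
        }
        return entropy(parentCounts) - remainder;
    }

    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) if (c > 0) h -= (double) c / total * Math.log((double) c / total) / Math.log(2);
        return h;
    }

    static int[] classCounts(List<int[]> examples) {
        int[] counts = new int[NUM_CLASSES];
        for (int[] e : examples) counts[e[e.length - 1]]++;
        return counts;
    }

    static int argMax(int[] counts) {         // ties go to the smallest label
        int best = 0;
        for (int i = 1; i < counts.length; i++) if (counts[i] > counts[best]) best = i;
        return best;
    }

    static List<int[]> subset(List<int[]> examples, int attribute, int value) {
        List<int[]> out = new ArrayList<>();
        for (int[] e : examples) if (e[attribute] == value) out.add(e);
        return out;
    }

    static int classify(Node node, int[] example) {
        while (!node.isLeaf()) node = node.children[example[node.attribute]];
        return node.label;
    }

    static double accuracy(Node tree, List<int[]> examples) {
        int correct = 0;
        for (int[] e : examples) if (classify(tree, e) == e[e.length - 1]) correct++;
        return examples.isEmpty() ? 0.0 : (double) correct / examples.size();
    }

    // Returns {interior nodes, leaf nodes}; the total node count is their sum.
    static int[] size(Node n) {
        if (n.isLeaf()) return new int[] {0, 1};
        int[] s = {1, 0};
        for (Node child : n.children) { int[] c = size(child); s[0] += c[0]; s[1] += c[1]; }
        return s;
    }
}
```

For one fold, you would call `Id3.train(trainingExamples, 7)`, then report `size(tree)` and `accuracy(tree, testingExamples)`. Storing the majority label in every node, not just in leaves, is what makes the pruning step cheap later on.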

Pruning ID3
When all the above is done, the program should repeat the 5-fold cross-validation, but this time train only on the examples in each fold's training set that were not set aside for pruning (don't train on these tuning examples!). This time, at the end of each fold, the program should prune the induced tree using the tuning set. For this homework, 1/3 of each fold's training examples will be set aside as that fold's tuning set, and the remaining 2/3 will be used for training. All you have to do is specify this on the command line, and the provided code will carry out the separation for you (see the Provided Code section). Sample pseudocode for the pruning algorithm is provided below:

Let bestTree = the tree produced by ID3 on the TRAINING set
Let bestAccuracy = the accuracy of bestTree on the TUNING set
Let progressMade = true

while (progressMade)   // Continue as long as some pruned tree does at least as well on the TUNING set
{
    Set progressMade = false
    Let currentTree = bestTree

    For each interiorNode N (including the root) in currentTree
    {
        // Consider various pruned versions of the current tree
        // and see if any are better than the best tree found so far

        Let prunedTree be a copy of currentTree,
        except replace N by a leaf node
        whose label equals the majority class among TRAINING set
        examples that reached node N (break ties in favor of '-')

        Let newAccuracy = accuracy of prunedTree on the TUNING set

        // Is this pruned tree an improvement, based on the TUNING set?
        // On a tie, go with the smaller tree (Occam's Razor).
        If (newAccuracy >= bestAccuracy)
        {
            bestAccuracy = newAccuracy
            bestTree = prunedTree
            progressMade = true
        }
    }
}
return bestTree
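The pseudocode above translates fairly directly into Java. The sketch below assumes a tree whose nodes each store the majority TRAINING class of the examples that reached them, so the tie-breaking rule is whatever was used when that label was computed. `Pruner` and its methods are hypothetical names, not part of the provided code. Interior nodes are identified by their preorder index, so the same node N can be located in the copy.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reduced-error pruning loop in the pseudocode above.
// Assumption: every node's `label` is the majority TRAINING class of the examples that reached it,
// so turning an interior node into a leaf only means dropping its children.
class Pruner {

    static class Node {
        int attribute = -1;   // -1 marks a leaf
        int label;
        Node[] children;

        Node copy() {         // deep copy, so pruning the copy never touches the original
            Node n = new Node();
            n.attribute = attribute;
            n.label = label;
            if (children != null) {
                n.children = new Node[children.length];
                for (int i = 0; i < children.length; i++) n.children[i] = children[i].copy();
            }
            return n;
        }
    }

    static int classify(Node n, int[] example) {
        while (n.attribute >= 0) n = n.children[example[n.attribute]];
        return n.label;
    }

    // The class label is the last entry of each example.
    static double accuracy(Node tree, List<int[]> examples) {
        if (examples.isEmpty()) return 0.0;
        int correct = 0;
        for (int[] e : examples) if (classify(tree, e) == e[e.length - 1]) correct++;
        return (double) correct / examples.size();
    }

    // Collects interior nodes in preorder; index i names the same node in a tree and in its copy.
    static void interiorNodes(Node n, List<Node> out) {
        if (n.attribute < 0) return;
        out.add(n);
        for (Node c : n.children) interiorNodes(c, out);
    }

    static Node prune(Node trainedTree, List<int[]> tuning) {
        Node bestTree = trainedTree;
        double bestAccuracy = accuracy(bestTree, tuning);
        boolean progressMade = true;
        while (progressMade) {
            progressMade = false;
            Node currentTree = bestTree;
            List<Node> nodes = new ArrayList<>();
            interiorNodes(currentTree, nodes);
            for (int i = 0; i < nodes.size(); i++) {
                Node prunedTree = currentTree.copy();
                List<Node> copyNodes = new ArrayList<>();
                interiorNodes(prunedTree, copyNodes);
                Node n = copyNodes.get(i);
                n.attribute = -1;                    // replace N by a leaf with its stored majority label
                n.children = null;
                double newAccuracy = accuracy(prunedTree, tuning);
                if (newAccuracy >= bestAccuracy) {   // on a tie, prefer the smaller tree
                    bestAccuracy = newAccuracy;
                    bestTree = prunedTree;
                    progressMade = true;
                }
            }
        }
        return bestTree;
    }
}
```

Copying the whole tree for each candidate keeps currentTree unchanged during the inner loop, as the pseudocode requires, at the cost of extra work per candidate. The loop terminates because every accepted prune yields a strictly smaller tree.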

Finally, report the final pruned tree's size (in the same way as above), run the fold's testing set examples through the pruned tree, and report the tree's accuracy. At the end of this pruning 5-fold cross-validation, report the average accuracy and tree size over all five folds.
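The provided code performs the train/tune/test separation for you. Purely to illustrate the bookkeeping behind both experiments, the sketch below assumes a seeded random shuffle followed by round-robin assignment to folds, which may well differ from what the provided code does. `Folds`, `Split`, and `tuneFraction` are hypothetical names. Passing `tuneFraction = 0.0` gives the non-pruning runs, and `1.0 / 3.0` gives the pruning runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical k-fold splitter with an optional tuning hold-out inside each fold's training pool.
class Folds {

    static class Split<T> {
        final List<T> train = new ArrayList<>();
        final List<T> tune = new ArrayList<>();
        final List<T> test = new ArrayList<>();
    }

    // Shuffles a copy of the data and deals it round-robin into k folds. For fold i, fold i is
    // the test set; the remaining k-1 folds form the training pool, whose first tuneFraction
    // becomes the tuning set and whose rest becomes the training set.
    static <T> List<Split<T>> split(List<T> data, int k, double tuneFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        List<Split<T>> splits = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            Split<T> s = new Split<>();
            List<T> pool = new ArrayList<>();
            for (int j = 0; j < shuffled.size(); j++) {
                if (j % k == i) s.test.add(shuffled.get(j));
                else pool.add(shuffled.get(j));
            }
            int numTune = (int) Math.round(pool.size() * tuneFraction);
            s.tune.addAll(pool.subList(0, numTune));
            s.train.addAll(pool.subList(numTune, pool.size()));
            splits.add(s);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int n = 0; n < 1500; n++) ids.add(n);   // stand-ins for the 1500 examples
        for (Split<Integer> s : split(ids, 5, 1.0 / 3.0, 42L)) {
            System.out.println(s.train.size() + " train, " + s.tune.size() + " tune, " + s.test.size() + " test");
        }
    }
}
```

With 1500 examples and a tuning fraction of 1/3, each fold gets 300 testing, 400 tuning, and 800 training examples. Averaging the five per-fold accuracies and total node counts then gives the summary lines requested above.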

In each of the above experiments, you should pick one of the folds, say the last one, and print out the corresponding induced tree. We have provided code for you to do this (see the Provided Code section), but you can write your own if you prefer. The printed trees usually do not fit well on an 8 1/2 x 11 inch sheet of paper, but printing them in landscape mode helps a lot. The important thing here is to physically see what the tree looks like in addition to its size and accuracy, so we do not expect extremely neat trees.
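If you decide to write your own printer instead of using the provided one, a simple indented text layout is usually enough to see the tree's shape. The sketch below assumes the attribute names are passed in as an array (for example, led_1 through led_7) rather than read from led_Attributes, and it prints attribute values as 0/1. `TreePrinter` and its `Node` are hypothetical.

```java
// Hypothetical indented tree printer: one line per branch, children indented beneath it.
class TreePrinter {

    static class Node {
        int attribute = -1;   // -1 marks a leaf
        int label;            // class predicted at a leaf
        Node[] children;      // children[v] handles attribute value v (0 = off, 1 = on)
    }

    static void print(Node n, String[] attributeNames, String indent, StringBuilder out) {
        if (n.attribute < 0) {
            out.append(indent).append("class = ").append(n.label).append('\n');
            return;
        }
        for (int v = 0; v < n.children.length; v++) {
            out.append(indent).append(attributeNames[n.attribute]).append(" = ").append(v).append(":\n");
            print(n.children[v], attributeNames, indent + "    ", out);
        }
    }

    public static void main(String[] args) {
        Node off = new Node(); off.label = 3;
        Node on = new Node(); on.label = 8;
        Node root = new Node(); root.attribute = 4; root.children = new Node[] {off, on};
        String[] names = {"led_1", "led_2", "led_3", "led_4", "led_5", "led_6", "led_7"};
        StringBuilder sb = new StringBuilder();
        print(root, names, "", sb);
        System.out.print(sb);
    }
}
```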

Program Output
Your program's output should be as follows:

ID3:

Fold # 0: