University of Wisconsin - Madison
CS 540 Lecture Notes
C. R. Dyer

Machine Learning (Chapter 18.1 - 18.3)


What is Learning?

Why do Machine Learning?

Components of a Learning System

Critic <---------------- Sensors
 |                          |
 |                          |
 |                          |
 v                          v
Learning Element <-----> Performance Element -----> Effectors
 |                          ^
 |                          |
 |              /-----------|
 v             /
Problem Generator

We will concentrate on the Learning Element

Evaluating Performance

Several possible criteria for evaluating a learning algorithm:

The most common criterion is predictive accuracy: the fraction of new, previously unseen examples that the learned hypothesis classifies correctly.

Major Paradigms of Machine Learning

The Inductive Learning Problem

Inductive Bias

Inductive Learning Framework

Inductive Learning by Nearest-Neighbor Classification

One simple approach to inductive learning is to save each training example as a point in Feature Space, and then classify a new example by giving it the same classification (+ or -) as its nearest neighbor in Feature Space.

The problem with this approach is that it does not necessarily generalize well if the examples are not "clustered": when examples of the same class do not form compact regions of feature space, a new example's nearest neighbor may easily be a noisy or atypical training point with the wrong class.
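
A quick sketch of this idea in Python, assuming numeric feature vectors and plain Euclidean distance (the function names and data are illustrative, not from the notes):

import math

def nearest_neighbor_classify(training_examples, query):
    # training_examples: list of (feature_vector, label) pairs.
    # Returns the label of the training point closest to query.
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, label = min(training_examples, key=lambda ex: distance(ex[0], query))
    return label

# Two "+" points near (1, 1) and two "-" points near (5, 5):
train = [((1.0, 1.0), '+'), ((1.2, 0.8), '+'),
         ((5.0, 5.0), '-'), ((4.8, 5.3), '-')]
print(nearest_neighbor_classify(train, (1.1, 1.0)))   # prints +
print(nearest_neighbor_classify(train, (5.1, 4.9)))   # prints -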

Inductive Concept Learning by Learning Decision Trees

Algorithm

function decision-tree-learning(examples, attributes, default)
  ;; examples is a list of training examples
  ;; attributes is a list of candidate attributes for the
  ;;    current node
  ;; default is the default value for a leaf node if there
  ;;    are no examples left
  if empty(examples) then return(default)
  if same-classification(examples) then return(class(examples))
  if empty(attributes) then return(majority-classification(examples))
  best = choose-attribute(attributes, examples)
  tree = new node with attribute best
  foreach value v of attribute best do
    v-examples = subset of examples with attribute best = v
    subtree = decision-tree-learning(v-examples, attributes - best,
                                     majority-classification(examples))
    add a branch from tree to subtree with arc labeled v
  return(tree)
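
The pseudocode above translates almost line for line into a runnable program. Below is a minimal Python sketch, assuming each example is a (dict-of-attribute-values, class-label) pair; the attribute-selection heuristic is passed in as a function so that any of the methods discussed below can be plugged in. The names are illustrative, not part of the original notes.

from collections import Counter

def majority_classification(examples):
    # Most common class label among (attributes, label) pairs.
    return Counter(label for _, label in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, default, choose_attribute):
    if not examples:                  # no examples left: use parent's majority
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:              # all examples agree: leaf node
        return labels.pop()
    if not attributes:                # no attributes left: majority-vote leaf
        return majority_classification(examples)
    best = choose_attribute(attributes, examples)
    branches = {}
    # For simplicity this branches only on values seen in the data; the
    # pseudocode iterates over every possible value of the attribute.
    for v in {ex[best] for ex, _ in examples}:
        v_examples = [(ex, label) for ex, label in examples if ex[best] == v]
        branches[v] = decision_tree_learning(
            v_examples,
            [a for a in attributes if a != best],
            majority_classification(examples),
            choose_attribute)
    return (best, branches)           # internal node: attribute plus branches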

  • How to Choose the Best Attribute for a Node?

    Some possibilities: select an attribute at random; select the attribute with the fewest possible values; select the attribute with the most possible values; or select the attribute with the largest expected information gain (Max-Gain).

    The classic ID3 algorithm uses the Max-Gain method of selecting the best attribute; its successors C4.5 and C5.0 refine this with a gain-ratio criterion.

    Information Gain Method for Selecting the Best Attribute

    Use information theory to estimate how much information each attribute provides about the classification (equivalently, how small the subtrees rooted at its children are likely to be). If a set of examples contains p positive and n negative examples, the information content (entropy) of the classification is

        I(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

    Testing an attribute A partitions the examples into subsets E_1, ..., E_k, one per value of A. The expected information still needed after the test is the weighted average of the subsets' entropies,

        Remainder(A) = sum_i ((p_i + n_i)/(p + n)) * I(p_i/(p_i+n_i), n_i/(p_i+n_i))

    and the information gain of A is Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A). Try each attribute, compute its gain, and pick the attribute with the maximum gain.

    Example

    Consider the following six training examples, where each example has three attributes: color, shape and size. Color has three possible values: red, green and blue. Shape has two possible values: square and round. Size has two possible values: big and small.

    Example   Color   Shape    Size    Class
    1         red     square   big     +
    2         blue    square   big     +
    3         red     round    small   -
    4         green   square   small   -
    5         red     round    big     +
    6         green   square   big     -
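
    Using these six examples, a short Python sketch can check the information-gain computation directly (the code and names are illustrative, not from the original notes):

    import math

    def entropy(p, n):
        # Information content I(p/(p+n), n/(p+n)) in bits.
        total = p + n
        result = 0.0
        for count in (p, n):
            if count:
                f = count / total
                result -= f * math.log2(f)
        return result

    examples = [
        ({'color': 'red',   'shape': 'square', 'size': 'big'},   '+'),
        ({'color': 'blue',  'shape': 'square', 'size': 'big'},   '+'),
        ({'color': 'red',   'shape': 'round',  'size': 'small'}, '-'),
        ({'color': 'green', 'shape': 'square', 'size': 'small'}, '-'),
        ({'color': 'red',   'shape': 'round',  'size': 'big'},   '+'),
        ({'color': 'green', 'shape': 'square', 'size': 'big'},   '-'),
    ]

    def gain(attribute):
        p = sum(1 for _, c in examples if c == '+')
        n = len(examples) - p
        remainder = 0.0
        for v in {ex[attribute] for ex, _ in examples}:
            subset = [c for ex, c in examples if ex[attribute] == v]
            pv, nv = subset.count('+'), subset.count('-')
            remainder += (pv + nv) / len(examples) * entropy(pv, nv)
        return entropy(p, n) - remainder

    for a in ('color', 'shape', 'size'):
        print(a, round(gain(a), 3))    # color 0.541, shape 0.0, size 0.459

    Color has the largest gain, so it would be chosen as the root attribute; note that for these six examples, shape by itself provides no information about the class.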

    Case Studies

    Many case studies have shown that decision trees can be at least as accurate as human experts. For example, in one study of diagnosing breast cancer, human experts correctly classified the examples 65% of the time, while the learned decision tree was correct 72% of the time.

    British Petroleum designed a decision tree for gas-oil separation on offshore oil platforms; it replaced a rule-based expert system.

    Cessna designed an airplane flight controller by learning from 90,000 examples, each described by 20 attributes.

    Extensions of the Decision Tree Learning Algorithm

    Summary


    Copyright © 2001-2003 by Charles R. Dyer. All rights reserved.