University of Wisconsin - Madison
CS 540 Lecture Notes
C. R. Dyer

Reasoning under Uncertainty (Chapters 13 and 14.1 - 14.4)


Why Reason Probabilistically?

Representing Belief about Propositions

Axioms of Probability Theory

Probability Theory provides us with the formal mechanisms and rules for manipulating propositions represented probabilistically. The three axioms of probability theory are: (1) 0 <= P(A) <= 1 for every proposition A; (2) P(True) = 1 and P(False) = 0; and (3) P(A v B) = P(A) + P(B) - P(A ^ B). From these axioms other useful properties can be derived, for example P(~A) = 1 - P(A).

Joint Probability Distribution

Given an application domain in which we have determined a sufficient set of random variables to encode all of the relevant information about that domain, we can completely specify all of the possible probabilistic information by constructing the full joint probability distribution, P(V1=v1, V2=v2, ..., Vn=vn), which assigns a probability to every possible combination of values of the random variables.

For example, consider a domain described by three Boolean random variables, Bird, Flier, and Young. Then we can enumerate a table showing all possible interpretations and associated probabilities:

Bird   Flier   Young   Probability
 T       T       T        0.0
 T       T       F        0.2
 T       F       T        0.04
 T       F       F        0.01
 F       T       T        0.01
 F       T       F        0.01
 F       F       T        0.23
 F       F       F        0.5

Notice that there are 8 rows in the above table representing the fact that there are 2^3 ways to assign values to the three Boolean variables. More generally, with n Boolean variables the table will be of size 2^n. And if n variables each had k possible values, then the table would be of size k^n.

Also notice that the probabilities in the right column must sum to 1, since the rows enumerate all possible, mutually exclusive assignments of values to the variables. This means that for n Boolean random variables, only 2^n - 1 values need to be determined to completely fill in the table (the last value is fixed by the constraint that the probabilities sum to 1).

If all of the probabilities are known for a full joint probability distribution table, then we can compute any probabilistic statement about the domain. For example, using the table above we can compute P(Bird) = 0.0 + 0.2 + 0.04 + 0.01 = 0.25 by summing the probabilities of all rows in which Bird = T (i.e., marginalization).
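As a concrete illustration, here is a minimal Python sketch (the dictionary layout and variable names are mine, not part of the original notes) that stores the table above and checks the marginal P(Bird) just computed:

# Full joint distribution over (Bird, Flier, Young), keyed by truth-value tuples.
joint = {
    (True,  True,  True):  0.00, (True,  True,  False): 0.20,
    (True,  False, True):  0.04, (True,  False, False): 0.01,
    (False, True,  True):  0.01, (False, True,  False): 0.01,
    (False, False, True):  0.23, (False, False, False): 0.50,
}

# P(Bird) = sum of all entries in which Bird = T (marginalization).
p_bird = sum(p for (bird, flier, young), p in joint.items() if bird)
print(p_bird)   # 0.25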

Conditional Probabilities

Combining Multiple Evidence using the Joint Probability Distribution

As we accumulate evidence or symptoms or features that describe the state of the world, we'd like to be able to easily update our degree of belief in some query or conclusion or diagnosis. One way to do this is again use the information given in a full joint probability distribution table. For example,

P(~Bird | Flier, ~Young) = P(~B,F,~Y) / (P(~B,F,~Y) + P(B,F,~Y))
                         = .01 / (.01 + .2)
                         = .048

In general, P(V1=v1, ..., Vk=vk | Vk+1=vk+1, ..., Vn=vn) = sum of all entries where V1=v1, ..., Vn=vn divided by the sum of all entries where Vk+1=vk+1, ..., Vn=vn.

While this method will work for any conditional probability involving arbitrary known evidence, it is again intractable because it requires an exponentially large table in the form of the full joint probability distribution.
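To make the procedure concrete, here is a minimal Python sketch (my illustration; the helper name prob and the dictionary layout are assumptions, not from the original notes) that answers the query above directly from the full joint table:

# The same full joint distribution over (Bird, Flier, Young) as before.
joint = {
    (True,  True,  True):  0.00, (True,  True,  False): 0.20,
    (True,  False, True):  0.04, (True,  False, False): 0.01,
    (False, True,  True):  0.01, (False, True,  False): 0.01,
    (False, False, True):  0.23, (False, False, False): 0.50,
}

def prob(bird=None, flier=None, young=None):
    """Sum of the joint entries consistent with the given partial assignment."""
    return sum(p for (b, f, y), p in joint.items()
               if (bird is None or b == bird)
               and (flier is None or f == flier)
               and (young is None or y == young))

# P(~Bird | Flier, ~Young) = P(~B,F,~Y) / P(F,~Y)
print(prob(bird=False, flier=True, young=False) / prob(flier=True, young=False))
# 0.0476..., the .048 computed above

Note that the joint dictionary itself has 2^n entries, which is exactly the source of the intractability just mentioned.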

Using Bayes's Rule

Bayesian Networks (aka Belief Networks)

Net Topology Reflects Conditional Independence Assumptions

Building a Bayesian Net

Intuitively, "to construct a Bayesian Net for a given set of variables, we draw arcs from cause variables to immediate effects. In almost all cases, doing so results in a Bayesian network [whose conditional independence implications are accurate]." (Heckerman, 1996)

More formally, the following algorithm constructs a Bayesian Net (a code sketch of this loop appears after the list):

  1. Identify a set of random variables that describe the given problem domain
  2. Choose an ordering for them: X1, ..., Xn
  3. for i=1 to n do
    1. Add a new node for Xi to the net
    2. Set Parents(Xi) to be the minimal set of already added nodes such that we have conditional independence of Xi and all other members of {X1, ..., Xi-1} given Parents(Xi)
    3. Add a directed arc from each node in Parents(Xi) to Xi
    4. If Xi has at least one parent, then define a conditional probability table at Xi: P(Xi=x | possible assignments to Parents(Xi)). Otherwise, define a prior probability at Xi: P(Xi)
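Here is a minimal Python sketch of the loop above (my illustration, not from the original notes). It assumes a conditional-independence oracle is available; the hand-coded oracle in the example simply encodes a known cause structure (that of the "home domain" used later in these notes), which is how this judgment would come from domain knowledge in practice.

from itertools import combinations

def build_bayes_net(ordering, cond_indep):
    """Return Parents(Xi) for each variable, following the algorithm above.

    ordering   -- the chosen variable ordering X1, ..., Xn (a list of names)
    cond_indep -- oracle: cond_indep(x, rest, parents) is True when x is
                  conditionally independent of the variables in rest given
                  the variables in parents."""
    parents = {}
    for i, x in enumerate(ordering):
        prior = ordering[:i]               # nodes already added to the net
        chosen = list(prior)               # worst case: all predecessors
        for k in range(len(prior) + 1):    # try smaller candidate sets first
            found = next(
                (list(c) for c in combinations(prior, k)
                 if cond_indep(x, [v for v in prior if v not in c], list(c))),
                None)
            if found is not None:
                chosen = found             # a minimal parent set
                break
        parents[x] = chosen                # draw an arc from each parent to x
    return parents

# Example with a hand-coded oracle: O and B are root causes, D depends on
# O and B, L depends on O, and H depends on D.
direct_causes = {"O": set(), "B": set(), "L": {"O"}, "D": {"O", "B"}, "H": {"D"}}
oracle = lambda x, rest, par: direct_causes[x] <= set(par)
print(build_bayes_net(["O", "B", "L", "D", "H"], oracle))
# {'O': [], 'B': [], 'L': ['O'], 'D': ['O', 'B'], 'H': ['D']}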

Notes about this algorithm:

Computing Joint Probabilities from a Bayesian Net

To illustrate how a Bayesian Net can be used to compute an arbitrary value in the joint probability distribution, consider the Bayesian Net shown above for the "home domain," in which B and O are root nodes and the parents of D, O is also the parent of L, and D is the parent of H.

Goal: Compute P(B,~O,D,~L,H)

P(B,~O,D,~L,H) = P(H,~L,D,~O,B)
     = P(H | ~L,D,~O,B) * P(~L,D,~O,B)            by Product Rule
     = P(H|D) * P(~L,D,~O,B)                      by Conditional Independence of H and
                                                       L,O, and B given D
     = P(H|D) P(~L | D,~O,B) P(D,~O,B)            by Product Rule
     = P(H|D) P(~L|~O) P(D,~O,B)                  by Conditional Independence of L and D,
                                                       and L and B, given O
     = P(H|D) P(~L|~O) P(D | ~O,B) P(~O,B)        by Product Rule
     = P(H|D) P(~L|~O) P(D|~O,B) P(~O | B) P(B)   by Product Rule
     = P(H|D) P(~L|~O) P(D|~O,B) P(~O) P(B)       by Independence of O and B
     = (.3)(1 - .6)(.1)(1 - .6)(.3)
     = 0.00144

where all of the numeric values are available directly in the Bayesian Net (since P(~A|B) = 1 - P(A|B)).
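The arithmetic in this factorization can be reproduced with a short Python sketch (my illustration; it uses only the conditional probabilities that appear in the derivation above):

# CPT entries quoted in the derivation above.
p_B = 0.3                              # P(B)
p_O = 0.6                              # P(O)
p_D_given_notO_B = 0.1                 # P(D | ~O, B)
p_L_given_notO = 0.6                   # P(L | ~O), so P(~L | ~O) = 1 - 0.6
p_H_given_D = 0.3                      # P(H | D)

# P(B,~O,D,~L,H) = P(H|D) P(~L|~O) P(D|~O,B) P(~O) P(B)
joint = (p_H_given_D * (1 - p_L_given_notO) * p_D_given_notO_B
         * (1 - p_O) * p_B)
print(joint)                           # 0.00144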

Computing Conditional Probabilities from a Bayesian Net

Causal (Top-Down) Inference

Computing an arbitrary conditional probability from a Bayesian Net is complicated in general, but it is easy when the nodes involved in the query are directly connected to each other. In this section we consider problems of the form P(Q|E) where there is a directed arc in the Bayesian Net from the evidence node E to the query node Q. We call this case causal inference because we are reasoning in the same direction as the causal arc.

Consider our "home domain" and the problem of computing P(D|B), i.e., what is the probability that my dog is outside when it has bowel troubles? We can solve this problem as follows:

  1. Apply the Product Rule and Marginalization
    P(D|B) = P(D,B)/P(B)                   by the Product Rule
           = (P(D,B,O) + P(D,B,~O))/P(B)   by marginalizing P(D,B)
           = P(D,B,O)/P(B) + P(D,B,~O)/P(B)
           = P(D,O|B) + P(D,~O|B)
    

  2. Apply the conditionalized version of the chain rule, i.e., P(A,B|C) = P(A|B,C)P(B|C), to obtain
    P(D|B) = P(D|O,B)P(O|B) + P(D|~O,B)P(~O|B)
    

  3. Since O and B are independent according to the network structure, we know P(O|B)=P(O) and P(~O|B)=P(~O). This means we now have

    P(D|B) = P(D|O,B)P(O) + P(D|~O,B)P(~O)
           = (.05)(.6) + (.1)(1 - .6)
           = 0.07
    

In general, for this case we first rewrite the conditional probability of the query variable Q given the evidence as a sum, over the values of Q's parents that are not evidence, of joint probabilities conditioned on the evidence. Second, re-express each term as the probability of Q given all of its parents multiplied by the probabilities of those parents. Third, look up the required values in the Bayesian Net.
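A small Python sketch of this top-down procedure (the causal_query helper and its argument names are my own illustration, not from the original notes) reproduces the P(D|B) calculation; it assumes, as here, that the unobserved parents are independent root nodes whose priors can be read off the net:

from itertools import product

def causal_query(child_cpt, parent_names, parent_priors, evidence):
    """P(child=True | evidence), where evidence fixes some of the child's
    parents and the remaining (root, independent) parents are summed out
    using their prior probabilities."""
    free = [p for p in parent_names if p not in evidence]
    total = 0.0
    for vals in product([True, False], repeat=len(free)):
        assign = dict(evidence, **dict(zip(free, vals)))
        # probability of this particular setting of the unobserved parents
        weight = 1.0
        for p in free:
            weight *= parent_priors[p] if assign[p] else 1 - parent_priors[p]
        total += child_cpt[tuple(assign[p] for p in parent_names)] * weight
    return total

# P(D | B): D's parents are O and B; B is the evidence, O is summed out.
d_cpt = {(True, True): 0.05,           # P(D |  O, B)
         (False, True): 0.10}          # P(D | ~O, B)
print(causal_query(d_cpt, ["O", "B"], {"O": 0.6}, {"B": True}))   # 0.07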

Diagnostic (Bottom-Up) Inference

The last section considered simple causal inference. In this section we consider the simplest case of diagnostic inference, where the problem is to compute P(Q|E) and in the Bayesian Net there is an arc from the query node Q to the evidence node E. That is, we are using a symptom to infer a cause, which is analogous to using the abduction rule of inference in FOL.

For example, consider the "home domain" again and the problem of computing P(~B|~D). That is, if the dog is not outside, what is the probability that the dog has bowel troubles?

  1. First, use Bayes's Rule:
    P(~B|~D) = P(~D|~B)P(~B)/P(~D)
    
  2. We can look up in the Bayesian Net the value of P(~B) = 1 - .3 = .7. Next, compute P(~D|~B) using the causal inference method described above. Here we get
    P(~D|~B) = P(~D,O|~B) + P(~D,~O|~B)
             = P(~D|O,~B)P(O|~B) + P(~D|~O,~B)P(~O|~B)
             = P(~D|O,~B)P(O) + P(~D|~O,~B)P(~O)
             = (.9)(.6) + (.8)(.4)
             = 0.86
    
    So, P(~B|~D) = (.86)(.7)/P(~D) = .602/P(~D).

  3. To avoid computing the prior probability P(~D) of the evidence directly, we can use normalization, which requires also computing P(B|~D). That is, P(B|~D) = P(~D|B)P(B)/P(~D) by Bayes's Rule, and P(B)=.3 from the Bayesian Net. Now compute P(~D|B) as follows:
    P(~D|B) = P(~D,O|B) + P(~D,~O|B)
            = P(~D|O,B)P(O|B) + P(~D|~O,B)P(~O|B)
            = P(~D|O,B)P(O) + P(~D|~O,B)P(~O)
            = (.95)(.6) + (.9)(.4)
            = 0.93
    
    So, P(B|~D) = (.93)(.3)/P(~D) = .279/P(~D). Since P(~B|~D) + P(B|~D) = 1, we have .602/P(~D) + .279/P(~D) = 1, and so P(~D) = .881. Thus, P(~B|~D) = .602/.881 = .683.

In general, diagnostic inference problems are solved by converting them to causal inference problems using Bayes's Rule, and then proceeding as before.
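The full diagnostic calculation for P(~B|~D) can likewise be reproduced with a short Python sketch (my illustration; the CPT entries for D are recovered as complements of the P(~D|...) values quoted in the steps above):

# CPT values recovered from the steps above (P(D|...) = 1 - P(~D|...)).
p_B, p_O = 0.3, 0.6
p_D = {(True,  True):  0.05,   # P(D |  O,  B)
       (False, True):  0.10,   # P(D | ~O,  B)
       (True,  False): 0.10,   # P(D |  O, ~B)
       (False, False): 0.20}   # P(D | ~O, ~B)

def p_notD_given_B(b):
    """P(~D | B=b): the causal-inference step, summing out O."""
    return (1 - p_D[(True, b)]) * p_O + (1 - p_D[(False, b)]) * (1 - p_O)

# Unnormalized posteriors P(~D|B)P(B) and P(~D|~B)P(~B):
unnorm_B    = p_notD_given_B(True)  * p_B          # 0.93 * 0.3 = 0.279
unnorm_notB = p_notD_given_B(False) * (1 - p_B)    # 0.86 * 0.7 = 0.602
p_notD = unnorm_B + unnorm_notB                    # the normalizer, 0.881
print(unnorm_notB / p_notD)                        # P(~B | ~D) = 0.683...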

Summary


Copyright © 1996-2003 by Charles R. Dyer. All rights reserved.