Markov Chain Monte Carlo

In class we discussed Markov Chain Monte Carlo (MCMC) in three different contexts: randomly selecting a two-way contingency table from the uniform distribution of all contingency tables with given fixed row and column sums, estimating a mean from a normal distribution when assuming a Bayesian uniform prior distribution on the mean, and in the problem of reconstructing evolutionary trees from aligned DNA sequence data. These note summarize the essentials of these discussions.

Generally speaking, MCMC provides a mechanism for taking dependent samples in situations where regular sampling is difficult, if not completely intractable. The standard situation is where it is difficult to normalize a posterior probability distribution. For the contingency table example, when the row and column sums are large, it is very difficult to count the number of tables with the same row and column sums, much less to order them in a formal manner which would allow random sampling. For the evolution problem, normalizing the posterior distribution on a problem with just 10 different species would require computing a couple million separate integrals over 18 dimensional space. The Metropolis Algorithm offers an alternative computational approach.

The Metropolis Algorithm

To implement this algorithm, you need a reversible Markov chain to propose new states from any specified given state, the ability to compute the ratio of the posterior densities (probabilities) for any pair of states, and a random number generator. (There are also ways to handle non-reveersible Markov chains.) Notice that if the posterior density at state x, p(x), equals h(x) / C where C is hard to compute, but h(x) is computable, the ratio of posterior densities at x and y equals p(x) / p(y) = h(x) / h(y). Here are the basic steps of the algorithm.

Choose an initial current state x₀.
Propose a new state x^* from the Markov chain beginning at the current state x.
Find R, the ratio of likelihoods of x^* over x.
Generate a uniform random number U.
If R is greater than U, the proposed state becomes the current state. Otherwise, the current state is recorded an additional time as the current state.
Return to step 2.

This algorithm is repeated for a very long simulation run. The distribution of the final state will be approximately that of one chosen randomly from the posterior distribution.

In practice, if you are interested in a parameter value such as the proportion of contingency tables with the same row and column sums as a given table with a chi-square statistic greater than that of the given table, the long-run proportion of tables in the sequence will converge to this number.

Last modified: May 2, 1997

Bret Larget, larget@mathcs.duq.edu