Thursday, June 11, 2009

Bayesian analysis, part II

In his book Understanding Uncertainty, Dennis Lindley often uses the example of a jar or urn containing some number of balls (say, 100). Each ball is either red or white, and the proportion of each color is unknown.

Consider two urns, U1 and U2, each with a known proportion of red balls; in the interesting case these proportions differ. We do not know which of the two urns is before us, and we wish to determine the probability that it is U1 (or, alternatively, U2).

We observe some data: we draw balls one by one from the urn, with replacement. From these observations we infer the probability that the urn is in fact U1, P(U1), and by subtraction obtain P(U2) = 1 - P(U1). These values naturally change as we acquire data.
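To make the data-generating process concrete, here is a minimal simulation sketch in Python (not part of Lindley's example; the function and argument names are my own, and the proportion 2/3 anticipates the urn U1 described below):

import random

def draw_balls(p_red, n, seed=None):
    # draw n balls, with replacement, from an urn whose fraction of red balls is p_red
    rng = random.Random(seed)
    return ['r' if rng.random() < p_red else 'w' for _ in range(n)]

print(draw_balls(2/3, 10, seed=1))   # a list of ten draws such as ['r', 'w', 'r', ...]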

In a Bayesian analysis, we must first assign a value to P(U1) before we see any data. This value is called the prior. For example, lacking any relevant information, we might assign a uniform prior in which P(U1) = P(U2).

Suppose we know that two-thirds of the balls in urn U1 are red, i.e.:

P(r | U1) = 2/3

Suppose also that

P(r | U2) = 1/3

Now we draw one ball from the urn before us and observe that it is red. What is P(U1) after observing this data? Following Bayes' theorem, we write:

P(U1 | r) = P(r | U1) P(U1) / P(r)

P(U1) is the prior probability of U1.

As mentioned above, if we don't have any information about which urn is being sampled, a typical Bayesian analysis would assign P(U1) = 1/2. That is, our belief about which urn is chosen before seeing any data is P(U1) = P(U2) = 1/2. Assigning the same probability to all possible models is described as using a "flat" or non-informative prior.

P(r | U1) is the probability of observing this data if the urn is really U1. This is referred to as the likelihood of U1. It is the probability of the observed data, given the model U1.

The denominator of the expression is more complicated than it appears at first. P(r) is the total probability of observing a red ball, summed over all possible models and weighted by their prior probabilities.

P(r) = P(r | U1) P(U1) + P(r | U2) P(U2)

Plugging into the main equation:

P(U1 | r) = P(r | U1) P(U1) / [P(r | U1) P(U1) + P(r | U2) P(U2)]
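In code, this update fits in a few lines. The following sketch (Python; my own function and variable names, not anything from Lindley) computes the posterior P(U1 | data) from the prior and the two likelihoods:

def update(prior_u1, lik_u1, lik_u2):
    # Bayes' theorem for two models U1 and U2:
    # prior_u1 = P(U1), lik_u1 = P(data | U1), lik_u2 = P(data | U2)
    prior_u2 = 1 - prior_u1
    marginal = lik_u1 * prior_u1 + lik_u2 * prior_u2   # P(data)
    return lik_u1 * prior_u1 / marginal                # P(U1 | data)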

Let's do the math. Based on the known populations of the two urns, the likelihoods are:

P(r | U1) = 2/3
P(r | U2) = 1/3

The prior is:

P(U1) = P(U2) = 1/2

The marginal probability P(r) is:

P(r) = P(r | U1) P(U1) + P(r | U2) P(U2)
= 2/3 * 1/2 + 1/3 * 1/2
= 1/3 + 1/6
= 1/2

The result we seek is:

P(U1 | r) = P(r | U1) P(U1) / P(r)
= (2/3 * 1/2) / (1/2)
= (1/3) / (1/2)
= 2/3

and

P(U2 | r) = 1 - P(U1 | r)
= 1 - 2/3
= 1/3
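We can confirm this arithmetic exactly with Python's fractions module (a quick check, with my own variable names):

from fractions import Fraction

numerator = Fraction(2, 3) * Fraction(1, 2)              # P(r | U1) P(U1)
marginal = numerator + Fraction(1, 3) * Fraction(1, 2)   # P(r)
print(marginal)               # 1/2
print(numerator / marginal)   # 2/3, i.e. P(U1 | r)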

The result P(U1 | r) is called the posterior probability that the urn is U1, having seen the data of a red ball.

Before seeing any data, we assessed the prior probability P(U1) as 1/2. After drawing a red ball from the urn being tested, our estimate of P(U1), the probability that the urn is the one with the higher proportion of red balls, increases to 2/3, and P(U2) correspondingly decreases to 1/3.

What drives non-Bayesians crazy about this analysis is the dependence of the estimated probability on the prior. For example, we might have had a prior belief that U2 was twice as likely as U1, i.e. P(U1) = 1/3 and P(U2) = 2/3; after the same red draw we would then have P(U1 | r) = P(U2 | r) = 1/2.

Prior beliefs depend on the individual, and in this system they are immune from criticism. However, a striking result seen in realistic Bayesian calculations is that the observed data quickly overwhelm the influence of the prior. The prior only makes a difference when data are scarce.
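Here is a small numerical sketch of that point (the data are hypothetical: 60 draws in U1's expected 2:1 ratio of red to white). Two observers who start from very different priors end up in essentially the same place:

from fractions import Fraction

def update(prior_u1, lik_u1, lik_u2):
    marginal = lik_u1 * prior_u1 + lik_u2 * (1 - prior_u1)
    return lik_u1 * prior_u1 / marginal

draws = ['r'] * 40 + ['w'] * 20      # hypothetical data, two-thirds red
p_a = Fraction(1, 2)                 # a flat prior
p_b = Fraction(1, 100)               # a prior heavily favoring U2
for ball in draws:
    lik_u1 = Fraction(2, 3) if ball == 'r' else Fraction(1, 3)
    lik_u2 = Fraction(1, 3) if ball == 'r' else Fraction(2, 3)
    p_a = update(p_a, lik_u1, lik_u2)
    p_b = update(p_b, lik_u1, lik_u2)

print(float(p_a), float(p_b))        # both posteriors are now very close to 1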

Suppose we now draw a second ball from the urn being tested, and find that it is again a red ball. What happens to P(U1)?

First, we will have a new prior, which depends not only on the probabilities assigned initially, but also on the adjustment made as a result of the previously observed data. The new prior is the old posterior:

P(U1) = 2/3; P(U2) = 1/3

The likelihoods are still the same:

P(r | U1) = 2/3
P(r | U2) = 1/3

Finally, the marginal probability has also changed, because we have updated P(U1) and P(U2).

The marginal probability P(r) is now:

P(r) = P(r | U1) P(U1) + P(r | U2) P(U2)
= 2/3 * 2/3 + 1/3 * 1/3
= 4/9 + 1/9
= 5/9

The posterior probability P(U1 | r):

P(U1 | r) = P(r | U1) P(U1) / P(r)
= (2/3 * 2/3) / (5/9)
= (4/9) / (5/9)
= 4/5

and

P(U2 | r) = 1 - P(U1 | r)
= 1 - 4/5
= 1/5
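The whole two-draw calculation can be reproduced by applying the same update twice, feeding each posterior back in as the next prior (a sketch; the update function is repeated from above so the snippet stands alone):

from fractions import Fraction

def update(prior_u1, lik_u1, lik_u2):
    marginal = lik_u1 * prior_u1 + lik_u2 * (1 - prior_u1)
    return lik_u1 * prior_u1 / marginal

p = Fraction(1, 2)                                # flat prior
for _ in range(2):                                # two red draws
    p = update(p, Fraction(2, 3), Fraction(1, 3))
    print(p)                                      # 2/3, then 4/5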

There are some interesting twists to this story that I would like to pursue in another post. It turns out that we would obtain the same result if we had instead treated the two observations as one compound event. Also, the posterior can be updated even more simply using the likelihood ratio P(r | U1) / P(r | U2). And finally, if the second ball drawn had been white, the posterior probability P(U1) would go back to 1/2.
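As a preview, the same small update function can be used to check all three claims (again just a sketch):

from fractions import Fraction

def update(prior_u1, lik_u1, lik_u2):
    marginal = lik_u1 * prior_u1 + lik_u2 * (1 - prior_u1)
    return lik_u1 * prior_u1 / marginal

# two reds as one compound event: P(rr | U1) = (2/3)**2, P(rr | U2) = (1/3)**2
print(update(Fraction(1, 2), Fraction(2, 3)**2, Fraction(1, 3)**2))   # 4/5

# the same answer via odds: posterior odds = prior odds * likelihood ratio for each draw
odds = 1 * (Fraction(2, 3) / Fraction(1, 3))**2   # prior odds of 1, two red draws
print(odds / (1 + odds))                          # 4/5

# a red draw followed by a white draw returns P(U1) to 1/2
p = update(Fraction(1, 2), Fraction(2, 3), Fraction(1, 3))   # red
p = update(p, Fraction(1, 3), Fraction(2, 3))                # white
print(p)                                                     # 1/2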

(I updated this post with a different set of probabilities, which I hope will be a clearer example).