Saturday, June 13, 2009

Bayes 5: more on beta distribution

There are some useful properties of the beta distribution which simplify the math for Bayesian analysis, as discussed in Bolstad, Introduction to Bayesian Statistics.

The mean and variance of a beta distribution with shape parameters a and b are easily shown to be:

m = a / a+b
Var = a*b / ((a+b)**2 * (a+b+1))

Probably the most important beta distribution is the one which corresponds to a uniform prior, with equal probability over all intervals of the same size between 0 and 1. This distribution has a = 1, b = 1, and has m = 1/2, Var = 1/12.

Other beta distributions weight the probability to one side of the interval, e.g. a = 1, b = 2 gives a linear pdf with density = 2 at x = 0, decreasing to density = 0 at x = 2.

Another helpful property is that the beta distribution can be approximated by a normal distribution of the same mean and variance for sufficiently large a and b. This allows the use of standard tables to calculate the probability density over different intervals. The density can also be calculated exactly.

The likelihood function for a binomial proportion (n trials, y successes, probability of success p) has the form:

p**y * (1-p)**(n-y)

This is a beta distribution. If we have a beta(a,b) distribution for the prior, and then observe y successes in n trials, there is a simple updating rule which generates the posterior probability:

p**(a-1+y) * (1-p)**(b-1+n-y)

This is a beta(a+y,b+n-y) distribution. The case of the uniform prior beta(1,1) gives a posterior

p**(y) * (1-p)**(n-y)

This is a beta(y+1,n+1) distribution.

There is a simple interpretation. Choosing a and b for the prior distribution amounts to assessing the likely number of successes = a and failures = b in a+b trials. After actually observing y successes and n-y failures in n trials, we have an updated beta posterior with an easily computed mean and variance as shown above.

If we choose a uniform prior a = b = 1, then the expression for the prior is:

beta(a,b) = p**(a-1) * (1-p)**(b-1)
= 1

and the posterior is simply equal to the likelihood divided by the marginal probability. So for a uniform prior, the Bayesian approach gives the same answer for the mean as a maximum likelihood approach.

(Note to the reader: I am really at the limits of my understanding in these posts. I'm writing them as much for me as for you. So, if you find an error, please leave a comment. Thanks. And caveat lector.)

Bolstad (and James Curran) have made an R package avaiblable (link). I'll show two examples. Suppose we observe 5 successes in 15 trials. In one case we have a uniform prior and in the other we choose a beta(1,4) prior, which is weighted more toward the low values for p.


Posterior Mean           :  0.3529412 
Posterior Variance : 0.0126874
Posterior Std. Deviation : 0.1126385


Posterior Mean           :  0.3 
Posterior Variance : 0.01
Posterior Std. Deviation : 0.1

We can see that the uniform prior gives a slightly fatter distribution, while the prior weighted to low values shifts the mean from 0.35 to 0.3. But with this much data (15 trials) the influence of the prior is really quite small.