## Friday, June 12, 2009

### Bayes IV: beta distribution

The second book I recommend for furthering your understanding of Bayesian methods is William Bolstad's *Introduction to Bayesian Statistics*.

I'm planning a series of posts explaining my low-level understanding of what he covers in the book. But first, I'm going to discuss some probability distributions or pdfs (probability density functions).

We can generalize the Bayesian approach as follows. Typically, we have some observed data (D) and a set of candidate hypotheses or models. Often the parameters of those models are not discrete but continuous. The schematic form of Bayes' rule is that the posterior ∝ likelihood × prior; in full:

P(H | D) = P(D | H) * P(H) / P(D)

where P(D) is a complex normalizing term: the total probability of the observed data under all hypotheses. Often, rather than writing H for hypothesis, theta (θ) is used as a symbol for the adjustable parameters of a general model. And where previously we summed over a set of discrete models to get P(D), in real-life situations we integrate over the parameters, some or all of which are continuous.
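To make the discrete case concrete, here is a small Python example (the hypotheses and numbers are my own illustration, not from Bolstad): two rival models of a coin, with P(D) obtained by summing likelihood × prior over both.

```python
# Hypothetical two-hypothesis example: H1 = fair coin (p = 0.5),
# H2 = biased coin (p = 0.8), with equal prior probability.
# Observe D = 3 heads in 4 tosses; P(D) is the sum over both hypotheses.
from math import comb

def binom_lik(p, y, n):
    """Binomial likelihood P(D | p) for y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

y, n = 3, 4
prior = {'fair': 0.5, 'biased': 0.5}
p_success = {'fair': 0.5, 'biased': 0.8}

# numerator for each hypothesis: likelihood * prior
numer = {h: binom_lik(p_success[h], y, n) * prior[h] for h in prior}
p_D = sum(numer.values())                     # the denominator P(D)
posterior = {h: numer[h] / p_D for h in numer}
print(posterior)
```

With continuous parameters, the sum in `p_D` becomes an integral, which is the harder case discussed next.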

As I understand it, there are two general problems in applying the Bayesian approach. The first is the need to evaluate the likelihood of the model, P(D | H), and multiply it by the prior. The second is the need to integrate to determine the denominator P(D) in the equation above. These needs introduce complexities, which are in general solved by tricks, such as choosing a prior with a mathematically convenient form.
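The continuous version can also be sketched numerically. The following Python snippet (an illustration with assumed data of 7 successes in 10 trials, and a uniform prior) approximates the integral for P(D) by a sum over a grid of p values:

```python
# Grid approximation of the posterior for a binomial proportion p,
# with a uniform prior. The denominator P(D) is an integral over p,
# approximated here by a Riemann sum.
y, n = 7, 10                      # assumed data: 7 successes in 10 trials
N = 10001
grid = [i / (N - 1) for i in range(N)]

prior = [1.0 for p in grid]                       # uniform prior density
lik = [p**y * (1 - p)**(n - y) for p in grid]     # likelihood at each p
unnorm = [l * pr for l, pr in zip(lik, prior)]

# the troublesome denominator: integrate likelihood * prior over p
p_D = sum(unnorm) / (N - 1)
posterior = [u / p_D for u in unnorm]

# the posterior mode should sit at the sample proportion y/n = 0.7
mode = grid[max(range(N), key=lambda i: posterior[i])]
print(mode)
```

This brute-force approach works for one parameter but scales badly as the number of parameters grows, which is why conjugate priors (below) are so convenient.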

As explained in detail in Bolstad, it is very helpful to have a prior which, when multiplied by the likelihood, gives a posterior in the same family as the prior; such a prior is called conjugate. So, for example, in problems of binomial proportion with parameter p, the probability of success on any one trial, and y successes observed in n trials, the likelihood function is proportional to `p**y * (1-p)**(n-y)`.

In this case, it is enormously helpful if the prior distribution has the same functional form in p; the distribution with that form is the beta distribution:

 `beta(a,b) ∝ p**(a-1) * (1-p)**(b-1)`

where a and b are shape parameters, and p runs between 0 and 1. Here is R code to plot beta densities with various shape parameters.

Shape parameters: a = 1, b = 0.5,1,..7

```r
x = seq(0, 1, length=1001)
par(pch=16)
colors = rainbow(7)
L = c(1, 2, 3, 4, 5, 6, 7)
# b = 0.5 in black, then b = 1..7 in rainbow colors
plot(x, dbeta(x, 1, 0.5), ylim=c(0, 5), col='black')
for (j in 1:length(L)) {
  points(x, dbeta(x, 1, L[j]), col=colors[j])
}
```

Shape parameters: a = b = 0.5,1,..6

```r
x = seq(0, 1, length=1001)
par(pch=16)
colors = rainbow(7)
L = c(1, 2, 3, 4, 5, 6)
# a = b = 0.5 in black, then a = b = 1..6 in rainbow colors
plot(x, dbeta(x, 0.5, 0.5), ylim=c(0, 5), col='black')
for (j in 1:length(L)) {
  points(x, dbeta(x, L[j], L[j]), col=colors[j])
}
```

Shape parameters: a = 0.5,1,..6, with b = 2*a

```r
x = seq(0, 1, length=1001)
par(pch=16)
colors = rainbow(7)
L = c(1, 2, 3, 4, 5, 6)
# a = 0.5, b = 1 in black, then a = 1..6 with b = 2*a in rainbow colors
plot(x, dbeta(x, 0.5, 1), ylim=c(0, 5), col='black')
for (j in 1:length(L)) {
  y = L[j]
  points(x, dbeta(x, y, y*2), col=colors[j])
}
```
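Finally, as a numerical check on the conjugacy claim above, here is a Python snippet (the values of a, b, y, and n are my own illustrative choices) verifying that a beta(a, b) prior combined with a binomial likelihood for y successes in n trials gives a beta(a + y, b + n - y) posterior:

```python
# Verify numerically that likelihood * beta(a, b) prior, once normalized,
# equals the beta(a + y, b + n - y) density -- the conjugate update.
from math import gamma

def beta_pdf(p, a, b):
    """Beta density: p^(a-1) * (1-p)^(b-1) / B(a, b)."""
    B = gamma(a) * gamma(b) / gamma(a + b)
    return p**(a - 1) * (1 - p)**(b - 1) / B

a, b = 2.0, 3.0    # prior shape parameters (arbitrary choice)
y, n = 4, 10       # assumed data: 4 successes in 10 trials

# unnormalized posterior = likelihood * prior on a grid over (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
unnorm = [p**y * (1 - p)**(n - y) * beta_pdf(p, a, b) for p in grid]
Z = sum(unnorm) / 1000                    # approximate normalizing constant

# the normalized product should match beta(a + y, b + n - y) everywhere
for p, u in zip(grid, unnorm):
    assert abs(u / Z - beta_pdf(p, a + y, b + n - y)) < 1e-3
print("posterior matches beta(a + y, b + n - y)")
```

So with a beta prior, updating on binomial data requires no integration at all: just add the counts of successes and failures to the shape parameters.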