Python for Bioinformatics: Gamma distribution

Tuesday, August 18, 2009

Gamma distribution

I posted previously about the beta distribution here, here and here. We ran into it because it is the conjugate prior for Bayesian analysis of problems in binomial proportion. That's because the likelihood function is in the same form.

The beta distribution is a continuous probability distribution with two shape parameters α and β (or a and b). If we consider p as the random variable (and q = 1-p), then the unnormalized distribution is just:

f(x) = p^a-1 q^b-1

The function has symmetry: if we switch a and b, and also p and q, everything would look the same. Varying the values for a and b can yield a wide variety of shapes. For large a and b, the plot approaches a normal distribution with mean = a / a+b.

The need to normalize the beta distribution brings us to the gamma distribution. This one can be viewed in a simple way but also can be fairly complex. The simple version is that, for positive integers, Γ(x) is just (x-1)!.

The normalizing constant for the beta distribution with parameters a and b is:

Γ(a+b) / (Γ(a) * Γ(b))

The gamma distribution is frequently used in phylogenetic models, for example, to model the distribution of variability during evolutionary time among different positions in a protein. The gamma distribution also has two parameters but these apparently have quite different roles.

They are called the shape parameter (k) and the scale parameter (θ). More generally, the unnormalized gamma distribution is:

f(x) = x^k-1 exp { -x / θ }

We can get an idea of the roles of the two variables by considering this:

mean = kθ
variance = kθ²

Let's look at some plots obtained using different values for the two parameters. In the series below, k goes from 1 to 4, and then for each plot theta varies from 0.25, 0.5, 1, 2, 3 as the color goes from blue to green to magenta to red to maroon.

N = 6
thetaL = c(0.25,0.5,1,2,3)
color.list = c('blue','darkgreen',
  'magenta','red','darkred')

#k = 1,2,3,4

plot(1:N,ylim=c(0,0.8),type='n')
for (i in 1:5) {
  curve(dgamma(x,
    shape=k,scale=thetaL[i]),
    from=0,to=N,lwd=3,
    col=color.list[i],add=T)  }