Saturday, July 25, 2009

Bayes 8: estimation of the mean



There is a bit more I'd like to post about my continuing attempt to understand methods for Bayesian analysis. The example I will use is (once again) taken from Bolstad's wonderful text. It is the simplest case of estimation of the population mean using a random sample: when the variance of the population is known.


Example 18. Arnie, Barb and Chuck are going to estimate the mean length of one-year-old rainbow trout in a stream. Previous studies in other streams have shown the length of yearling rainbow trout to be normally distributed with known standard deviation of 2 cm. They take a random sample of 12 yearling trout from the stream and find the sample mean y = 32 cm.


As in the case of estimating a population proportion, it is very helpful if one can represent one's prior belief using an appropriate type of distribution. Previously, the conjugate distribution was the beta distribution; in the present situation it is the normal distribution. Because the observation or likelihood is a normal distribution, if the prior is also a normal distribution then the posterior is normal as well, and it can be found using simple updating rules.

First, we need to find the precision, which for the prior is simply the inverse of the variance. For the observation, the precision of the sample mean is the number of observations times the precision of an individual observation, calculated from the known population variance: n/σ². The updated precision is the sum of the prior precision and the observation precision.

In the second step, we find the posterior mean as the weighted average of the prior mean and the observed sample mean, where each weight is the proportion of that component's precision to the posterior precision.
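To make the two rules concrete, here is a minimal R sketch; the function post.normal and its argument names are mine, just for illustration, and are not part of any package:

post.normal = function(prior.mean, prior.sd, ybar, n, sigma) {
  prior.prec = 1/prior.sd^2          # rule 1: precision = 1/variance
  obs.prec = n/sigma^2               # precision of the sample mean
  post.prec = prior.prec + obs.prec  # precisions add
  w.prior = prior.prec/post.prec     # rule 2: weights are shares of precision
  w.obs = obs.prec/post.prec
  post.mean = w.prior*prior.mean + w.obs*ybar
  list(mean = post.mean, sd = sqrt(1/post.prec))
}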


Arnie decides his prior mean is 30 cm. He doesn't believe it is possible for a yearling rainbow to be less than 18 cm or greater than 42 cm, so his prior standard deviation is 4 cm and he will use a normal(30, 4²) prior.


Arnie is using, as a rule of thumb, that the entire range of the distribution which has any appreciable density should be spanned by about 3 standard deviations on either side of the mean.
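In R, that is just one line of arithmetic:

prior.sd = (42 - 18)/6   # range covered by 3 standard deviations on each side
prior.sd                 # 4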

prior precision = 1/4²
observation precision = 12/2²
posterior precision
= 1/4² + 12/2²
= 0.0625 + 3.0 = 3.0625

posterior variance = 1/precision
= 1/3.0625 = 0.3265
posterior st dev = sqrt(variance)
= 0.5714


The weights are as follows:

prior        0.0625/3.0625 = 0.0204
observation  3.0/3.0625    = 0.9796


The posterior mean is then:

mean = 0.0204*30 + 0.9796*32 = 31.96
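Plugging Arnie's numbers into the post.normal sketch above reproduces this result:

post.normal(30, 4, 32, 12, 2)
# $mean
# [1] 31.95918
#
# $sd
# [1] 0.5714286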


Barb, on the other hand, decides she doesn't know anything about trout, so she will use a flat prior.


A flat prior has zero precision, so her posterior has:

precision = 3.0
variance = 1/3.0 = 0.33


Her posterior mean is the sample mean = 32.
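In the post.normal sketch above, Barb's flat prior amounts to zero prior precision, which we can mimic by passing an infinite prior standard deviation (the prior mean then gets zero weight, so its value doesn't matter):

post.normal(30, Inf, 32, 12, 2)
# $mean
# [1] 32
#
# $sd
# [1] 0.5773503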

Chuck decides his prior belief is not normal; it has a trapezoidal shape. His prior gives zero weight at 18 cm, rises to weight one at 24 cm, stays level up to 40 cm, and then goes back down to zero at 46 cm.
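For reference, Chuck's prior could be written in R like this; approx() does the linear interpolation between the corner points (the function name chuck.prior is mine, and this is only the prior shape, not the posterior calculation):

chuck.prior = function(mu) {
  # unnormalized trapezoid: 0 at 18, up to 1 at 24, level to 40, back to 0 at 46
  approx(x=c(18, 24, 40, 46), y=c(0, 1, 1, 0), xout=mu, yleft=0, yright=0)$y
}
curve(chuck.prior(x), from=15, to=49)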


Poor Chuck will have to find his posterior distribution by numerical integration. Fortunately for us, we can do the same calculation using the Bolstad package in R. I'll save that for next time, but we need a graphic for the top of the post:

library(Bolstad)
help(package=Bolstad)


normnp     Bayesian inference on a normal mean with a normal prior


We need several values:

x        a vector of observations
m.x      the mean of the normal prior
s.x      the standard deviation of the normal prior
sigma.x  the known population standard deviation of the normal distribution


In this case, we were not given the individual observations. However, since we will plug in the population standard deviation, we can just make something up, as long as it has the observation mean = 32 and n = 12. Arnie had a normal(30, 4²) prior. The population sd was known to be 2.

m.x = 30        # Arnie's prior mean
s.x = 4         # Arnie's prior standard deviation
sigma.x = 2     # known population standard deviation

x = rep(32,12)  # made-up data: n = 12 observations with mean 32
normnp(x,30,4,2,ret=T)


> normnp(x,30,4,2,ret=T)
Known standard deviation :2
Posterior mean : 31.9591837
Posterior std. deviation : 0.5714286
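
Since only the sample mean and the sample size enter the calculation, any other made-up data with mean 32 and n = 12 should give the same posterior. For example (x2 is just another fabricated sample):

x2 = c(rep(30,6), rep(34,6))   # mean is still 32, n is still 12
normnp(x2, 30, 4, 2, ret=T)    # prints the same posterior mean and sd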