## Thursday, November 18, 2010

### Maximum likelihood, a first beginning


I need to get more serious about phylogenetics. Let's make a start on Maximum Likelihood. The first problem is to evaluate the probability of one particular hypothesis (a model) given some observed data. Eventually we'll say that the best model, the most likely model out of many possible models, is the one that gives the highest probability of generating the data we've obtained. This is the problem known as inverse probability.

Initially, let's take a very simple model, a binary proportion with `p` unknown. For example, it could be a coin toss where the coin isn't necessarily fair. Although in that case, we might have some pre-conceived limits on the value of `p`, and should think about a Bayesian approach. :)

The probability of observing `k` successes in `n` independent tries, where the successes can occur in any order, is:

 `C(n,k) p^k (1-p)^(n-k)`

where C(n,k) is the number of combinations, that is, the number of distinct orderings of the k successes in n tries.

 `C(n,k) = n! / (k! (n-k)!)`
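As a quick check, both quantities can be computed directly from the formulas above; a minimal sketch in Python (the helper names `n_choose_k` and `binomial_prob` are my own):

```python
from math import factorial

def n_choose_k(n, k):
    # C(n,k) = n! / (k! (n-k)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

def binomial_prob(n, k, p):
    # probability of k successes in n tries, in any order
    return n_choose_k(n, k) * p**k * (1 - p)**(n - k)

print(n_choose_k(12, 7))          # → 792 orderings of 7 successes in 12 tries
print(binomial_prob(12, 7, 0.5))  # → 792/4096 for a fair coin
```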

Suppose we observe some data, so that `k` and `n` are known, and furthermore, suppose we know the sequence of coin tosses:

 `SFSSFFFSFSSS`

The probability of observing this particular sequence of successes and failures, where `p` is the probability of success, is no longer a combinations problem. Since the events are independent, we just multiply the individual probabilities:

 `p (1-p) p p (1-p) (1-p) (1-p) p (1-p) p p p`
 `= p^k (1-p)^(n-k)`
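To see that the term-by-term product really collapses to the compact form, here's a sketch that walks the sequence above and multiplies as it goes, then compares with p^k (1-p)^(n-k) for an arbitrarily chosen value of `p`:

```python
seq = 'SFSSFFFSFSSS'
p = 0.6  # arbitrary value, just for the check

L = 1.0
for toss in seq:
    L *= p if toss == 'S' else (1 - p)

k = seq.count('S')  # 7 successes
n = len(seq)        # 12 tosses
print(L, p**k * (1 - p)**(n - k))  # the two agree
```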

This is the likelihood, L, the probability of this data with a model parameter `p`, where `p` is a variable (unknown).

 `L = p^k (1-p)^(n-k)`

To find the maximum likelihood, we should find dL/dp and set it equal to 0. But there's a nice trick, which is to observe that for any positive x and y, if x > y, then log(x) > log(y). Therefore the maximum of L is at the same value of `p` as the maximum of log(L).

 `log(L) = k log(p) + (n-k) log(1-p)`
 `d/dp log(L) = k/p - (n-k)/(1-p)`
 `0 = k/p - (n-k)/(1-p)`
 `k/p = (n-k)/(1-p)`
 `(1-p)/p = (n-k)/k`
 `1/p - 1 = n/k - 1`
 `1/p = n/k`
 `p = k/n`
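A numerical sanity check on the result: scan a grid of `p` values and confirm that log(L) peaks at k/n (for our sequence, k = 7 and n = 12, so the maximum should sit near 7/12 ≈ 0.583):

```python
from math import log

k, n = 7, 12

def log_L(p):
    # log likelihood, dropping constants
    return k * log(p) + (n - k) * log(1 - p)

# evaluate on a fine grid, avoiding the endpoints 0 and 1
grid = [i / 10000 for i in range(1, 10000)]
best = max(grid, key=log_L)
print(best, k / n)  # best is within a grid step of 7/12
```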

The maximum likelihood estimate of `p` (`p` with a little hat on it) is equal to the observed ratio of successes `k` to the number of trials `n`. Not exactly a surprise!

Technically, our model is that each coin toss is a Bernoulli trial, since for us the probability of success on the next trial is always `p`, and is independent of any earlier outcomes in the series.

We assumed we know the sequence of coin tosses, but this isn't necessary. If the order is unknown, or the model specifies that the events are exchangeable:

 `L = C(n,k) p^k (1-p)^(n-k)`
 `log(L) = log(C(n,k)) + k log(p) + (n-k) log(1-p)`
 `d/dp log(L) = k/p - (n-k)/(1-p)`

which is the same as before.

If you insist on differentiating L, then you'll need the product rule.

 `dL/dp = C(n,k) ( d/dp( p^k (1-p)^(n-k) ) )`
 `      = C(n,k) ( -p^k (n-k)(1-p)^(n-k-1) + (1-p)^(n-k) k p^(k-1) )`
 `dL/dp = 0`
 `p^k (n-k)(1-p)^(n-k-1) = (1-p)^(n-k) k p^(k-1)`
 `p (n-k) = (1-p) k`
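The product-rule result can be checked with a finite-difference approximation of dL/dp: it should vanish at p = k/n and be clearly nonzero elsewhere. A sketch, again with k = 7, n = 12:

```python
def L(p, n=12, k=7, C=792):
    # likelihood including the constant C(12,7) = 792
    return C * p**k * (1 - p)**(n - k)

def dL_dp(p, h=1e-6):
    # central finite-difference approximation of the derivative
    return (L(p + h) - L(p - h)) / (2 * h)

print(dL_dp(7 / 12))  # approximately 0 at the maximum
print(dL_dp(0.5))     # positive: L is still rising at p = 0.5
```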

Which is also the same as before.
[UPDATE: an earlier version had an error, sorry. ]