Python for Bioinformatics: Simple Bayes example

Saturday, October 31, 2009

Simple Bayes example

I want to explore a device used by Durbin et al., which they call the "occasionally dishonest casino," and use it to get into Hidden Markov Models. I'll explain the device next time. But the casino example is also a wonderful illustration of Bayes' rule in action, so I want to talk about Bayes again. I'm going to start with an elementary example that I used in my Bioinformatics class.

The example involves cookies. (Everybody loves cookies, right?) So imagine that we obtain two identical cookie jars, which we will call A and B. In the first jar (A), we put 20 chocolate chip (CC) cookies (just like my sister makes) and 20 of some yucky store-bought brand, for a total of 40. The second jar (B), outwardly identical to the first, also contains 40 cookies, but the proportions are different: only 1/4 of them (10 cookies) are the kind I prefer.

Now suppose I show you a single jar (you can even hold it, as long as you don't peek inside). I carefully draw out a cookie: it's chocolate chip! :)

The question is: does the new information that we drew a CC cookie help us guess which jar is in front of us? According to the Bayesian view, it does. You would guess (at least if you reason like me) that before the draw there was an equal chance that I had chosen jar A or jar B. Or let's just say that I let you pick at random. We symbolize this prior as P(A) = 1/2. What we want after the draw is P(A|CC), the probability that the jar is A given that the cookie was chocolate chip. Since the proportion of CC cookies in jar A (1/2) is higher than in jar B (1/4), it seems reasonable to expect that P(A|CC) is now more than 1/2. How much more?
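One way to get a feel for the answer before doing any arithmetic is to simulate the experiment many times and keep only the runs in which the cookie drawn was chocolate chip. The sketch below is a minimal Python version that assumes exactly the setup of the story: each jar is picked with probability 1/2, jar A holds 20 CC and 20 other cookies, and jar B holds 10 CC and 30 other. The names JARS and simulate are illustrative, not from any library.

```python
import random

# Jar contents from the story: A = 20 CC + 20 other, B = 10 CC + 30 other.
JARS = {
    "A": ["CC"] * 20 + ["other"] * 20,
    "B": ["CC"] * 10 + ["other"] * 30,
}

def simulate(trials=100_000, seed=1):
    """Estimate P(A | CC) by repeating the jar-and-cookie experiment."""
    rng = random.Random(seed)
    cc_draws = 0   # trials in which the cookie was chocolate chip
    cc_from_a = 0  # ...and the jar happened to be A
    for _ in range(trials):
        jar = rng.choice(["A", "B"])    # pick a jar at random (prior 1/2 each)
        cookie = rng.choice(JARS[jar])  # draw one cookie from that jar
        if cookie == "CC":              # keep only trials matching the observation
            cc_draws += 1
            if jar == "A":
                cc_from_a += 1
    return cc_from_a / cc_draws

print(simulate())  # close to 0.667
```

Among the simulated trials that produced a chocolate chip cookie, roughly two thirds came from jar A.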

As before, we can set up a frequency table that includes the marginal totals. Before the draw, we have:

          jar A   jar B   total
CC           20      10      30
other        20      30      50
total        40      40      80

We reason as follows. We have observed a chocolate chip cookie in the draw. The observations in the second row of the table (a non-CC cookie) were possible initially but did not occur, so we draw a line through that row and recalculate the total. There are 30 CC cookies in total. Since 20 of these are in jar A, and assuming the jar was chosen at random, P(A|CC) = 20/30 = 2/3.
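The crossing-out step can be written as a few lines of Python. This is a sketch: the dictionary layout and the helper name p_jar_given_cookie are illustrative choices. Counting cookies this way is valid here because the two jars are equally likely and each holds 40 cookies, so every one of the 80 cookies is equally likely to be the one drawn; with jars of different sizes, the raw counts would have to be weighted.

```python
from fractions import Fraction

# Frequency table: rows are cookie types, columns are jars.
table = {
    "CC":    {"A": 20, "B": 10},
    "other": {"A": 20, "B": 30},
}

def p_jar_given_cookie(table, jar, cookie):
    """Keep only the observed row, then renormalize within that row."""
    row = table[cookie]            # the only row consistent with the observation
    row_total = sum(row.values())  # recalculated total (30 for CC)
    return Fraction(row[jar], row_total)

print(p_jar_given_cookie(table, "A", "CC"))  # 2/3
print(p_jar_given_cookie(table, "B", "CC"))  # 1/3
```

Using Fraction keeps the answer exact, so it reads as 2/3 rather than 0.6666666666666666.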

My nephew Evan doesn't like this reasoning. But I ask him (and I ask you) to consider: what if we knew at the beginning that jar B contained zero CC cookies? It would be pathological not to update our probabilities in that case. So what if 39/40 of the cookies in jar A were CC? It's a slippery slope, and we're sliding.
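The slippery slope can be made concrete by applying Bayes' rule, P(A|CC) = P(CC|A) P(A) / P(CC), while varying the jar contents. In this sketch, posterior_a is a hypothetical helper, the prior on jar A defaults to 1/2 as in the story, and for the 39/40 case jar B is assumed to keep its original 1/4 proportion of CC cookies.

```python
from fractions import Fraction

def posterior_a(frac_cc_a, frac_cc_b, prior_a=Fraction(1, 2)):
    """P(A | CC) via Bayes' rule, from the CC fraction in each jar and the prior on A."""
    prior_b = 1 - prior_a
    evidence = frac_cc_a * prior_a + frac_cc_b * prior_b  # P(CC)
    return frac_cc_a * prior_a / evidence

print(posterior_a(Fraction(1, 2), Fraction(1, 4)))    # original jars: 2/3
print(posterior_a(Fraction(1, 2), Fraction(0)))       # B has no CC cookies: 1
print(posterior_a(Fraction(39, 40), Fraction(1, 4)))  # 39/40 CC in A: 39/49
```

The posterior climbs from 2/3 toward 1 as the jars become more different, and there is no point along the way where it would make sense to stop updating.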

As Auric Goldfinger tells James Bond: "They have a saying in Chicago..."

Next time: a full Bayesian analysis of the cookie problem.