But let's start by considering a thought experiment given by Peter Norvig in one of his outstanding posts:
...suppose I have a test machine that determines whether the subject is a flying leprechaun from Mars. I'm told the test is 99% accurate. I put person after person into the machine and the result is always negative. Finally one day, I put Tom Hanks into the machine and the light comes on that says "flying leprechaun." Would you believe the machine? Of course not: that would be impossible, so we conclude that we just happened to hit the 1% where the test errs. We find it easy to completely reject a test result when it predicts something impossible (even if the test is very accurate); now we have to train ourselves to almost completely reject a test result when it predicts something almost completely impossible (even if the test is very accurate).
Or consider a second example (from Andrew Moore): we wish to determine the relationship between headache and flu (for me). We're considering a universe of possible "events", or an event space, with a total probability of 1. In this universe, it sometimes happens that when I wake up I have a headache. Let us suppose that occurs 10% of the time. Represent this event by the circle in the event space labeled A in the figure. (I know it's more than 10% of the total; I plead artistic license.) It also happens that on occasion I have the flu; the probability of this event, B, is represented by the size of the pink circle. Sometimes I have both flu and headache. This event is labeled A+B, and the intersection of the two circles is colored magenta.
We want to know the predictive value of headache for flu: given that it is already known that I have a headache, how likely am I to have the flu? That is, what is P(flu | headache)? The value we seek is the ratio of two areas: the area of the region labeled A+B to the area of the circle labeled A. It is important to notice that just knowing the values for P(A) and P(B) is not enough. We also need to know the relationship between the two circles, that is, how much they overlap.
The other thing to notice is that
P(B | A) = P(A+B) / P(A)
If we know A is true, then the event space of true things with total probability = 1 shrinks to the circle labeled A, and within that restricted event space the probability of A+B is the ratio of the magenta region to the maroon circle. Rearranging:
P(A+B) = P(B | A) P(A).
Since there is nothing special about the labels or the sizes of the circles, it is also true that
P(A+B) = P(A | B) P(B).
Now we have two expressions both equal to P(A+B), so they are also equal to each other:
P(A | B) P(B) = P(B | A) P(A)
P(A | B) = P(B | A) P(A) / P(B)
This is the standard form of Bayes rule.
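As a quick sanity check, here is a minimal Python sketch of the same algebra. The headache probability of 10% is the one used above; the flu probability and the size of the overlap are invented purely for illustration.

p_A  = 0.10     # P(A), headache, as above
p_B  = 0.02     # P(B), flu -- invented for illustration
p_AB = 0.005    # P(A+B), the magenta overlap -- also invented

p_B_given_A = p_AB / p_A             # P(B | A) = P(A+B) / P(A)
p_A_given_B = p_AB / p_B             # P(A | B) = P(A+B) / P(B)

# Bayes rule: P(A | B) = P(B | A) P(A) / P(B)
print(p_A_given_B)                   # 0.25
print(p_B_given_A * p_A / p_B)       # 0.25, the same value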
Let's try to get real. Suppose there is a medical condition or disease (D), together with its complement, no disease (~D). There is also a diagnostic test which has characteristic (known) error rates: the rate of false-positives (FP) and the rate of false-negatives (FN).
These two values alone are not sufficient to interpret our test results. In Bayesian terms, what we want to know is P(D | +), the probability of disease given a positive test result. That is, we want the probability of a hypothesis, given some data. (Notice this is the opposite of the way you first learned about probability, where we predict what data we will see, given a hypothesis.) Going back to the equality statement above and switching notation, we have
P(D | +) = P(+ | D) P(D) / P(+)
We already know P(+ | D): it is 1 - FN. We haven't talked about either P(D) or P(+) yet. P(D) is the probability, or incidence, of disease (think back to the flying leprechaun). P(+), the probability of a positive result, looks hard to calculate, but in this case it's easy. Either we have the disease or we do not, and in either case we may get a positive result. The total probability of a positive result is the sum over these two possibilities:
P(+) = P(+ | D) P(D) + P(+ | ~D) P(~D)
P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | ~D) P(~D)]
Again, we are given P(+ | D) as 1 - false-negative rate.
We are given P(+ | ~D) as false-positive rate.
We know P(D), the incidence of disease.
We calculate P(~D) as 1 - P(D).
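Before turning to the contingency table below, here is a minimal Python sketch that strings these four pieces together (the function name is mine; the error rates and the incidence are passed in as arguments):

def p_disease_given_positive(fn_rate, fp_rate, p_d):
    p_pos_given_d = 1.0 - fn_rate      # P(+ | D) = 1 - FN
    p_pos_given_nd = fp_rate           # P(+ | ~D) = FP
    p_nd = 1.0 - p_d                   # P(~D) = 1 - P(D)
    # total probability of a positive result
    p_pos = p_pos_given_d * p_d + p_pos_given_nd * p_nd
    # Bayes rule: P(D | +) = P(+ | D) P(D) / P(+)
    return p_pos_given_d * p_d / p_pos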
Rather than do the math by hand, I will use another device called a contingency table. Suppose the false-negative and false-positive rates are both 1%, and suppose we know the incidence of disease is also 1%. If we consider 10000 individuals, we can calculate as follows:
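Of the 100 individuals with disease, 99 test positive and 1 tests negative; of the 9900 without disease, 99 test positive and 9801 test negative:

            D      ~D      sum
test +     99      99      198
test -      1    9801     9802
sum       100    9900    10000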
Look at the sums in the third column. The first two are the marginal probabilities P(+) and P(-) (after dividing by 10000). Notice that P(D | +) = 99/198 = 0.5. The probability of disease given a positive test result is only 50%. This value depends on P(D), as explained above. In particular, if P(D) is 0.01%, then nearly all positive test results will be from individuals without disease. Everything depends on the probability of actually encountering a flying leprechaun.
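Using the sketch from above, the two scenarios just described work out as expected:

print(p_disease_given_positive(0.01, 0.01, 0.01))    # 0.5
print(p_disease_given_positive(0.01, 0.01, 0.0001))  # about 0.0098

With an incidence of 0.01%, roughly 99 out of every 100 positive results come from individuals without the disease.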
To learn more about Bayes, I could recommend that you start as I did by going to Wikipedia, but I find their "introductory" articles on mathematical topics very hard going. Instead, I suggest that you start with a book by Dennis Lindley entitled Understanding Uncertainty. If you need to be convinced that he is worth the effort, you can thumb through the book on Amazon, or you can check out:
Inference for a Bernoulli Process (A Bayesian View)
Author(s): D. V. Lindley and L. D. Phillips
Source: The American Statistician, Vol. 30, No. 3 (Aug., 1976), pp. 112-119.
If you can access JSTOR, get it here.