Tuesday, July 28, 2009

Gaussian normalizing constant

Last time we followed a derivation for the normal or Gaussian distribution that gave us the general form:

p(x) = A exp { -x2/2V }


What we need to do now is to evaluate the constant A. As I'm sure you remember, it has something to do with π, more accurately with √(2π). The constant doesn't really change the distribution (e.g. the fraction of the density that lies between x = +σ and x = +∞). What the constant does is to "normalize" the values so that the total density over the range of x (from -∞ to +∞) to equals one, a requirement for a real probability density function. It squishes the graph in the y-dimension. It is probably obvious that, if the total value (∫ exp { -x2/2V } over the whole range = 1/A, then multiplying by A will normalize it.

The V in the formula above turns out to be the variance of the distribution. It's usually written as σ2 but I want to leave it as it is. Let us start with V = 1 and then explore the influence of V separately. That means σ also equals one.

We can evaluate the constant A easily using R. Since the function is symmetrical, we can integrate between 0 and +∞, and then multiply the result by two. We're going to do two things to simplify our lives. First, rather than integrate, we will sum a bunch of small intervals. Second, rather than go to +∞, we will only go as far as there is significant density. Remembering the rules for z-scores, you can guess that we'll get 99% accuracy if we go out to 3*σ. Let's aim even higher and go to 5*σ.

T = 5
I = 0.1
x = seq(0,T,by=I)
x = x + I/2
x = x[-length(x)]
head(x)


> head(x)
[1] 0.05 0.15 0.25 0.35 0.45 0.55


We specify the intervals using the variable I. Here I = 0.1, but you can make it smaller if you wish. We construct a vector v which has the values between 0 and T with a spacing of I = 0.1. The sums are more accurate if we bump out all the values in the vector by I/2 (and then remove the last one, which is outside the interval we want to measure. A single interval looks something like this.




We approximate the area of the triangle on the left (red side) by the area of the triangle on the right. We calculate the vector of areas of all the slices and store the result in y, then sum over y. We multiply by 2 to adjust for the symmetrical values for -x.

y = I * exp(-0.5*x**2)
k = 2*sum(y)
k
sqrt(2*pi)


Because we know the true value, we can compare our answer it.

> k
[1] 2.506627
> sqrt(2*pi)
[1] 2.506628


We can also generalize the above code by explicitly including V. Trying different values of V, it is clear that the constant k is actually √(2 π V), and the A in the density function is 1/k. Note the use of √V = σ in the definition of T.

V = 16
T = 5*sqrt(V)
I = 0.1
x = seq(0,T,by=I)
x = x + I/2
x = x[-length(x)]
y = I * exp(-0.5*x**2/V)
k = 2*sum(y)

sqrt(2*pi)
sqrt(2*pi*V)


> k
[1] 10.02651
> sqrt(2*pi)
[1] 2.506628
> sqrt(2*pi*V)
[1] 10.02651


Dan Teague's derivation (pdf) evaluates A using calculus. I just wanted to indicate the way π creeps in. Here are some equations from him:



The double integral over x and y (equation 3) is converted to an integral in radial coordinates. The ∫ θ then contributes the π And since the conversion to radial coordinates involved x times y, going back to p(x) gives us √π. I don't know enough to be sure this is OK, but it certainly looks reasonable.