## Thursday, December 16, 2010

### Python generators

If you want to know about generators, I don't find the docs all that helpful, but you could check the PEPs, like this one.

Below is possibly the world's simplest Python generator: it's a function that includes the `yield` statement in a loop. It takes a number as the single argument and increments it once for each pass through the loop. The loop is endless: `while True`, but that's not a problem because we ask for one value at a time. Eventually we tire of the game and kill the process.

 `def g(n): while True: n += 1 yield n f = g(13)print fx = f.next()print xwhile x <= 17: x = f.next() print x`

 `1415161718`

Notice that if we really wanted the test to work properly, we should reverse the order of the statements in the second while loop.

We can extend this idea to the problem of finding prime numbers---one of the oldest computational problems. In the code below we cache and then serve up the first four values. After that, the algorithm is that we test each prime number p in turn against the candidate n:

first, construct p*p
if p*p == n, n is not prime, it's the square of a prime
elif p divides n evenly, n is not prime
elif p*p > n, n is prime
else: test the next p

If n passes all the tests, it joins the club. Since all the primes beyond 5 end in 1, 3, 7, or 9, we test no other candidates.

I did three runs, a short one with v (verbose) = True to help test our logic, and longer ones with N = 10000 and N = 100000.
We grab N primes and look at the distribution.

 `testing 11 : 2 3 5 primetesting 13 : 2 3 5 primetesting 17 : 2 3 5 primetesting 19 : 2 3 5 primetesting 21 : 2 3 is a factortesting 23 : 2 3 5 primetesting 27 : 2 3 is a factortesting 29 : 2 3 5 7 primetesting 31 : 2 3 5 7 primetesting 33 : 2 3 is a factortesting 37 : 2 3 5 7 primetesting 39 : 2 3 is a factortesting 41 : 2 3 5 7 primetesting 43 : 2 3 5 7 primetesting 47 : 2 3 5 7 primetesting 49 : 2 3 5 7 square of primetesting 51 : 2 3 is a factortesting 53 : 2 3 5 7 11 primetesting 57 : 2 3 is a factortesting 59 : 2 3 5 7 11 primetesting 61 : 2 3 5 7 11 primetesting 63 : 2 3 is a factortesting 67 : 2 3 5 7 11 primetesting 69 : 2 3 is a factortesting 71 : 2 3 5 7 11 prime{1: 5, 2: 1, 3: 5, 5: 1, 7: 5, 9: 3}[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]{1: 2484, 2: 1, 3: 2515, 5: 1, 7: 2508, 9: 2491}{1: 24967, 2: 1, 3: 25007, 5: 1, 7: 25015, 9: 25009}`

Notice that the probability of ending in 1, 3, 7, or 9 is essentially identical. It would be interesting to look at the patterns in more detail, for example within a particular decade do 3 and 7 tend to cluster, etc.

 `v = Falsedef primes(): pL = [2,3,5,7] for i in range(4): yield pL[i] base = 0 while True: base += 10 for i in [1,3,7,9]: n = base + i if v: print 'testing', n, ':', for p in pL: if v: print p, prod = p*p if prod > n: if v: print 'prime' pL.append(n) yield n break if prod == n: if v: print 'square of prime' break if not n % p: if v: print 'is a factor' breakdef test(N): p = primes() L = [p.next() for i in range(N)] return LL = test(100000)D = dict(zip([1,2,3,5,7,9],[0]*6))for n in L: x = n % 10 D[x] += 1print Dif v: print L[:16]`

It's seems likely that this is not efficient. The tests for 2 and 3 seem unnecessary. We should just test whether n is even, whether the sum of the digits is divisible by 3, and never test 5 at all. There are other divisibility rules as well (wikipedia).

If you want an efficient prime number generator in Python, go here. Although unutbu is awesome, you should look at Alex Martelli's answer, which is short and awfully fast.

[UPDATE: I found several errors in an earlier version of this post.]