## Thursday, June 25, 2009

### Motif Discovery 10

This post will demonstrate the world's simplest Gibbs Sampler. We first construct a 2D distribution by hand. The Python code is:

 `# a very simple Gibbs samplerimport random,sys#---------------------------------# a discrete distributionR = range(10)distr = [ [1,1,2,2,1,1,1,1,1,1], [1,3,4,5,2,1,1,1,1,1], [1,4,7,6,2,1,1,1,1,1], [1,3,4,3,2,1,1,1,1,1], [1,1,1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,1,1], [1,1,1,1,1,2,2,3,2,1], [1,1,1,1,1,2,3,4,2,1], [1,1,1,1,1,2,2,5,3,1], [1,1,1,1,1,1,1,1,2,1]]def f(x,y): return distr[x][y]#---------------------------------# save distribution for R plotFH = open('results.txt','w')for x in R: for y in R: # convert to 1-based index t = (x+1,y+1,f(x,y)) L = [str(e) for e in t] FH.write(' '.join(L) + '\n')FH.close()`

Two plots of the values (rotate the table above to see the correspondence):

The R code to do the plots:

 `Rcodesetwd('Desktop')library(scatterplot3d)m = read.table('results.txt',as.is=T)par(las=1)scatterplot3d(m, type='h',pch=16, cex.symbol=3, cex.axis=2, highlight.3d = T, xlab='x',ylab='y', zlab='z')s = m[,3]dim(s) = c(10,10)s = t(s)image(s, col=topo.colors(10))`

The sampler is simplicity itself:

 `def gibbsMove(L): L = [t[0] for t in L] S = sum(L) cumulative = 0 # weight by score r = random.random() for i,s in enumerate(L): cumulative += s f = cumulative*1.0/S if f > r: return i raise ValueError('f < r')`

Here are the top scores. The count is the number of times the sampler visited that value; the ratio of the count to the actual score (or value of the distribution at the point), is nearly constant.

 `c = count, s = score c c/s ( s, x, y)4208 601.14 ( 7.0 , 2 , 2 )3789 631.5 ( 6.0 , 2 , 3 )3239 647.8 ( 5.0 , 1 , 3 )3103 620.6 ( 5.0 , 8 , 7 )2636 659.0 ( 4.0 , 1 , 2 )2551 637.75 ( 4.0 , 7 , 7 )2465 616.25 ( 4.0 , 2 , 1 )2466 616.5 ( 4.0 , 3 , 2 )1929 643.0 ( 3.0 , 8 , 8 )1894 631.33 ( 3.0 , 6 , 7 )1901 633.67 ( 3.0 , 1 , 1 )1893 631.0 ( 3.0 , 3 , 3 )1842 614.0 ( 3.0 , 3 , 1 )1971 657.0 ( 3.0 , 7 , 6 )1315 657.5 ( 2.0 , 8 , 6 )1374 687.0 ( 2.0 , 1 , 4 )1268 634.0 ( 2.0 , 7 , 8 )1394 697.0 ( 2.0 , 6 , 8 )1363 681.5 ( 2.0 , 9 , 8 )1227 613.5 ( 2.0 , 0 , 2 )`

Here is the rest of the code needed to run the example and show the results:

 `x = random.choice(R)y = random.choice(R)L = [[f(x,y),x,y]]t = ('x','y')T = int(1E5)for i in range(T): L2 = list() if random.choice(t) == 'x': y = L[-1][2] for x in R: L2.append((f(x,y),x,y)) else: x = L[-1][1] for y in R: L2.append((f(x,y),x,y)) j = gibbsMove(L2) L.append(L2[j])#---------------------------------# show resultsD = dict()for t in L: t = tuple(t) if D.has_key(t): D[t] += 1 else: D[t] = 1 kL = sorted(D.keys(),key=lambda t: t[0])kL.reverse()def show(t): counts = D[t] score = t[0] print str(counts).rjust(4), ratio = 1.0*counts/score print str(round(ratio,2)).ljust(7), print '(' + str(round(score,1)).rjust(4), print ',',t[1],',',t[2],')'print 'c = count, s = score'print ' c c/s ( s, x, y)'for t in kL[:20]: show(t)print '-' * 10for t in kL[-20:]: show(t)`

Update: fixed a bug in "show."