## Friday, August 8, 2008

### Deonier Ch 2 example 1,2

In example 2.1, Deonier want to simulate a DNA sequence by generating a random sequence of letters with some probability. Here is their code for a demo of the sample function in R:
 `x = c(1,0)pi = c(0.25,0.75)v <- sample(x,10,replace=T,pi)v# [1] 1 0 0 0 1 0 0 0 1 0`

In Python I just make a list or string with the elements inserted according to the desired probability, then use random.choice.
 `import random, mathR = random.Random(31)x = [1,0,0,0]for i in range(20): print R.choice(x),print# 1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 0 1 1 0 0`

To simulate a DNA sequence, they use the integers 1 to 4, because R doesn't deal with strings or characters very gracefully.
 `set.seed(13)pi = rep(0.25,4)x = 1:4seq1 <- sample(x,10000,replace=T,pi)seq1[1:15]# [1] 4 2 3 2 1 2 4 1 1 2 4 1 1 4 4seq2 <- sample(x,10000,replace=T,pi)seq2[1:15]# [1] 1 4 4 2 1 3 4 1 3 2 3 2 3 2 1`

In Python
 `b = 'ACGT'N = int(1E3)L = [R.choice(b) for i in range(N)]print L[:6]# ['A', 'C', 'C', 'T', 'A', 'A']print L.count('T')# 259`

In example 2.2, they generate a sequence from the binomial distribution
 `x <- rbinom(2000,1000,0.25)mean(x)# [1] 250.0715var(x)# [1] 196.1775sd(x)^2# [1] 196.1775`

The math module doesn't have a mean function so we define some simple statistical functions:
 `def mean(L): return sum(L)*1.0/len(L)def var(L): m = mean(L) sumsq = sum([(n-m)**2 for n in L]) return 1.0/(len(L)-1) * sumsqdef sd(L): return math.sqrt(var(L))`

We fake the binomial distribution like this:
 `import random, mathR = random.Random(31)N = int(1E3)b = 'ACGT'x = list()obs = 2000for i in range(obs): L = [R.choice(b) for i in range(N)] x.append(L.count('T'))print '%3.2f %3.2f %3.2f' % ( mean(x), var(x), sd(x)) # 250.03 187.47 13.69`

And finally, we'd like to generate a histogram of the values (I'll plot the R results):
 `b = max(x) - min(x) + 1r = c(min(x)-10,max(x)+10)par('mar' = par('mar')+2)hist(x,breaks=b,xlim=r,xlab='successes', cex.lab=1.5,main='',col='magenta')`

Here it is: