## Monday, November 30, 2009

### Timing complementary DNA

I received an email the other day from someone in Tunisia, letting me know that he is also posting about Python and Bioinformatics (example). This particular post is about the very simple problem of generating the "bottom" strand from the "top" strand, i.e. substituting each complementary base in a list or a string representatation of a DNA sequence. There are several ways to do this, and I decided to test them for speed, recognizing of course that "premature optimization is the root of all evil" (Knuth is cited here but not referenced).

I used the standard Python module timeit. The class initializer takes two strings of "stmt" or code, and "setup", and has a method of the same name that takes the number of repetitions as an argument.

I tested four approaches:

`if ... elif ... elif ... elif`
• a list comprehension with a dictionary lookup
• sequential string replacement `dna = dna.replace(n1,n2)`
• use of the string module method `maketrans`

The results for 10,000 reps were as follows:

 `method time ratioif elif 5.427 108.9dict 2.026 40.6replace 0.463 9.3trans 0.05 1.0`

 `import timeits = '''import string,randomL = list('ACGT')*250L2 = list()random.shuffle(L)D = {'A':'T','C':'G','G':'C','T':'A'}dna = ''.join(L)'''c1 = '''for nt in L: if nt == 'A': L2.append('T') elif nt == 'C': L2.append('G') elif nt == 'G': L2.append('C') elif nt == 'T': L2.append('A')'''c2 = '''L2 = [D[nt] for nt in L]'''c3 = '''dna = dna.replace('A','T')dna = dna.replace('C','G')dna = dna.replace('G','C')dna = dna.replace('T','A')'''tt = '''tt = string.maketrans('ACGT','TGCA')'''c4 = '''string.translate(dna,tt)'''results = list()for i,c in enumerate([c1,c2,c3,c4]): if i == 3: s = s + tt t = timeit.Timer(c,s) results.append(t.timeit(int(1E4)))m = min(results)for r in results: print round(r,3), print str(round(r/m,1)).rjust(6)`