## Tuesday, March 8, 2011

### 16S rRNA V regions, continued

As promised last time (here), this is the code to plot V regions from the sequences in the Enterobacteriales, obtained from RDP. If you compare to this post from the other day, you'll see we match pretty well.

The figure looks better with a wider window, but the limits are established more accurately with a smaller window. The threshold T needs to be adjusted based on the window size.

One detail: I did the redirect below to save the extreme values and then parsed them with the code at the end of the post.

 `python script.py > extreme_values.txt`

`script.py`

 `import utils as utfrom info import shannonimport matplotlib.pyplot as pltdata = ut.load_data('entero.txt')data = data.strip().split('>')[1:]EC = [e for e in data if 'X80725' in e][0]EC = ut.clean_fasta(EC)[1]data = [ut.clean_fasta(e)[1] for e in data]L = ut.make_count_list(data)pos = 0R = range(1,1451)iL = list()for i,c in enumerate(EC): if c == '-': continue pos += 1 if not pos in R: continue cD = L[i] e = shannon(cD,'ACGT') iL.append(e) #print str(pos).ljust(3), c + ' ', #print ''.join([str(cD[k]).ljust(5) for k in 'ACGT']), #print round(e,2)aL = list()w = 20T = 1.8for i in range(len(iL)): j, k = i - w, i + w + 1 if j < 0: j = 0 if k > len(iL): k = len(iL) m = ut.mean(iL[j:k]) if m < T: print i+1 aL.append(m) plt.plot((1,1451),(T,T),lw=2,color='r',zorder=0)plt.scatter(R,aL)ax = plt.axes()ax.set_xlim(-5,1455)ax.set_ylim(0.8,2.05)plt.savefig('example.pdf')`

`analyze.py`

 `import utils as utdata = ut.load_data('extreme_values.txt')L = [int(n) for n in data.strip().split('\n')]current = L.pop(0)print current, '-',while L: next = L.pop(0) if next != current + 1: print current print next, '-', current = nextprint next`