Sunday, December 19, 2010

DNA binding sites 4



I grabbed the crp and purR binding-site data for E. coli from George Church's server. This is the first part of the crp data set:

>aldB -18->4
attcgtgatagctgtcgtaaag
>ansB 103->125
ttttgttacctgcctctaactt
>araB1 109->131
aagtgtgacgccgtgcaaataa
>araB2 147->169
tgccgtgattatagacactttt
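Each record is a header line giving a site name and a pair of coordinates, followed by the site sequence. Below is a minimal Python sketch for reading this format. It assumes exactly one sequence line per header and that the two numbers are start and end positions. In the four records shown, end minus start equals the 22 bp site length, which is consistent with that reading. The `Site` and `parse_sites` names are hypothetical and are not taken from site_utils.py.

```python
import re
from typing import List, NamedTuple

# Headers look like ">aldB -18->4": a site name, then start->end.
# Treating the two numbers as start/end coordinates is an assumption.
HEADER = re.compile(r"^>(\S+)\s+(-?\d+)->(-?\d+)$")


class Site(NamedTuple):
    name: str
    start: int
    end: int
    seq: str


def parse_sites(text: str) -> List[Site]:
    """Parse alternating header/sequence lines into Site records."""
    sites = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = HEADER.match(line)
            if header is None:
                raise ValueError(f"unrecognized header: {line!r}")
        elif header is not None:
            # Assumes the whole sequence is on the single line after the header.
            name, start, end = header.group(1), int(header.group(2)), int(header.group(3))
            sites.append(Site(name, start, end, line.lower()))
            header = None
    return sites


if __name__ == "__main__":
    sample = """>aldB -18->4
attcgtgatagctgtcgtaaag
>ansB 103->125
ttttgttacctgcctctaactt
"""
    for site in parse_sites(sample):
        print(site.name, site.start, site.end, len(site.seq))
```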


The graphics above represent the information analysis for crp and purR produced by site_score.py. Notice the different patterns. For crp, the important bases are two short pentamers separated by one turn of the helix. purR lacks this pattern, which suggests a different mode of binding.
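As a rough illustration of what "information analysis" means here, the sketch below computes the information in bits at each position of a set of aligned, equal-length sites as R(l) = 2 - H(l), in the style of Schneider's sequence logos. It is an assumption that site_score.py works this way. This version also omits the small-sample correction that Schneider's method applies, so it will overestimate information for small data sets. The function name `position_information` is hypothetical.

```python
import math
from collections import Counter
from typing import List

BASES = "acgt"


def position_information(seqs: List[str]) -> List[float]:
    """Information in bits at each position of equal-length aligned sites.

    R(i) = 2 - H(i), with H(i) = -sum over b of f(b, i) * log2 f(b, i).
    No small-sample correction is applied (a simplification in this sketch).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sites must have the same length")
    info = []
    for i in range(length):
        counts = Counter(s[i].lower() for s in seqs)
        n = sum(counts[b] for b in BASES)
        if n == 0:  # column with no A/C/G/T at all
            info.append(0.0)
            continue
        h = 0.0
        for b in BASES:
            if counts[b]:
                f = counts[b] / n
                h -= f * math.log2(f)
        info.append(2.0 - h)
    return info


if __name__ == "__main__":
    # Only the four crp sites shown above; a real analysis would use the full set.
    crp = ["attcgtgatagctgtcgtaaag", "ttttgttacctgcctctaactt",
           "aagtgtgacgccgtgcaaataa", "tgccgtgattatagacactttt"]
    for pos, bits in enumerate(position_information(crp), start=1):
        print(pos, round(bits, 2), "#" * round(bits * 10))
```

Run on the full crp set, the pattern described above should appear as two clusters of high-information positions roughly one helical turn apart.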

According to Schumacher (1994, PMID 7973627):

The DNA-binding domain contains a helix-turn-helix motif that makes base-specific contacts in the major groove of the DNA. Base contacts are also made by residues of symmetry-related alpha helices, the "hinge" helices, which bind deeply in the minor groove. Critical to hinge helix-minor groove binding is the intercalation of the side chains of Leu54 and its symmetry-related mate, Leu54', into the central CpG-base pair step. These residues thereby act as "leucine levers" to pry open the minor groove and kink the purF operator by 45 degrees.


It's that minor-groove interaction that gives the strong signal in the "middle" of the purR site.

Here are the site scores we calculate for crp. Notice that the lac site is a relatively poor one:

$ python site_utils.py crp
tnaL 20.2
nupG2 18.7
lac 17.6
cdd 17.3
deoP2 16.7
malT 16.5
..
cya 12.8
..
crp 12.0
..
lac 8.8
..

avg for 100000 random seqs: -15.64
13.16
12.18
11.87
11.51
10.78
10.66
10.58
10.56
10.2
9.87
158 sites in random seq for cutoff = 5
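The listing doesn't show how the scores are computed. One plausible approach, assumed here rather than taken from site_utils.py, is Schneider-style individual information. Each position contributes 2 + log2 f(b, l) bits for the observed base b, and the contributions are summed over the site. With equiprobable random bases the expected score under such a matrix is at most zero, which would be consistent with the negative random average above (-15.64). The sketch below works in three steps:

1. It builds the weight matrix, using a pseudocount of 0.5 (an arbitrary choice in this sketch).
2. It scores sequences against that matrix.
3. It scores random sequences of site length, reporting the average, the ten best scores, and how many exceed a cutoff.

The original output doesn't say whether it counts scores above or at the cutoff, or whether it scores independent random sequences or scans one long random sequence. Here, independent sequences are scored and the count is of scores strictly above the cutoff. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple

BASES = "acgt"
Matrix = List[Dict[str, float]]


def weight_matrix(seqs: List[str], pseudocount: float = 0.5) -> Matrix:
    """Weights 2 + log2 f(b, i) per position; the pseudocount keeps every f > 0."""
    matrix = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i].lower() for s in seqs)
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: 2.0 + math.log2((counts[b] + pseudocount) / total)
                       for b in BASES})
    return matrix


def score(seq: str, matrix: Matrix) -> float:
    """Sum the weights of the observed bases; seq must match the matrix length."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must equal matrix length")
    return sum(col[b] for col, b in zip(matrix, seq.lower()))


def random_background(matrix: Matrix, n: int = 100_000, cutoff: float = 5.0,
                      top: int = 10, seed: int = 0) -> Tuple[float, List[float], int]:
    """Score n random sequences of matrix length with equiprobable bases.

    Returns the average score, the `top` highest scores, and the number of
    random sequences scoring strictly above `cutoff`.
    """
    rng = random.Random(seed)
    scores = [score("".join(rng.choices(BASES, k=len(matrix))), matrix)
              for _ in range(n)]
    best = sorted(scores, reverse=True)[:top]
    hits = sum(1 for s in scores if s > cutoff)
    return sum(scores) / n, best, hits


if __name__ == "__main__":
    # Only the four crp sites shown earlier, so the numbers will not match the
    # listing above; a real run would build the matrix from the whole file.
    crp = ["attcgtgatagctgtcgtaaag", "ttttgttacctgcctctaactt",
           "aagtgtgacgccgtgcaaataa", "tgccgtgattatagacactttt"]
    m = weight_matrix(crp)
    for site in sorted(crp, key=lambda seq: score(seq, m), reverse=True):
        print(site, round(score(site, m), 1))
    avg, best, hits = random_background(m)
    print("avg for random seqs:", round(avg, 2))
    print("top random scores:", [round(b, 2) for b in best])
    print(hits, "random seqs above cutoff")
```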