ScoreList.c implements a function that constructs a list of scores from a list of nucleotide counts for each position, and a second function that reads such a score list in from disk. The logic of the program was explained in the previous post, so I won't spend much time on the details here.
Let's just say that if this blog has a motto, it is: "learn by doing."
So... get in there and try writing your own version. If you get stuck, you can see how (or if) I handled the problem.
A few other notes: I tested the program using a list of counts for crp sites obtained as described (here). The output shows the position, score, and sequence of each crp site found in the E. coli genome. The first few hits (threshold = 12) are:
After implementing the algorithm (which handles the sequence one character at a time), I realized it would be good to have each site's sequence available at printing time (as shown above). That required remembering the current site's sequence, which I grafted onto the program at the very end. Luckily, it didn't change the running time much. In hindsight, while the circular linked list seemed like an elegant approach, it caused as many problems as it solved, and I probably wouldn't use it again.
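For comparison, the character-at-a-time scan can be sketched with a fixed-size array used as a ring buffer instead of a circular linked list. This is a swapped-in technique, not what the program does: the last `w` bases sit in an array, a write index wraps around modulo `w`, and because the oldest base is always at the write index once the window is full, the site's sequence is available whenever a hit is reported. The `Scanner` type, `scanner_feed`, and the `MAXW` limit are hypothetical, the score matrix is assumed to be `w` x 4 in A, C, G, T column order, and only the strand fed in is scored.

```c
#include <ctype.h>

#define NBASES 4 /* score-matrix columns assumed to be A, C, G, T */
#define MAXW 64  /* hypothetical upper bound on the site width */

/* Column of a nucleotide in the score matrix, or -1 if it is not A/C/G/T. */
static int base_index(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

typedef struct {
    const double *scores; /* w x NBASES log-odds matrix, row by row */
    int w;                /* site width, assumed 1 <= w <= MAXW */
    char ring[MAXW];      /* the most recent bases, stored circularly */
    int next;             /* slot the next base goes into */
    int filled;           /* how many valid bases the ring holds (<= w) */
    long pos;             /* 1-based position of the last base consumed */
} Scanner;

void scanner_init(Scanner *s, const double *scores, int w)
{
    s->scores = scores;
    s->w = w;
    s->next = 0;
    s->filled = 0;
    s->pos = 0;
}

/* Feed one character of the genome. Whitespace is skipped; any other
 * non-ACGT character still counts as a position but empties the window.
 * When a full window scores at or above threshold, its 1-based start,
 * score and sequence (seq needs w + 1 bytes) are written and 1 is
 * returned; otherwise 0 is returned. */
int scanner_feed(Scanner *s, char c, double threshold,
                 long *start, double *score, char *seq)
{
    if (isspace((unsigned char)c))
        return 0;
    s->pos++;
    if (base_index(c) < 0) {
        s->next = 0;
        s->filled = 0;
        return 0;
    }
    s->ring[s->next] = (char)toupper((unsigned char)c);
    s->next = (s->next + 1) % s->w;
    if (s->filled < s->w)
        s->filled++;
    if (s->filled < s->w)
        return 0;

    /* Once the ring is full, the oldest base sits in slot next. */
    double total = 0.0;
    for (int k = 0; k < s->w; k++) {
        char b = s->ring[(s->next + k) % s->w];
        total += s->scores[k * NBASES + base_index(b)];
    }
    if (total < threshold)
        return 0;
    for (int k = 0; k < s->w; k++)
        seq[k] = s->ring[(s->next + k) % s->w];
    seq[s->w] = '\0';
    *start = s->pos - s->w + 1;
    *score = total;
    return 1;
}
```

A driver would loop over `fgetc(stdin)` until `EOF`, call `scanner_feed` with the threshold (12 in the run above), and `printf` the start, score, and sequence of each hit. Scoring the other strand would take a second scanner built on the reverse complement of the matrix.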
You can tell just from looking at the patterns that we are getting good matches to the crp consensus [T/A]3TGTGAN6TCACA[T/A]3. Also, the last site is the lac 1 (lacZ) site, reversed. So I think it's working OK.
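One way to make the eyeball check a little more systematic is to count how many positions of a candidate site agree with the consensus. The sketch below writes [T/A]3TGTGAN6TCACA[T/A]3 with IUPAC codes (W for A or T, N for any base) and adds a reverse-complement helper. Note that the pattern is its own reverse complement, since TGTGA and TCACA are reverse complements of each other, so a site read off the opposite strand can match it just as well. The function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* The crp consensus [T/A]3 TGTGA N6 TCACA [T/A]3 in IUPAC codes:
 * W = A or T, N = any base. */
static const char CRP_CONSENSUS[] = "WWWTGTGANNNNNNTCACAWWW";

/* Does a single base satisfy a single IUPAC code (only W, N, A, C, G, T)? */
static int iupac_match(char code, char base)
{
    base = (char)toupper((unsigned char)base);
    switch (code) {
    case 'N': return base == 'A' || base == 'C' || base == 'G' || base == 'T';
    case 'W': return base == 'A' || base == 'T';
    default:  return code == base;
    }
}

/* Number of positions (0..22) at which site agrees with the consensus,
 * or -1 if site is not exactly as long as the consensus. */
int consensus_agreement(const char *site)
{
    size_t n = strlen(CRP_CONSENSUS);
    if (strlen(site) != n)
        return -1;
    int agree = 0;
    for (size_t i = 0; i < n; i++)
        agree += iupac_match(CRP_CONSENSUS[i], site[i]);
    return agree;
}

/* Write the reverse complement of seq into out (out needs strlen(seq) + 1
 * bytes). Anything that is not A/C/G/T becomes N. */
void reverse_complement(const char *seq, char *out)
{
    size_t n = strlen(seq);
    for (size_t i = 0; i < n; i++) {
        switch (toupper((unsigned char)seq[n - 1 - i])) {
        case 'A': out[i] = 'T'; break;
        case 'C': out[i] = 'G'; break;
        case 'G': out[i] = 'C'; break;
        case 'T': out[i] = 'A'; break;
        default:  out[i] = 'N'; break;
        }
    }
    out[n] = '\0';
}
```

Counting agreement rather than returning a yes/no match is deliberate: real sites rarely fit the consensus perfectly, so a count gives a rough sanity check to set beside each reported score.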
Finally, it runs in less than 2 seconds! That's a huge improvement over the previous version's running time. The zipped files are on Dropbox (here).