(I am still worried that the scoring function I'm using, as mentioned here, is incorrect and may be screwing up the position distributions.)
The take-home message from playing with the sampler is that it takes a lot of iterations for the distribution to settle down where it should be. For example, here is output from a run on a toy problem:
It is tough to get separation, though admittedly most of these look like off-by-one solutions. In this problem, the number of positions to check for each sequence is SZ - M + 1 = 10, so the total number of position combinations across the sequences is 10^5. With a million cycles (and discarding the first 20%), we have 8 samples per combination on average. So the sampler does seem to be finding our motif fairly efficiently (629 times here).
But it seems clear that the sampler has to get a lot more efficient as the sequences get longer. With a sequence length of 1000 and 10 sequences, the number of position combinations is roughly 1000^10 = 10^30.
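To make the arithmetic concrete, here is a rough sketch of the search-space counting used above. The function names are mine, not from the sampler code, and the toy case assumes 5 sequences, which is what a total of 10^5 with 10 start positions per sequence implies.

```python
def n_configurations(starts_per_seq, n_seqs):
    """Number of joint assignments when each sequence picks one start position."""
    return starts_per_seq ** n_seqs


def samples_per_configuration(cycles, burn_in_cycles, n_configs):
    """Average number of retained samples per joint assignment."""
    kept = cycles - burn_in_cycles  # cycles left after discarding burn-in
    return kept / n_configs


# Toy problem: SZ - M + 1 = 10 starts per sequence, 5 sequences (assumed).
toy = n_configurations(10, 5)
print(toy)  # 100000
print(samples_per_configuration(1_000_000, 200_000, toy))  # 8.0

# Larger case: about 1000 starts per sequence, 10 sequences.
print(n_configurations(1000, 10) == 10 ** 30)  # True
```

The space grows exponentially in the number of sequences, so the same million cycles that give 8 samples per combination on the toy problem cover a vanishing fraction of the larger one.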
I am going to have to redo parts of the code in C or C++ in order to test larger sequence sets. So far, real problems do not yield to my sampler even with 5 x 10^6 cycles.