The scoring method is coded and described here; it comes from Lawrence et al. (PMID 8211139).
The bottom line from these simulations is that the simple version of the algorithm works pretty well. Of course, the results depend on the values chosen for the parameters above. Decreasing M and increasing SZ both decrease the signal-to-noise ratio. I haven't played with N much, but I think more sequences should be better. Early trials led to setting R = 100; most cycles reached a plateau by 60 or so (these were tested with small values for M and SZ). I've adjusted T in relation to M and SZ in order to obtain positive results. Lower values of T might well succeed too.
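To get a feel for why M and SZ matter, it helps to count the candidate sites. This is a small sketch, not part of the original code: the function name is made up for illustration, and SZ = 200 is just an arbitrary smaller value for comparison. A sequence of length SZ has SZ - M + 1 possible start positions for a motif of width M, and only one of them holds the planted motif.

```python
def n_start_positions(sz, m):
    """Number of windows of width m that fit in a sequence of length sz."""
    if m > sz:
        return 0
    return sz - m + 1


# Compare a short sequence with the SZ = 2000 case for a 9 nt motif.
for sz, m in [(200, 9), (2000, 9)]:
    n = n_start_positions(sz, m)
    print(f"SZ={sz} M={m}: {n} candidate sites per sequence")
```

A shorter motif also carries less information per site, so it is easier for a chance match in the background to score as well as the real thing.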
Here is a table of my results. M, SZ and R are introduced above. FD refers to the frequency distribution for the motif, shown here. ratio is the fraction of the top 15 hits that are related to the motif; what it means to be related will be discussed later. A final point: these results are "anecdotal" because they were all obtained with a single seed for the random number generator. Some tests were done with other seeds, but not many. So you can view this as a proof of principle.
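The ratio column could be computed along these lines. This is only a sketch: fraction_related and its arguments are hypothetical names, the hits are made-up data, and the is_related test is a placeholder passed in by the caller, since the real definition of "related" comes later.

```python
def fraction_related(hits, is_related, top=15):
    """Fraction of the `top` best-scoring hits for which is_related(motif) is true.

    hits: list of (score, motif) pairs, where a higher score is better.
    is_related: caller-supplied predicate (a placeholder in this sketch).
    """
    best = sorted(hits, key=lambda h: h[0], reverse=True)[:top]
    if not best:
        return 0.0
    return sum(1 for _, motif in best if is_related(motif)) / len(best)


# Made-up hits, purely to show the call.
hits = [
    (12.5, "ACGTACGTA"),
    (11.0, "ACGTACGTT"),
    (3.2, "GGGGGGGGG"),
    (2.9, "TTTTTTTTT"),
]
ratio = fraction_related(hits, lambda motif: motif.startswith("ACGT"))
print(ratio)
```

If fewer than 15 hits are available, this version divides by the number actually present rather than by 15.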
Probably the most stringent test is for a motif of 9 nt embedded in sequences of SZ = 2000. Following is the output for that test. The first part is the list of the top 15 results, the index list of the best motif, and its score. The second part shows the motifs in more detail. The scores in the two parts do not agree. In the first part, scores were calculated by excluding the sequence currently in slide mode, whereas in the second part I calculated the score by averaging the results obtained by treating every sequence in the motif, in turn, as the one in slide mode. The test runs pretty slowly, but this is Python, after all.
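To make the two scoring modes concrete, here is one possible reading of them as code. It is an illustration, not the code that produced the output: the function names, the pseudocount of 1, the uniform background of 0.25, and the example sites are all assumptions, and the log-odds score is a generic stand-in for the actual scoring method.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def site_score(site, others, pseudo=1.0, background=0.25):
    """Log-odds score of `site` against a profile built from `others`."""
    total = 0.0
    for i, base in enumerate(site):
        # Column i of the profile, counted over the other sites only.
        counts = Counter(s[i] for s in others)
        q = (counts[base] + pseudo) / (len(others) + pseudo * len(ALPHABET))
        total += math.log(q / background)
    return total


def held_out_score(sites, k):
    """Score site k against the others, i.e. with sequence k 'in slide mode'."""
    others = sites[:k] + sites[k + 1:]
    return site_score(sites[k], others)


def averaged_score(sites):
    """Average of held_out_score over every choice of held-out sequence."""
    return sum(held_out_score(sites, k) for k in range(len(sites))) / len(sites)


# Made-up sites: two exact copies and two single-base variants.
sites = ["ACGTACGTA", "ACGTACGTA", "ACGTTCGTA", "ACGAACGTA"]
print(held_out_score(sites, 0))  # one sequence held out
print(averaged_score(sites))     # every sequence held out in turn
```

The two numbers differ whenever the sites are not all identical, because each held-out sequence sees a slightly different profile.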