Python for Bioinformatics: Roman numerals

Monday, November 9, 2009

Roman numerals

I'm reading Mark Pilgrim's new version of Dive Into Python, and he spends some time with Roman numerals as an example for regular expresssions. (In fact, I like it, and him, so much I bought a dead trees copy). But the example piqued my interest. It's the sort of problem Python does really well. And it has the advantage of having a forward and backward transformation (to standard representation), which allows one to test all valid cases. Not too often you see that in a programming problem. It is a good challenge for a beginning Python coder.

  928     CMXXVIII    928
 1070         MLXX   1070
   30          XXX     30
  581       DLXXXI    581
 1976     MCMLXXVI   1976
  169        CLXIX    169
 1940        MCMXL   1940
  975       CMLXXV    975
  204         CCIV    204
 1044        MXLIV   1044

I know that code is boring, but I've decided to put it up (all 70 lines of it), in four parts. The strategy here is to separate the task in two sub-tasks: conversion between decimal and "verbose" Roman, and "compression."

The first part does what the comment says. No error checking:

def compress(s):
    # convert 'X'*4 to 'XV', etc.
    L = list('IVXLCDM')
    c = s[0]
    i = L.index(c)
    n = len(s)
    if n < 4:      return s
    if n == 4:     return c + L[i+1] 
    if n == 5:     return L[i+1]
    if 5 < n < 9:  return L[i+1] + c*(n-5)
    if n == 9:     return c + L[i+2]

It's a matter of personal taste, but I tend to use single letter variable names when I can. The clutter bothers me more than the chance for confusion. But I'm consistent (L is always a list, s a string, i an index). Actually, you don't need a list here, a string would be fine. But I wanted to use L since s was already claimed :)

The second part does some trivial error checking. Then it breaks down a decimal year into thousands, hundreds, etc, and adds the appropriate number of 'IXCM' to a list, after running them through compress. Rather than check if it's needed, everything goes through.

def decimalToRoman(n):
    assert type(n) == type(1),'not an int'
    assert 0 < n < 10000,'out of bounds %i' % n
    divs = [1000, 100, 10, 1]
    vals = list()
    for i in range(4):
        d = divs[i]
        vals.append(n/d)
        n = n%d
    rL = list()
    rNumerals = 'MCXI'
    for i,v in enumerate(vals):
        c = rNumerals[i]
        s = c * v
        if i and v:  s = compress(s)
        rL.append (s)
    return ''.join(rL)

The third function does the reverse transformation. It has a nested function definition for the reverse of the "compress" algorithm. There is a second function necessitated by the fact that there may be more than one compression in a given Roman numeral. (A bug in the first version!). In "workToDo", we return -1 when there is no more work.

def romanToDecimal(s):
    D = { 'I':1,  'V':5,
          'X':10, 'L':50,  
          'C':100,'D':500, 'M':1000 }
    def resolve(s,i):
        c = s[i]
        first = s[:i]
        last = s[i+2:]
        if s[i+1] in 'VLD': x = 4
        else:               x = 9
        s2 = first + c*x + last
        return s2    
    def workToDo(s):
        for i in range(len(s)-1):
            if D[s[i]] < D[s[i+1]]:
              return i
        return -1
    i = workToDo(s)
    while i != -1:
        s = resolve(s,i)
        i = workToDo(s)          
    n = 0
    for c in s:  n += D[c]
    return n

Finally, the test harness. We print a few random numbers for fun, and then test all the years from 1 to 10,000, going from decimal to Roman and back again.

def test():
    for i in range(10):
        d = random.choice(range(1,2000))
        print str(d).rjust(5),
        r = decimalToRoman(d)
        d = romanToDecimal(r)
        print r.rjust(12),
        print str(d).rjust(6)

    for d in range(1,10000):
        r = decimalToRoman(d)
        s = romanToDecimal(r)
        if not d == s:
            print 'error'
            print str(d).rjust(5), 
            print r.rjust(12), 
            print s.rjust(6)
test()