roman.py, stage 5
Now that fromRoman works properly with good input, it's time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it's a valid Roman numeral. This is inherently more difficult than validating numeric input in toRoman, but we have a powerful tool at our disposal: regular expressions.
If you're not familiar with regular expressions and didn't read Regular expressions 101, now would be a good time.
As we saw at the beginning of this chapter, there are several simple rules for constructing a Roman numeral. The first is that the thousands place, if any, is represented by a series of M characters.
Example 6.18. Checking for thousands
>>> import re
>>> pattern = '^M?M?M?$'
>>> re.search(pattern, 'M')
<SRE_Match object at 0106FB58>
>>> re.search(pattern, 'MM')
<SRE_Match object at 0106C290>
>>> re.search(pattern, 'MMM')
<SRE_Match object at 0106AA38>
>>> re.search(pattern, 'MMMM')
>>> re.search(pattern, '')
<SRE_Match object at 0106F4A8>
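The same check can be wrapped in a small helper that answers yes or no instead of returning a match object. This is a minimal sketch, assuming a current Python 3 interpreter (where match objects print differently from the SRE_Match lines above); the names THOUSANDS_PATTERN and is_valid_thousands are hypothetical and not part of roman.py.

```python
import re

# Same pattern as in the example above: up to three optional M characters.
THOUSANDS_PATTERN = '^M?M?M?$'

def is_valid_thousands(s):
    """Return True if s is zero to three M characters (hypothetical helper)."""
    # re.search returns a match object on success and None on failure.
    return re.search(THOUSANDS_PATTERN, s) is not None

for candidate in ('M', 'MM', 'MMM', 'MMMM', ''):
    print(repr(candidate), is_valid_thousands(candidate))
```

Note that the empty string is accepted, because every M in the pattern is optional.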
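The hundreds place is more difficult than the thousands, because there are several mutually exclusive ways it could be expressed, depending on its value: 100 = C, 200 = CC, 300 = CCC, 400 = CD, 500 = D, 600 = DC, 700 = DCC, 800 = DCCC, 900 = CM.
So there are four possible patterns: CM; CD; zero to three C characters (zero if the hundreds place is 0); and D followed by zero to three C characters.
The last two patterns can be combined into one: an optional D, followed by zero to three C characters.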
Example 6.19. Checking for hundreds
>>> import re
>>> pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'
>>> re.search(pattern, 'MCM')
<SRE_Match object at 01070390>
>>> re.search(pattern, 'MD')
<SRE_Match object at 01073A50>
>>> re.search(pattern, 'MMMCCC')
<SRE_Match object at 010748A8>
>>> re.search(pattern, 'MCMC')
>>> re.search(pattern, '')
<SRE_Match object at 01071D98>
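To convince yourself that the alternation covers every hundreds value and nothing else, it can help to test the hundreds group on its own. The following is a sketch under assumptions: it targets Python 3, and the names HUNDREDS_PATTERN and matches_hundreds, as well as the particular invalid strings, are chosen here for illustration.

```python
import re

# Hundreds place only: 900, 400, or an optional D followed by up to three Cs.
HUNDREDS_PATTERN = '^(CM|CD|D?C?C?C?)$'

# '' stands for a zero in the hundreds place; the rest are 100 through 900.
valid_hundreds = ['', 'C', 'CC', 'CCC', 'CD', 'D', 'DC', 'DCC', 'DCCC', 'CM']

# Illustrative bad inputs: too many Cs, a doubled D, and trailing characters
# after a subtractive pair or after DC.
invalid_hundreds = ['CCCC', 'DD', 'CMC', 'CDC', 'DCD']

def matches_hundreds(s):
    """Return True if s is a well-formed hundreds place (hypothetical helper)."""
    return re.search(HUNDREDS_PATTERN, s) is not None

for s in valid_hundreds + invalid_hundreds:
    print(repr(s), matches_hundreds(s))
```

Because the three alternatives are mutually exclusive, a string like CMC fails: once CM has been consumed there is no part of the pattern left that could match the extra C.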
Whew! See how quickly regular expressions can get nasty? And we've only covered the thousands and hundreds places. (Later in this chapter, we'll see a slightly different syntax for writing regular expressions that, while just as complicated, at least allows some in-line documentation of the different sections of the expression.) Luckily, if you followed all that, the tens and ones places are easy, because they're exactly the same pattern.
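One way to see that the tens and ones really are the same pattern is to generate all three groups from a single template. This is only a sketch for illustration, assuming Python 3; the helper place_pattern is hypothetical, and roman.py itself simply writes the pattern out by hand.

```python
import re

def place_pattern(one, five, ten):
    """Build the regex group for one decimal place (hypothetical helper).

    one, five, and ten are the numerals worth 1, 5, and 10 units of that
    place, for example ('C', 'D', 'M') for the hundreds place.
    """
    # nine | four | optional five followed by up to three ones
    return '(%s%s|%s%s|%s?%s?%s?%s?)' % (one, ten, one, five,
                                         five, one, one, one)

full_pattern = ('^M?M?M?'
                + place_pattern('C', 'D', 'M')   # hundreds
                + place_pattern('X', 'L', 'C')   # tens
                + place_pattern('I', 'V', 'X')   # ones
                + '$')

print(full_pattern)
print(re.search(full_pattern, 'MCMLXXXIV') is not None)
```

The generated string is character for character the hand-written pattern: each place is a "nine", a "four", or an optional "five" followed by up to three "ones".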
If you have not already done so, you can download this and other examples used in this book.
"""Convert to and from Roman numerals""" import re #Define exceptions class RomanError(Exception): pass class OutOfRangeError(RomanError): pass class NotIntegerError(RomanError): pass class InvalidRomanNumeralError(RomanError): pass #Define digit mapping romanNumeralMap = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400), ('C', 100), ('XC', 90), ('L', 50), ('XL', 40), ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1)) def toRoman(n): """convert integer to Roman numeral""" if not (0 < n < 4000): raise OutOfRangeError, "number out of range (must be 1..3999)" if int(n) <> n: raise NotIntegerError, "decimals can not be converted" result = "" for numeral, integer in romanNumeralMap: while n >= integer: result += numeral n -= integer return result #Define pattern to detect valid Roman numerals romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$' def fromRoman(s): """convert Roman numeral to integer""" if not re.search(romanNumeralPattern, s): raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s result = 0 index = 0 for numeral, integer in romanNumeralMap: while s[index:index+len(numeral)] == numeral: result += integer index += len(numeral) return result
At this point, you are allowed to be skeptical that that big ugly regular expression could possibly catch all the types of invalid Roman numerals. But don't take my word for it; look at the results:
Example 6.21. Output of romantest5.py against roman5.py
fromRoman should only accept uppercase input ... ok
toRoman should always return uppercase ... ok
fromRoman should fail with malformed antecedents ... ok
fromRoman should fail with repeated pairs of numerals ... ok
fromRoman should fail with too many repeated numerals ... ok
fromRoman should give known result with known input ... ok
toRoman should give known result with known input ... ok
fromRoman(toRoman(n))==n for all n ... ok
toRoman should fail with non-integer input ... ok
toRoman should fail with negative input ... ok
toRoman should fail with large input ... ok
toRoman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 12 tests in 2.864s

OK
One thing I didn't mention about regular expressions is that, by default, they are case-sensitive. Since our regular expression romanNumeralPattern was expressed in uppercase characters, our re.search check will reject any input that isn't completely uppercase. So our uppercase input test passes.
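A quick way to watch the case sensitivity in action is to feed the pattern the same numeral in different cases. This is a small sketch assuming Python 3; the sample strings are picked for illustration. The last line uses the standard re.IGNORECASE flag purely for contrast; roman.py does not use it.

```python
import re

romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

# Only the all-uppercase spelling of 1990 is accepted.
for s in ('MCMXC', 'mcmxc', 'McmXc'):
    print(s, re.search(romanNumeralPattern, s) is not None)

# For contrast only: with re.IGNORECASE the same pattern would accept
# lowercase input, which is exactly what the uppercase test forbids.
print(re.search(romanNumeralPattern, 'mcmxc', re.IGNORECASE) is not None)
```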
More importantly, our bad input tests pass. For instance, the malformed antecedents test checks cases like MCMC. As we've seen, this does not match our regular expression, so fromRoman raises an InvalidRomanNumeralError exception, which is what the malformed antecedents test case is looking for, so the test passes.
In fact, all the bad input tests pass. This regular expression catches everything we could think of when we made our test cases.
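If you would like to poke at the pattern outside the test suite, the sketch below checks it from both directions: every numeral generated for 1 through 3999 should be accepted, and a handful of bad strings should be rejected. It assumes Python 3. The names to_roman and is_valid are hypothetical, and the bad strings are illustrative picks in the spirit of the three bad-input test categories, not a copy of romantest5.py.

```python
import re

romanNumeralMap = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                   ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                   ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

def to_roman(n):
    """Greedy conversion as in toRoman, minus the input checks (sketch)."""
    result = ''
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def is_valid(s):
    """Return True if s matches the validation pattern (hypothetical helper)."""
    return re.search(romanNumeralPattern, s) is not None

# Direction 1: nothing that the converter produces is rejected.
accepted = all(is_valid(to_roman(n)) for n in range(1, 4000))

# Direction 2: illustrative bad inputs, grouped like the test categories.
bad_inputs = {
    'too many repeated numerals': ['MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'],
    'repeated pairs': ['CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV'],
    'malformed antecedents': ['MCMC', 'VX', 'DCM', 'CMM', 'IXIV', 'XCX', 'IVI', 'LM'],
}
rejected = not any(is_valid(s) for group in bad_inputs.values() for s in group)

print(accepted, rejected)
```

A check like this does not replace the unit tests; it only gives a second, independent look at the same claim.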
And the anticlimax award of the year goes to the word “OK”, which is printed by the unittest module when all the tests pass.
When all your tests pass, stop coding.