Appendix B. A 5-minute review

Chapter 1. Getting To Know Python

1.1. Diving in
Here is a complete, working Python program.
1.2. Declaring functions
Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it and code it.
1.3. Documenting functions
You can document a Python function by giving it a doc string.
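As a minimal sketch (the function name here is hypothetical, and the syntax assumes a current Python 3 interpreter), a doc string is simply the first string literal inside the function body:

```python
def build_connection_string(params):
    """Build a connection string from a dictionary of parameters.

    Returns a string of key=value pairs separated by semicolons.
    """
    return ";".join("%s=%s" % (k, v) for k, v in params.items())

# The doc string is available at run time as the __doc__ attribute.
first_line = build_connection_string.__doc__.splitlines()[0]
```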
1.4. Everything is an object
A function, like everything else in Python, is an object.
1.5. Indenting code
Python functions have no explicit begin or end, no curly braces that would mark where the function code starts and stops. The only delimiter is a colon (“:”) and the indentation of the code itself.
1.6. Testing modules
Python modules are objects and have several useful attributes. You can use these attributes to test your modules easily as you write them.
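One common way to do this relies on the module's __name__ attribute. The sketch below uses a hypothetical function, double, as the thing being tested:

```python
def double(n):
    """Return n multiplied by two."""
    return n * 2

# __name__ is "__main__" only when the file is run directly; when the
# module is imported, it holds the module's own name instead, so this
# test code is skipped.
if __name__ == "__main__":
    print(double(21))
```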
1.7. Dictionaries 101
One of Python's built-in datatypes is the dictionary, which defines one-to-one relationships between keys and values.
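A short sketch of the basic dictionary operations, using made-up keys and values:

```python
d = {"server": "mpilgrim", "database": "master"}

d["uid"] = "sa"          # assigning to a new key adds a pair
d["database"] = "pubs"   # assigning to an existing key overwrites its value
del d["server"]          # del removes a pair by key

lookup = d["database"]   # values are looked up by key
```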
1.8. Lists 101
Lists are Python's workhorse datatype. If your only experience with lists is arrays in Visual Basic or (God forbid) the datastore in PowerBuilder, brace yourself for Python lists.
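To give a taste of what lists can do, here is a small sketch with illustrative values:

```python
li = ["a", "b", "mpilgrim"]

li.append("new")                # add a single element to the end
li.extend(["two", "elements"])  # concatenate another list onto the end

first_two = li[:2]              # slicing returns a new list
last = li[-1]                   # negative indices count from the end
position = li.index("mpilgrim") # find the first occurrence of a value
```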
1.9. Tuples 101
A tuple is an immutable list. A tuple cannot be changed in any way once it is created.
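A quick sketch of what immutability means in practice (the values are illustrative):

```python
t = ("a", "b", "mpilgrim")

first = t[0]        # indexing works just as it does for lists
middle = t[1:2]     # slicing a tuple returns a new tuple

try:
    t[0] = "z"      # item assignment is not supported
except TypeError as exc:
    error = exc
```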
1.10. Defining variables
Python has local and global variables like most other languages, but it has no explicit variable declarations. Variables spring into existence by being assigned a value, and are automatically destroyed when they go out of scope.
1.11. Assigning multiple values at once
One of the cooler programming shortcuts in Python is using sequences to assign multiple values at once.
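For instance (a sketch with arbitrary names):

```python
v = ("a", "b", "e")
x, y, z = v   # each name receives the element in the matching position

# The same shortcut gives a compact way to assign consecutive values.
MONDAY, TUESDAY, WEDNESDAY = range(3)
```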
1.12. Formatting strings
Python supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is to insert values into a string with the %s placeholder.
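A minimal sketch of the basic usage, with made-up values:

```python
k = "uid"
v = "sa"

# Each %s is replaced, in order, by the matching element of the tuple.
pair = "%s=%s" % (k, v)

# %d formats an integer; a single value still goes in a tuple.
message = "Users connected: %d" % (6,)
```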
1.13. Mapping lists
One of the most powerful features of Python is the list comprehension, which provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.
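In its simplest form (a sketch with sample data):

```python
li = [1, 9, 8, 4]

# Build a new list by applying an expression to each element.
# The original list is left unchanged.
doubled = [elem * 2 for elem in li]
```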
1.14. Joining lists and splitting strings
You have a list of key-value pairs in the form key=value, and you want to join them into a single string. To join any list of strings into a single string, use the join method of a string object.
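The following sketch joins such a list and then reverses the operation with split. The parameter values are invented, and the output order assumes a Python version whose dictionaries preserve insertion order (3.7 or later):

```python
params = {"server": "mpilgrim", "database": "master", "uid": "sa"}

pairs = ["%s=%s" % (k, v) for k, v in params.items()]
joined = ";".join(pairs)   # join is a method of the separator string

# split is the inverse: break the string apart on the same separator,
# then split each pair once on "=".
restored = dict(pair.split("=", 1) for pair in joined.split(";"))
```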
1.15. Summary
The odbchelper.py program and its output should now make perfect sense.

Chapter 2. The Power Of Introspection

2.1. Diving in
Here is a complete, working Python program. You should understand a good deal about it just by looking at it. The numbered lines illustrate concepts covered in Getting To Know Python. Don't worry if the rest of the code looks intimidating; you'll learn all about it throughout this chapter.
2.2. Optional and named arguments
Python allows function arguments to have default values; if the function is called without the argument, the argument gets its default value. Furthermore, arguments can be specified in any order by using named arguments. Stored procedures in SQL Server Transact/SQL can do this; if you're a SQL Server scripting guru, you can skim this part.
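A small sketch, using a hypothetical function named describe with two optional arguments:

```python
def describe(obj, spacing=10, collapse=True):
    """Return the arguments as text so the defaults are visible."""
    return "%s|%d|%s" % (obj, spacing, collapse)

defaults = describe("x")                              # both defaults used
one_override = describe("x", 12)                      # positional override
reordered = describe("x", collapse=False, spacing=4)  # named, any order
```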
2.3. type, str, dir, and other built-in functions
Python has a small set of extremely useful built-in functions. All other functions are partitioned off into modules. This was actually a conscious design decision, to keep the core language from getting bloated like other scripting languages (cough cough, Visual Basic).
2.4. Getting object references with getattr
You already know that Python functions are objects. What you don't know is that you can get a reference to a function without knowing its name until run-time, using the getattr function.
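For example, in this sketch the method name is held in a variable and only looked up when the code runs:

```python
li = ["Larry", "Curly"]

method_name = "append"              # could come from user input or a file
method = getattr(li, method_name)   # same object as li.append
method("Moe")
```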
2.5. Filtering lists
As you know, Python has powerful capabilities for mapping lists into other lists, via list comprehensions. This can be combined with a filtering mechanism, where some elements in the list are mapped while others are skipped entirely.
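A sketch of a comprehension that maps and filters in one step (sample data only):

```python
li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"]

# Only elements that pass the filter expression are mapped;
# the rest are skipped entirely.
long_names = [elem.upper() for elem in li if len(elem) > 1]
```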
2.6. The peculiar nature of and and or
In Python, and and or perform boolean logic as you would expect, but they do not return boolean values; they return one of the actual values they are comparing.
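A few cases make the behavior concrete:

```python
# and returns the first false value, or the last value if all are true.
both_true = "a" and "b"
first_false = "" and "b"

# or returns the first true value, or the last value if all are false.
first_true = "a" or "b"
skip_false = "" or "b"
all_false = "" or [] or {}
```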
2.7. Using lambda functions
Python supports an interesting syntax that lets you define one-line mini-functions on the fly. Borrowed from Lisp, these so-called lambda functions can be used anywhere a function is required.
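Two small sketches (the names are arbitrary):

```python
# A lambda is an expression that evaluates to a function object.
double = lambda x: x * 2

# split() with no arguments splits on any whitespace, so joining the
# pieces with a single space collapses runs of whitespace.
collapse_whitespace = lambda s: " ".join(s.split())
```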
2.8. Putting it all together
The last line of code, the only one we haven't deconstructed yet, is the one that does all the work. But by now the work is easy, because everything we need is already set up just the way we need it. All the dominoes are in place; it's time to knock them down.
2.9. Summary
The apihelper.py program and its output should now make perfect sense.

Chapter 3. An Object-Oriented Framework

3.1. Diving in
Here is a complete, working Python program. Read the doc strings of the module, the classes, and the functions to get an overview of what this program does and how it works. As usual, don't worry about the stuff you don't understand; that's what the rest of the chapter is for.
3.2. Importing modules using from module import
Python has two ways of importing modules. Both are useful, and you should know when to use each. One way, import module, you've already seen in chapter 1. The other way accomplishes the same thing but works in subtly and importantly different ways.
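The difference shows up in how you refer to the imported names. This sketch uses the standard math module purely as an illustration:

```python
import math             # binds the module; names need the math. prefix
from math import sqrt   # binds sqrt directly into the local namespace

qualified = math.sqrt(16)
unqualified = sqrt(16)
```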
3.3. Defining classes
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you've defined.
3.4. Instantiating classes
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the __init__ method defines. The return value will be the newly created object.
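A minimal sketch follows. This FileInfo is a simplified stand-in that subclasses the built-in dict, which is an assumption made to keep the example self-contained, and the file path is invented:

```python
class FileInfo(dict):
    """Store file metadata (simplified, hypothetical version)."""

    def __init__(self, filename=None):
        super().__init__()
        self["name"] = filename

# Call the class like a function; the arguments go to __init__.
f = FileInfo("/music/song.mp3")
```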
3.5. UserDict: a wrapper class
As you've seen, FileInfo is a class that acts like a dictionary. To explore this further, let's look at the UserDict class in the UserDict module, which is the ancestor of our FileInfo class. This is nothing special; the class is written in Python and stored in a .py file, just like our code. In particular, it's stored in the lib directory in your Python installation.
3.6. Special class methods
In addition to normal class methods, there are a number of special methods which Python classes can define. Instead of being called directly by your code (like normal methods), special methods are called for you by Python in particular circumstances or when specific syntax is used.
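As a sketch, here is a hypothetical class, Tags, whose special methods are triggered by square-bracket syntax rather than by direct calls:

```python
class Tags:
    """Case-insensitive key/value store (illustrative only)."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        # Called by Python for: instance[key]
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        # Called by Python for: instance[key] = value
        self._data[key.lower()] = value

tags = Tags()
tags["Artist"] = "Ghost in the Machine"
artist = tags["ARTIST"]
```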
3.7. Advanced special class methods
There are more special methods than just __getitem__ and __setitem__. Some of them let you emulate functionality that you may not even know about.
3.8. Class attributes
You already know about data attributes, which are variables owned by a specific instance of a class. Python also supports class attributes, which are variables owned by the class itself.
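The distinction can be sketched with a hypothetical class that counts its own instances:

```python
class Counter:
    count = 0   # class attribute: one value shared by the class itself

    def __init__(self):
        self.__class__.count += 1
        # data attribute: a separate value for each instance
        self.label = "instance %d" % self.__class__.count

a = Counter()
b = Counter()
```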
3.9. Private functions
Like most languages, Python has the concept of private functions, which cannot be called from outside their module; private class methods, which cannot be called from outside their class; and private attributes, which cannot be accessed from outside their class. Unlike most languages, whether a Python function, method, or attribute is private or public is determined entirely by its name.
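For class methods, the naming rule is a leading double underscore with no trailing double underscore. A sketch with a hypothetical Parser class:

```python
class Parser:
    def parse(self):
        # Inside the class, the private method is called normally.
        return self.__normalize("  data  ")

    def __normalize(self, text):
        return text.strip()

p = Parser()
inside = p.parse()

try:
    p.__normalize("  x  ")   # outside the class, the name is not found
    blocked = False
except AttributeError:
    # Python implements this by renaming the method internally,
    # not by strict access control.
    blocked = True
```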
3.10. Handling exceptions
Like many object-oriented languages, Python has exception handling via try...except blocks.
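A minimal sketch, using a hypothetical helper that falls back to a default when conversion fails:

```python
def safe_int(text, default=0):
    """Convert text to an integer, or return default if it is not one."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for text that is not a valid integer.
        return default
```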
3.11. File objects
Python has a built-in function, open, for opening a file on disk. open returns a file object, which has methods and attributes for getting information about and manipulating the opened file.
3.12. for loops
Like most other languages, Python has for loops. The only reason you haven't seen them until now is that Python is good at so many other things that you don't need them as often.
3.13. More on modules
Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary sys.modules.
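A short sketch of looking modules up by name, using the standard os module as the example:

```python
import os
import sys

# sys.modules maps module names to the module objects already imported.
module = sys.modules["os"]

# Every function records the name of the module that defined it, so
# you can get from a function back to its module.
defining_module = sys.modules[os.path.join.__module__]
```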
3.14. The os module
The os module has lots of useful functions for manipulating files and processes, and os.path has functions for manipulating file and directory paths.
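A sketch of a few os.path functions, with invented directory and file names:

```python
import os

# join builds a path using the separator of the current platform.
path = os.path.join("music", "song.mp3")

# split separates the directory from the file name.
directory, filename = os.path.split(path)

# splitext separates the base name from the extension.
base, extension = os.path.splitext(filename)
```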
3.15. Putting it all together
Once again, all the dominoes are in place. We've seen how each line of code works. Now let's step back and see how it all fits together.
3.16. Summary
The fileinfo.py program should now make perfect sense.

Chapter 4. HTML Processing

4.1. Diving in
I often see questions on comp.lang.python like “How can I list all the [headers|images|links] in my HTML document?” “How do I [parse|translate|munge] the text of my HTML document but leave the tags alone?” “How can I [add|remove|quote] attributes of all my HTML tags at once?” This chapter will answer all of these questions.
4.2. Introducing sgmllib.py
HTML processing is broken into three steps: breaking down the HTML into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into HTML again. The first step is done by sgmllib.py, a part of the standard Python library.
4.3. Extracting data from HTML documents
To extract data from HTML documents, subclass the SGMLParser class and define methods for each tag or entity you want to capture.
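The same subclass-and-override pattern can be sketched with html.parser.HTMLParser from the current standard library. Note that this is a swapped-in parser, not SGMLParser itself, and the class name and sample HTML are invented for illustration:

```python
from html.parser import HTMLParser

class URLLister(HTMLParser):
    """Collect the href of every <a> tag (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the tag.
        if tag == "a":
            self.urls.extend(value for name, value in attrs
                             if name == "href")

parser = URLLister()
parser.feed('<p><a href="index.html">Home</a> '
            '<a href="toc.html">Contents</a></p>')
parser.close()
```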
4.4. Introducing BaseHTMLProcessor.py
SGMLParser doesn't produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it finds, but the methods don't do anything. SGMLParser is an HTML consumer: it takes HTML and breaks it down into small, structured pieces. As you saw in the previous section, you can subclass SGMLParser to define classes that catch specific tags and produce useful things, like a list of all the links on a web page. Now we'll take this one step further by defining a class that catches everything SGMLParser throws at it and reconstructs the complete HTML document. In technical terms, this class will be an HTML producer.
4.5. locals and globals
Python has two built-in functions, locals and globals, which provide dictionary-based access to local and global variables.
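A small sketch of both functions (the names are hypothetical, and the snippet assumes it is run as an ordinary module or script):

```python
GREETING = "hello"

def show_locals(arg):
    x = 1
    # locals() returns a dictionary of the names defined in this function.
    return locals()

def read_global(name):
    # globals() returns the dictionary of module-level names.
    return globals()[name]

local_names = show_locals(7)
greeting = read_global("GREETING")
```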
4.6. Dictionary-based string formatting
There is an alternative form of string formatting that uses dictionaries instead of tuples of values.
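In this form, each placeholder names the dictionary key whose value it wants. A sketch with made-up parameters:

```python
params = {"server": "mpilgrim", "database": "master",
          "uid": "sa", "pwd": "secret"}

# %(key)s looks up key in the dictionary on the right-hand side.
# Keys can appear in any order and can be repeated.
message = "%(pwd)s is not a good password for %(uid)s" % params
```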
4.7. Quoting attribute values
A common question on comp.lang.python is “I have a bunch of HTML documents with unquoted attribute values, and I want to properly quote them all. How can I do this?” (This is generally precipitated by a project manager, newly converted to the HTML-is-a-standard religion, joining a large project and proclaiming that all pages must validate against an HTML validator. Unquoted attribute values are a common violation of the HTML standard.) Whatever the reason, unquoted attribute values are easy to fix by feeding HTML through BaseHTMLProcessor.
4.8. Introducing dialect.py
Dialectizer is a simple (and silly) descendant of BaseHTMLProcessor. It runs blocks of text through a series of substitutions, but it makes sure that anything within a <pre>...</pre> block passes through unaltered.
4.9. Regular expressions 101
Regular expressions are a powerful (and fairly standardized) way of searching, replacing, and parsing text with complex patterns of characters. If you've used regular expressions in other languages (like Perl), you should skip this section and just read the summary of the re module to get an overview of the available functions and their arguments.
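To see why patterns beat plain string replacement, consider this sketch, which abbreviates a street suffix in an invented address:

```python
import re

address = "100 NORTH BROAD ROAD"

# Plain replacement hits every occurrence, including the one inside BROAD.
naive = address.replace("ROAD", "RD.")

# \b requires a word boundary and $ anchors the match to the end of
# the string, so only the final, standalone ROAD is replaced.
fixed = re.sub(r"\bROAD$", "RD.", address)
```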
4.10. Putting it all together
It's time to put everything we've learned so far to good use. I hope you were paying attention.
4.11. Summary
Python provides you with a powerful tool, sgmllib.py, to manipulate HTML by turning its structure into an object model. You can use this tool in many different ways.

Chapter 5. XML Processing

5.1. Diving in
There are two basic ways to work with XML. One is called SAX (“Simple API for XML”), and it works by reading the XML a little bit at a time and calling a method for each element it finds. (If you read HTML Processing, this should sound familiar, because that's how the sgmllib module works.) The other is called DOM (“Document Object Model”), and it works by reading in the entire XML document at once and creating an internal representation of it using native Python classes linked in a tree structure. Python has standard modules for both kinds of parsing, but this chapter will only deal with using the DOM.
5.2. Packages
Actually parsing an XML document is very simple: one line of code. However, before we get to that line of code, we need to take a short detour to talk about packages.
5.3. Parsing XML
As I was saying, actually parsing an XML document is very simple: one line of code. Where you go from there is up to you.
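As a sketch of that one line, here it is with the standard xml.dom.minidom module. The XML is an invented fragment, parsed from a string so that the example needs no file on disk:

```python
from xml.dom import minidom

source = "<grammar><ref id='bit'><p>0</p><p>1</p></ref></grammar>"

# The one line that does the parsing.
doc = minidom.parseString(source)

# From here on, you navigate the resulting tree of nodes.
root_name = doc.documentElement.tagName
paragraphs = doc.getElementsByTagName("p")
first_text = paragraphs[0].firstChild.data
```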
5.4. Unicode
Sorry, you've reached the end of the chapter that's been written so far. Please check back at http://diveintopython.org/ for updates.

Chapter 6. Unit Testing

6.1. Diving in
In previous chapters, we “dived in” by immediately looking at code and trying to understand it as quickly as possible. Now that you have some Python under your belt, we're going to step back and look at the steps that happen before the code gets written.
6.2. Introducing romantest.py
Now that we've completely defined the behavior we expect from our conversion functions, we're going to do something a little unexpected: we're going to write a test suite that puts these functions through their paces and makes sure that they behave the way we want them to. You read that right: we're going to write code that tests code that we haven't written yet.
6.3. Testing for success
The most fundamental part of unit testing is constructing individual test cases. A test case answers a single question about the code it is testing.
6.4. Testing for failure
It is not enough to test that our functions succeed when given good input; we must also test that they fail when given bad input. And not just any sort of failure; they must fail in the way we expect.
6.5. Testing for sanity
Often, you will find that a unit of code contains a set of reciprocal functions, usually in the form of conversion functions where one converts A to B and the other converts B to A. In these cases, it is useful to create a “sanity check” to make sure that you can convert A to B and back to A without losing decimal precision, incurring rounding errors, or triggering any other sort of bug.
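The three kinds of test (success, failure, and sanity) can be sketched together with the standard unittest module. The conversion functions below are simplified, hypothetical stand-ins written only so the tests have something to run against; they are not the roman.py developed in this chapter:

```python
import unittest

NUMERALS = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
            ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
            ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

class OutOfRangeError(ValueError):
    pass

def to_roman(n):
    if not 0 < n < 4000:
        raise OutOfRangeError("number out of range (must be 1..3999)")
    result = ""
    for numeral, value in NUMERALS:
        while n >= value:
            result += numeral
            n -= value
    return result

def from_roman(s):
    # No validation here; this sketch assumes well-formed input.
    result, index = 0, 0
    for numeral, value in NUMERALS:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result

class RomanTest(unittest.TestCase):
    def test_known_value(self):
        # Success: good input gives the known result.
        self.assertEqual(to_roman(1994), "MCMXCIV")

    def test_too_large(self):
        # Failure: bad input fails in the way we expect.
        self.assertRaises(OutOfRangeError, to_roman, 4000)

    def test_round_trip(self):
        # Sanity: converting A to B and back gives A again.
        for n in range(1, 4000):
            self.assertEqual(from_roman(to_roman(n)), n)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(RomanTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```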
6.6. roman.py, stage 1
Now that our unit test is complete, it's time to start writing the code that our test cases are attempting to test. We're going to do this in stages, so we can see all the unit tests fail, then watch them pass one by one as we fill in the gaps in roman.py.
6.7. roman.py, stage 2
Now that we have the framework of our roman module laid out, it's time to start writing code and passing test cases.
6.8. roman.py, stage 3
Now that toRoman behaves correctly with good input (integers from 1 to 3999), it's time to make it behave correctly with bad input (everything else).
6.9. roman.py, stage 4
Now that toRoman is done, it's time to start coding fromRoman. Thanks to our rich data structure that maps individual Roman numerals to integer values, this is no more difficult than the toRoman function.
6.10. roman.py, stage 5
Now that fromRoman works properly with good input, it's time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it's a valid Roman numeral. This is inherently more difficult than validating numeric input in toRoman, but we have a powerful tool at our disposal: regular expressions.
6.11. Handling bugs
Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven't written yet.
6.12. Handling changing requirements
Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.
6.13. Refactoring
The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
6.14. Postscript
A clever reader read the previous section and took it to the next level. The biggest headache (and performance drain) in the program as it is currently written is the regular expression, which is required because we have no other way of breaking down a Roman numeral. But there are only 5000 of them; why don't we just build a lookup table once, then simply read that? This idea gets even better when you realize that you don't need to use regular expressions at all. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers.
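The idea can be sketched as follows. The upper limit of 4999 and all of the names are assumptions made for illustration, and the helper that builds each numeral is a simple greedy conversion rather than the chapter's own code:

```python
MAX_ROMAN = 4999   # assumed upper limit for this sketch

NUMERALS = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
            ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
            ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def build_numeral(n):
    """Convert one integer to a Roman numeral (greedy, hypothetical)."""
    result = ""
    for numeral, value in NUMERALS:
        while n >= value:
            result += numeral
            n -= value
    return result

# Build both tables once, in a single pass.
to_roman_table = [None]    # index 0 is unused; there is no numeral for 0
from_roman_table = {}
for n in range(1, MAX_ROMAN + 1):
    numeral = build_numeral(n)
    to_roman_table.append(numeral)
    from_roman_table[numeral] = n
```

With the tables in place, conversion in either direction is a single lookup, and a string is a valid Roman numeral exactly when it is a key of from_roman_table.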
6.15. Summary
Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever got along without it.

Chapter 7. Regression Testing

7.1. Diving in
In Unit Testing, we discussed the philosophy of unit testing and stepped through the implementation of it in Python. This chapter will focus more on advanced Python-specific techniques, centered around the unittest module. That means that, unlike the previous chapter, very little of this will be transferable to other languages. Then again, if you wanted to learn other languages, you wouldn't have read this far, would you?
7.2. Finding the path
Sorry, you've reached the end of the chapter that's been written so far. (Wasn't much, was it? Sorry about that; I wanted to get the code out there and get public feedback on it. Let me know what you think of the example program, and check back at http://diveintopython.org/ for updates.)