Appendix C. Tips and tricks
Chapter 1. Getting To Know Python
- 1.1. Diving in
In the Python IDE on Windows, you can run a module with File->Run... (Ctrl-R). Output is displayed in the interactive window.
In the Python IDE on Mac OS, you can run a module with Python->Run window... (Cmd-R), but there is an important option you must set first. Open the module in the IDE, pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and make sure “Run as __main__” is checked. This setting is saved with the module, so you only have to do this once per module.
On UNIX-compatible systems (including Mac OS X), you can run a module from the command line: python odbchelper.py
- 1.2. Declaring functions
In Visual Basic, functions (that return a value) start with Function, and subroutines (that do not return a value) start with Sub. There are no subroutines in Python. Everything is a function, all functions return a value (even if it's None), and all functions start with def.
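A minimal sketch of the "every function returns a value" point. log_message is a hypothetical name used only for illustration.

```python
def log_message(text):
    # No return statement, so Python returns None implicitly.
    print(text)


result = log_message("hello")   # prints: hello
print(result is None)           # True
```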
In Java, C++, and other statically-typed languages, you must specify the datatype of the function return value and each function argument. In Python, you never have to specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally.
- 1.3. Documenting functions
Triple quotes are also an easy way to define a string with both single and double quotes, like qq/.../ in Perl.
Many Python IDEs use the doc string to provide context-sensitive documentation, so that when you type a function name, its doc string appears as a tooltip. This can be incredibly helpful, but it's only as good as the doc strings you write.
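A short illustration of both tips, using a hypothetical greet function. Its doc string is a triple-quoted string containing both kinds of quotes, and at run time it is available as the function's __doc__ attribute. IDEs and help() read this attribute.

```python
def greet(name):
    """Return a greeting for name.

    Triple quotes let this text contain "double" and 'single' quotes freely.
    """
    return "Hello, " + name


print(greet.__doc__)   # the doc string is an ordinary attribute
print(greet("Ann"))    # Hello, Ann
```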
- 1.4. Everything is an object
import in Python is like require in Perl. Once you import a Python module, you access its functions with module.function; once you require a Perl module, you access its functions with module::function.
- 1.5. Indenting code
Python uses line breaks to separate statements and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks.
- 1.6. Testing modules
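Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not allow = inside an expression, so there's no chance of accidentally assigning the value you thought you were comparing. (Python 3.8 added assignment expressions, but they use a distinct operator, :=, so a stray = in a condition is still rejected.)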
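You can confirm this with the built-in compile function. In the sketch below, the C-style mistake of writing = in a condition is rejected as a SyntaxError before any code runs.

```python
source = "x = 0\nif x = 1:\n    print('assigned!')\n"

try:
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True    # "=" is not allowed inside the if condition

print(rejected)        # True
```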
On MacPython, there is an additional step to make the if __name__ trick work. Pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and make sure “Run as __main__” is checked.
- 1.7. Dictionaries 101
A dictionary in Python is like a hash in Perl. In Perl, variables that store hashes always start with a % character; in Python, variables can be named anything, and Python keeps track of the datatype internally.
A dictionary in Python is like an instance of the Hashtable class in Java.
A dictionary in Python is like an instance of the Scripting.Dictionary object in Visual Basic.
In the Python versions this book covers, dictionaries have no concept of order among elements. It is incorrect to say that the elements are “out of order”; they are simply unordered. This is an important distinction that will annoy you when you want to access the elements of a dictionary in a specific, repeatable order (like alphabetical order by key). There are ways of doing this; they're just not built into the dictionary. (Since Python 3.7, dictionaries preserve insertion order, but insertion order is still not a sorted order.)
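One way to get a repeatable order is to sort the keys explicitly each time you iterate. The minimal sketch below assumes you want alphabetical order by key, and the sample dictionary is hypothetical.

```python
params = {"server": "mpilgrim", "uid": "sa", "database": "master", "pwd": "secret"}

# sorted() returns a new list of the keys in a repeatable (alphabetical) order;
# the dictionary itself is left unchanged.
for key in sorted(params):
    print(key, "=", params[key])
```

sorted() appeared in Python 2.4. With older versions, the same idea is keys = params.keys() followed by keys.sort().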
- 1.8. Lists 101
A list in Python is like an array in Perl. In Perl, variables that store arrays always start with the @ character; in Python, variables can be named anything, and Python keeps track of the datatype internally.
A list in Python is much more than an array in Java (although it can be used as one if that's really all you want out of life). A better analogy would be to the Vector class, which can hold arbitrary objects and can expand dynamically as new items are added.
Early versions of Python had no boolean datatype; Python 2.2.1 added the constants True and False, and Python 2.3 added a real bool type. In a boolean context (like an if statement), 0 is false and all other numbers are true. This extends to other datatypes, too. An empty string (""), an empty list ([]), and an empty dictionary ({}) are all false; all other strings, lists, and dictionaries are true.
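The built-in bool function reports how a value behaves in a boolean context, so it is an easy way to check the rules above. Note that a non-empty list counts as true even if its only element is false.

```python
samples = [0, 1, -3, "", "a", [], [0], {}, {"k": 1}]

for value in samples:
    print(repr(value), "->", bool(value))
```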
- 1.9. Tuples 101
Tuples can be converted into lists, and vice-versa. The built-in tuple function takes a list and returns a tuple with the same elements, and the list function takes a tuple and returns a list. In effect, tuple freezes a list, and list thaws a tuple.
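A minimal sketch of freezing and thawing with the two built-ins. The sample list is arbitrary.

```python
items = ["a", "b", "mpilgrim"]
frozen = tuple(items)    # list -> tuple: "freezes" it
thawed = list(frozen)    # tuple -> list: "thaws" it
thawed.append("z")       # lists can change; tuples have no append method

print(frozen)            # ('a', 'b', 'mpilgrim')
print(thawed)            # ['a', 'b', 'mpilgrim', 'z']
```

Because list() builds a new list, appending to thawed leaves both frozen and the original items untouched.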
- 1.10. Defining variables
When a command is split among several lines with the line continuation marker (“\”), the continued lines can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python IDE auto-indents the continued line, you should probably accept its default unless you have a burning reason not to.
Strictly speaking, expressions in parentheses, square brackets, or curly braces (like defining a dictionary) can be split into multiple lines with or without the line continuation character (“\”). I like to include the backslash even when it's not required because I think it makes the code easier to read, but that's a matter of style.
- 1.12. Formatting strings
String formatting in Python uses much the same syntax as the sprintf function in C.
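A small example of the % operator with C-style conversion specifiers. The values are arbitrary.

```python
# %s formats a string, %d an integer, %.1f a float with one decimal place,
# and %% produces a literal percent sign.
line = "%s: %d (%.1f%%)" % ("users", 3, 37.5)
print(line)   # users: 3 (37.5%)
```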
- 1.14. Joining lists and splitting strings
join only works on lists of strings; it does not do any type coercion. Joining a list that has one or more non-string elements will raise an exception (a TypeError).
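A sketch of both the failure and a common fix: convert each element explicitly before joining. The list contents are hypothetical.

```python
parts = ["uid", "sa"]
print("=".join(parts))            # uid=sa

mixed = ["port", 1433]
try:
    "=".join(mixed)               # 1433 is an int, so this fails
except TypeError as err:
    print("join failed:", err)

# Convert each element explicitly when the list may hold non-strings.
print("=".join(str(p) for p in mixed))   # port=1433
```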
anystring.split(delimiter, 1) is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).
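A minimal sketch of the technique on a hypothetical key=value string, plus the case where the delimiter is absent.

```python
line = "key=value=with=equals"
before, after = line.split("=", 1)   # split at the first "=" only
print(before)   # key
print(after)    # value=with=equals

# If the delimiter is missing, the result has only one element, so
# unpacking it into two names would raise ValueError.
print("no delimiter".split("=", 1))  # ['no delimiter']
```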
Chapter 2. The Power Of Introspection
- 2.2. Optional and named arguments
The only thing you have to do to call a function is specify a value (somehow) for each required argument; the manner and order in which you do that is up to you.
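The sketch below uses a hypothetical connect function with one required argument and three optional ones. Every call is valid because each one supplies server, either by position or by name.

```python
def connect(server, database="master", uid="sa", pwd=""):
    """Hypothetical function: builds a connection string from its arguments."""
    return "server=%s;database=%s;uid=%s;pwd=%s" % (server, database, uid, pwd)


print(connect("mpilgrim"))                        # only the required argument
print(connect("mpilgrim", pwd="secret"))          # positional plus one named
print(connect(pwd="secret", server="mpilgrim"))   # all named, any order
```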
- 2.3. type, str, dir, and other built-in functions
Python comes with excellent reference manuals, which you should peruse thoroughly to learn all the modules Python has to offer. But whereas in most languages you would find yourself referring back to the manuals (or man pages, or, God help you, MSDN) to remind yourself how to use these modules, Python is largely self-documenting.
- 2.6. The peculiar nature of and and or
The and-or trick, bool and a or b, will not work like the C expression bool ? a : b when a is false in a boolean context: the expression then evaluates to b, even when bool is true.
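The sketch below shows the failure case alongside two workarounds. The first wraps the values in one-element lists, since a non-empty list is always true. The second is the conditional expression a if cond else b, available in Python 2.5 and later.

```python
a, b = "", "second"   # a is false in a boolean context
cond = True

naive = cond and a or b            # "" is false, so this falls through to b
safe = (cond and [a] or [b])[0]    # [""] is a non-empty list, so it is true
modern = a if cond else b          # conditional expression

print(repr(naive), repr(safe), repr(modern))  # 'second' '' ''
```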
- 2.7. Using lambda functions
lambda functions are a matter of style. Using them is never required; anywhere you could use them, you could define a separate normal function and use that instead. I use them in places where I want to encapsulate specific, non-reusable code without littering my code with a lot of little one-line functions.
- 2.8. Putting it all together
In SQL, you must use IS NULL instead of = NULL to compare a null value. In Python, you can use either == None or is None, but is None is faster.
Chapter 3. An Object-Oriented Framework
- 3.2. Importing modules using from module import
from module import in Python is like use module in Perl; import module in Python is like require module in Perl.
from module import in Python is like import module.* in Java; import module in Python is like import module in Java.
- 3.3. Defining classes
The pass statement in Python is like an empty set of braces ({}) in Java or C.
In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. There is no special keyword like extends in Java.
Although I won't discuss it in depth in this book, Python supports multiple inheritance. In the parentheses following the class name, you can list as many ancestor classes as you like, separated by commas.
By convention, the first argument of any class method (the reference to the current instance) is called self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.
When defining your class methods, you must explicitly list self as the first argument for each method, including __init__. When you call a method of an ancestor class from within your class, you must include the self argument. But when you call your class method from outside, you do not specify anything for the self argument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent, but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don't know about yet.
__init__ methods are optional, but when you define one, you must remember to explicitly call the ancestor's __init__ method (if the ancestor defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.
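A minimal sketch of these rules with two hypothetical classes. As the text describes, the ancestor is called by class name, with self passed explicitly. In Python 3, super().__init__(name) is a common alternative.

```python
class Base:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "Base(%s)" % self.name


class Derived(Base):                 # ancestor listed in parentheses
    def __init__(self, name, size):
        Base.__init__(self, name)    # explicit call; self passed by hand
        self.size = size

    def describe(self):
        # Extend the ancestor's behavior by calling it explicitly.
        return Base.describe(self) + " size=%d" % self.size


d = Derived("box", 3)                # no "new": call the class like a function
print(d.describe())                  # self is supplied automatically here
```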
- 3.4. Instantiating classes
In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit new operator as in C++ or Java.
- 3.5. UserDict: a wrapper class
In the Python IDE on Windows, you can quickly open any module in your library path with File->Locate... (Ctrl-L).
Java and PowerBuilder support function overloading by argument list; that is, one class can have multiple methods with the same name but a different number of arguments, or arguments of different types. Other languages (most notably PL/SQL) even support function overloading by argument name; that is, one class can have multiple methods with the same name and the same number of arguments of the same type but different argument names. Python supports neither of these; it has no form of function overloading whatsoever. An __init__ method is an __init__ method is an __init__ method, regardless of its arguments. There can be only one __init__ method per class, and if a descendant class has an __init__ method, it always overrides the ancestor __init__ method, even if the descendant defines it with a different argument list.
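The sketch below shows what happens if you try to overload anyway, using a hypothetical Greeter class. The second def simply rebinds the name, so only the last definition survives. A default argument is one common way to get similar flexibility.

```python
class Greeter:
    def greet(self):
        return "hello"

    def greet(self, name):           # replaces the definition above
        return "hello, " + name


g = Greeter()
print(g.greet("Ann"))                # hello, Ann
# g.greet() would now raise TypeError: only the second definition exists.


class FlexibleGreeter:
    def greet(self, name=None):      # one method, optional argument
        if name is None:
            return "hello"
        return "hello, " + name
```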
Always assign an initial value to all of an instance's data attributes in the __init__ method. It will save you hours of debugging later.
- 3.6. Special class methods
When accessing data attributes within a class, you need to qualify the attribute name: self.attribute. When calling other methods within a class, you need to qualify the method name: self.method.
- 3.7. Advanced special class methods
In Java, you determine whether two string variables reference the same physical memory location by using str1 == str2. This is called object identity, and it is written in Python as str1 is str2. To compare string values in Java, you would use str1.equals(str2); in Python, you would use str1 == str2. Java programmers who have been taught to believe that the world is a better place because == in Java compares by identity instead of by value may have a difficult time adjusting to Python's lack of such “gotchas”.
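This sketch shows the same distinction in Python, using lists rather than strings. CPython may reuse a single object for some equal strings (interning), so the result of is on equal strings is an implementation detail. Two separately built lists, by contrast, are always distinct objects.

```python
first = [1, 2, 3]
second = [1, 2, 3]      # equal contents, separate object
alias = first           # another name for the same object

print(first == second)  # True: == compares values
print(first is second)  # False: different objects
print(first is alias)   # True: is compares identity
```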
While other object-oriented languages only let you define the physical model of an object (“this object has a GetLength method”), Python's special class methods like __len__ allow you to define the logical model of an object (“this object has a length”).
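The sketch below illustrates the logical-model idea with a hypothetical Playlist class. Defining __len__ and __getitem__ lets the built-in len function and the [] syntax work on instances.

```python
class Playlist:
    """Hypothetical class whose logical model includes a length."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        return len(self.songs)

    def __getitem__(self, index):
        return self.songs[index]


p = Playlist(["intro", "verse", "outro"])
print(len(p))   # 3, via __len__
print(p[0])     # intro, via __getitem__
```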
- 3.8. Class attributes
In Java, both static variables (called class attributes in Python) and instance variables (called data attributes in Python) are defined immediately after the class definition (one with the static keyword, one without). In Python, only class attributes can be defined here; data attributes are defined in the __init__ method.
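A sketch of the difference, using a hypothetical Counter class. The class attribute is defined in the class body and shared by every instance. Each data attribute is created in __init__ and belongs to a single instance.

```python
class Counter:
    instances = 0                 # class attribute, shared by all instances

    def __init__(self):
        self.count = 0            # data attribute, one per instance
        Counter.instances += 1    # update the shared value through the class


a = Counter()
b = Counter()
a.count += 5
print(Counter.instances, a.count, b.count)  # 2 5 0
```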
- 3.9. Private functions
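If the name of a Python function, class method, or attribute starts with (but doesn't end with) two underscores, it's private; everything else is public. Within a class, Python enforces this by mangling the name (__secret becomes _ClassName__secret), which hides it from outside code rather than making it truly inaccessible.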
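The sketch below uses a hypothetical Account class to show what "private" means in practice. The double-underscore attribute is renamed (mangled) to _Account__balance. Outside code cannot reach it under its original name, but the mangled name still works.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance          # stored as _Account__balance

    def balance(self):
        return self.__balance             # fine inside the class


acct = Account(100)
print(acct.balance())                     # 100
try:
    acct.__balance                        # not visible under this name outside
except AttributeError:
    print("no attribute __balance")
print(acct._Account__balance)             # 100: mangled, not truly hidden
```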
In Python, all special methods (like __setitem__) and built-in attributes (like __doc__) follow a standard naming convention: they start and end with two underscores. Don't name your own methods and attributes this way; it will only confuse you (and others) later.
Python has no concept of protected class methods (accessible only in their own class and descendant classes). Class methods are either private (accessible only in their own class) or public (accessible from anywhere).
- 3.10. Handling exceptions
Python uses try...except to handle exceptions and raise to generate them. Java and C++ use try...catch to handle exceptions, and throw to generate them.
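A minimal sketch of raise and try...except. parse_port is a hypothetical helper. ValueError is the built-in exception that int() also raises for text that isn't a number.

```python
def parse_port(text):
    """Hypothetical helper: convert text to a port number or raise ValueError."""
    port = int(text)                     # int() raises ValueError on non-numbers
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)   # like throw in Java
    return port


for candidate in ["1433", "abc", "70000"]:
    try:
        print(candidate, "->", parse_port(candidate))
    except ValueError as err:            # like catch in Java
        print(candidate, "-> error:", err)
```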
- 3.14. The os module
Whenever possible, you should use the functions in os and os.path for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like os.path.split work on UNIX, Windows, Mac OS, and any other supported Python platform.
Chapter 4. HTML Processing
- 4.2. Introducing sgmllib.py
Python 2.0 had a bug where SGMLParser would not recognize declarations at all (handle_decl would never be called), which meant that DOCTYPEs were silently ignored. This is fixed in Python 2.1.
In the Python IDE on Windows, you can specify command line arguments in the “Run script” dialog. Separate multiple arguments with spaces.
- 4.4. Introducing BaseHTMLProcessor.py
The HTML specification requires that all non-HTML content (like client-side JavaScript) be enclosed in HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). BaseHTMLProcessor is not forgiving; if a script is improperly embedded, it will be parsed as if it were HTML. For instance, if the script contains less-than and equals signs, SGMLParser may incorrectly think that it has found tags and attributes. SGMLParser always converts tag and attribute names to lowercase, which may break the script, and BaseHTMLProcessor always encloses attribute values in double quotes (even if the original HTML document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script within HTML comments.
- 4.5. locals and globals
Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes. In Python 2.0, when you reference a variable within a nested function or lambda function, Python searches for that variable in the current (nested or lambda) function's namespace, then in the module's namespace. Python 2.2 searches the current (nested or lambda) function's namespace, then the parent function's namespace, then the module's namespace. Python 2.1 can work either way; by default, it works like Python 2.0, but you can add the following line of code at the top of your module to make your module work like Python 2.2:
from __future__ import nested_scopes
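Here is a sketch of a lambda that depends on nested scopes: it reads factor from the enclosing function's namespace. Under the Python 2.2 rules, and in every Python 3 release, this works without the __future__ import. Under the Python 2.0 rules described above, the lookup would skip make_multiplier's namespace and fail with a NameError. make_multiplier is a hypothetical name.

```python
def make_multiplier(factor):
    # The lambda finds "factor" in the parent function's namespace.
    return lambda x: x * factor


double = make_multiplier(2)
print(double(21))  # 42
```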
Using the locals and globals functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors the functionality of the getattr function, which allows you to access arbitrary functions dynamically by providing the function name as a string.
- 4.6. Dictionary-based string formatting
Using dictionary-based string formatting with locals is a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a slight performance hit in making the call to locals, since locals builds a copy of the local namespace.
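A sketch of dictionary-based formatting fed by locals. describe is a hypothetical function, and each %(key)s specifier looks up a local variable by name in the dictionary that locals() returns.

```python
def describe(name, count):
    # locals() here is {"name": ..., "count": ...}; %(name)s and %(count)d
    # pull the matching values out of that dictionary.
    return "%(name)s has %(count)d items" % locals()


print(describe("inbox", 12))  # inbox has 12 items
```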
Chapter 5. XML Processing
- 5.2. Packages
A package is a directory with the special __init__.py file in it. The __init__.py file defines the attributes and methods of the package. It doesn't have to define anything; it can just be an empty file, but it has to exist. If __init__.py doesn't exist, the directory is just a directory, not a package, and it can't be imported or contain modules or nested packages. (This describes the Python 2 rules; Python 3.3 and later can also import a directory without __init__.py as a namespace package.)
Chapter 6. Unit Testing
- 6.2. Introducing romantest.py
- 6.8. roman.py, stage 3
The most important thing that comprehensive unit testing can tell you is when to stop coding. When all the unit tests for a function pass, stop coding the function. When all the unit tests for an entire module pass, stop coding the module.
- 6.10. roman.py, stage 5
When all your tests pass, stop coding.
- 6.13. Refactoring
Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call the methods on the pattern object directly.
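A sketch of compiling once and reusing the pattern object. The phone-number pattern is hypothetical. The re module also keeps a small internal cache of recently used patterns, so part of the benefit is readability.

```python
import re

# Compile once; the pattern object's methods can then be reused many times.
phone_pattern = re.compile(r"^(\d{3})-(\d{4})$")

for text in ["555-1212", "5551212", "800-0000"]:
    match = phone_pattern.search(text)
    print(text, "->", match.groups() if match else None)
```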
Chapter 7. Regression Testing