3.1.2 字符串 Strings

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes or double quotes:

除了数值, Python 还可以通过几种不同的方法操作字符串。字符串用单引号或双引号标识:

>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'

String literals can span multiple lines in several ways. Continuation lines can be used, with a backslash as the last character on the line indicating that the next line is a logical continuation of the line:

字符串可以通过几种方式分行。可以在行加反斜杠做为继续符,这表示下一行是当前行的逻辑沿续。

hello = "This is a rather long string containing\n\
several lines of text just as you would do in C.\n\
    Note that whitespace at the beginning of the line is\
 significant."

print hello

Note that newlines would still need to be embedded in the string using \n; the newline following the trailing backslash is discarded. This example would print the following:

注意换行用 \n 来表示;反斜杠后面的新行标识(newline,缩写“n”)会转换为换行符,示例会按如下格式打印:

This is a rather long string containing
several lines of text just as you would do in C.
    Note that whitespace at the beginning of the line is significant.

If we make the string literal a ``raw'' string, however, the \n sequences are not converted to newlines, but the backslash at the end of the line, and the newline character in the source, are both included in the string as data. Thus, the example:

然而,如果我们创建一个“行”("raw")字符串,\ n序列就不会转为换行,示例源码最后的反斜杠和换行符n都会做为字符串中的数据处理。如下所示:

hello = r"This is a rather long string containing\n\
several lines of text much as you would do in C."

print hello

would print:

会打印为:

This is a rather long string containing\n\
several lines of text much as you would do in C.

Or, strings can be surrounded in a pair of matching triple-quotes: """ or '''. End of lines do not need to be escaped when using triple-quotes, but they will be included in the string.

或者,字符串可以用一对三重引号”””或'''来标识。三重引号中的字符串在行尾不需要换行标记,所有的格式都会包括在字符串中。

print """
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""

produces the following output: 产生以下结果

Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to

The interpreter prints the result of string operations in the same way as they are typed for input: inside quotes, and with quotes and other funny characters escaped by backslashes, to show the precise value. The string is enclosed in double quotes if the string contains a single quote and no double quotes, else it's enclosed in single quotes. (The print statement, described later, can be used to write strings without quotes or escapes.)

解释器打印出来的字符串与它们输入的形式完全相同:内部的引号,用反斜杠标识的引号和各种怪字符,都精确的显示出来。如果字符串中包含单引号,不包含双引号,可以用双引号引用它,反之可以用单引号。(后面介绍的 print 语句,可以在不使用引号和反斜杠的情况下输出字符串)。

Strings can be concatenated (glued together) with the + operator, and repeated with *:

字符串可以用 + 号联接(或者说粘合),也可以用 * 号循环。

>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'

Two string literals next to each other are automatically concatenated; the first line above could also have been written "word = 'Help' 'A'"; this only works with two literals, not with arbitrary string expressions:

两个字符串值之间的联接是自动的,上例第一行可以写成“word = 'Help' 'A'”。这种方式只对字符串值有效,任何字符串表达式都不适用这种方法。

>>> 'str' 'ing'                   #  <-  This is ok
'string'
>>> 'str'.strip() + 'ing'   #  <-  This is ok
'string'
>>> 'str'.strip() 'ing'     #  <-  This is invalid
  File "<stdin>", line 1, in ?
    'str'.strip() 'ing'
                      ^
SyntaxError: invalid syntax

Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0. There is no separate character type; a character is simply a string of size one. Like in Icon, substrings can be specified with the slice notation: two indices separated by a colon.

字符串可以用下标(索引)查询;就像 C 一样,字符串的第一个字符下标是 0 。这里没有独立的字符类型,字符仅仅是大小为一的字符串。就像在 Icon 中那样,字符串的子串可以通过切片标志来表示:两个由冒号隔开的索引。

>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[2:4]
'lp'

Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced.

切片索引可以使用默认值;省略前一个索引表示 0 ,省略后一个索引表示被切片的字符串的长度。

>>> word[:2]    # The first two characters
'He'
>>> word[2:]    # All but the first two characters
'lpA'

Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error:

和 C 字符串不同, Python 字符串不能改写。按字符串索引赋值会产生错误。

>>> word[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support slice assignment

However, creating a new string with the combined content is easy and efficient:

然而,可以通过简单有效的组合方式生成新的字符串:

>>> 'x' + word[1:]
'xelpA'
>>> 'Splat' + word[4]
'SplatA'

Here's a useful invariant of slice operations:

切片操作有一个很有用的不变性:

s[:i] + s[i:] equals s.

>>> word[:2] + word[2:]
'HelpA'
>>> word[:3] + word[3:]
'HelpA'

Degenerate slice indices are handled gracefully: an index that is too large is replaced by the string size, an upper bound smaller than the lower bound returns an empty string.

退化的切片索引被处理的很优美:过大的索引代替为字符串大小,下界比上界大的返回空字符串。

>>> word[1:100]
'elpA'
>>> word[10:]
''
>>> word[2:1]
''

Indices may be negative numbers, to start counting from the right. For example:

索引可以是负数,计数从右边开始,例如:

>>> word[-1]     # The last character
'A'
>>> word[-2]     # The last-but-one character
'p'
>>> word[-2:]    # The last two characters
'pA'
>>> word[:-2]    # All but the last two characters
'Hel'

But note that -0 is really the same as 0, so it does not count from the right!

不过-0还是0,它没有从右边计数!

>>> word[-0]     # (since -0 equals 0)
'H'

Out-of-range negative slice indices are truncated, but don't try this for single-element (non-slice) indices:

越界的负切片索引会被截断,不过不要尝试在前元素索引(非切片的)中这样做:

>>> word[-100:]
'HelpA'
>>> word[-10]    # error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: string index out of range

The best way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:

理解切片的最好方式是把索引视为两个字符之间的点,第一个字符的左边是0,字符串中第n个字符的右边是索引n,例如:

 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0...5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.

第一行是字符串中给定的0到5各个索引的位置,第二行是对应的负索引。从i到j的切片由这两个标志之间的字符组成。

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

对于非负索引,切片长度就是两索引的差。例如,word[1:3]的长度是2。

The built-in function len() returns the length of a string:

内置函数 len() 返回字符串长度:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

See Also:

Sequence Types
Strings, and the Unicode strings described in the next section, are examples of sequence types, and support the common operations supported by such types.
String Methods
Both strings and Unicode strings support a large number of methods for basic transformations and searching.
String Formatting Operations
The formatting operations invoked when strings and Unicode strings are the left operand of the % operator are described in more detail here.