2.4.1 String literals

String literals are described by the following lexical definitions:

`stringliteral`	::=	`[stringprefix](shortstring \| longstring)`
`stringprefix`	::=	`"r" \| "u" \| "ur" \| "R" \| "U" \| "UR" \| "Ur" \| "uR"`
`shortstring`	::=	`"'" shortstringitem* "'" \| '"' shortstringitem* '"'`
`longstring`	::=	`"'''" longstringitem* "'''"`
		`\| '"""' longstringitem* '"""'`
`shortstringitem`	::=	`shortstringchar \| escapeseq`
`longstringitem`	::=	`longstringchar \| escapeseq`
`shortstringchar`	::=	`<any ASCII character except "\" or newline or the quote>`
`longstringchar`	::=	`<any ASCII character except "\">`
`escapeseq`	::=	`"\" <any ASCII character>`

One syntactic restriction not indicated by these productions is that whitespace is not allowed between the stringprefix and the rest of the string literal.
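The whitespace restriction can be checked by handing source text to the built-in `compile()`. This is a minimal sketch; the helper name `compiles` is hypothetical and introduced only for illustration.

```python
def compiles(source):
    """Return True if `source` parses as a single expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# Prefix directly followed by the opening quote: accepted.
adjacent_ok = compiles('r"abc"')

# A space between the prefix and the quote: rejected, because "r" is
# then read as a separate name rather than as a string prefix.
spaced_ok = compiles('r "abc"')
```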

In plain English: String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter "r" or "R"; such strings are called raw strings and use different rules for interpreting backslash escape sequences. A prefix of "u" or "U" makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. The two prefix characters may be combined; in this case, "u" must appear before "r".
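A short sketch of the quoting styles and the raw prefix described above. The variable names are illustrative only, and the literals were chosen so that they should behave the same on any interpreter that accepts this grammar.

```python
# Single and double quotes delimit the same kind of string.
single = 'spam'
double = "spam"

# The other quote character needs no escaping; the opening one does.
mixed = "it's"
escaped = 'it\'s'

# An "r" prefix changes how backslashes are interpreted.
cooked = "a\tb"   # backslash-t becomes one TAB character: 3 characters
raw = r"a\tb"     # backslash and "t" are both kept: 4 characters
```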

In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A "quote" is the character used to open the string, i.e. either ' or ".)
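As an illustration of the rule above, the triple-quoted literal in this sketch keeps its line break and its embedded double quotes; `text` is just an example name.

```python
# The newline and the double quotes inside the literal are retained as-is.
text = """first line
second "quoted" line"""

# The same value written as a single-line literal with explicit escapes.
same = 'first line\nsecond "quoted" line'
```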

Unless an "r" or "R" prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence	Meaning	Notes
`\newline`	Ignored
`\\`	Backslash (`\`)
`\'`	Single quote (`'`)
`\"`	Double quote (`"`)
`\a`	ASCII Bell (BEL)
`\b`	ASCII Backspace (BS)
`\f`	ASCII Formfeed (FF)
`\n`	ASCII Linefeed (LF)
`\N{name}`	Character named `name` in the Unicode database (Unicode only)
`\r`	ASCII Carriage Return (CR)
`\t`	ASCII Horizontal Tab (TAB)
`\uxxxx`	Character with 16-bit hex value `xxxx` (Unicode only)	(1)
`\Uxxxxxxxx`	Character with 32-bit hex value `xxxxxxxx` (Unicode only)	(2)
`\v`	ASCII Vertical Tab (VT)
`\ooo`	ASCII character with octal value `ooo`	(3)
`\xhh`	ASCII character with hex value `hh`	(4)

Notes:

(1): Individual code units which form parts of a surrogate pair can be encoded using this escape sequence.
(2): Any Unicode character can be encoded this way, but characters outside the Basic Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is compiled to use 16-bit code units (the default). Individual code units which form parts of a surrogate pair can be encoded using this escape sequence.
(3): As in Standard C, up to three octal digits are accepted.
(4): Unlike in Standard C, at most two hex digits are accepted.
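A few of the escapes from the table can be checked directly. This is only a sketch: the variable names are illustrative, and the `u` prefix on the last three literals assumes an interpreter that accepts it.

```python
# Control-character escapes each produce a single ASCII code point.
control_codes = [ord(c) for c in "\a\b\f\n\r\t\v"]

# Octal and hex escapes both spell "A" (code point 65).
octal_a = "\101"
hex_a = "\x41"

# Note (3): at most three octal digits are consumed, so the 4th is literal.
octal_then_digit = "\1011"   # "A" followed by "1"

# Note (4): at most two hex digits are consumed, so the 3rd is literal.
hex_then_digit = "\x414"     # "A" followed by "4"

# Backslash-newline is ignored, joining the two source lines.
joined = "abc\
def"

# Unicode-only escapes, all three spelling the letter "b".
named = u"\N{LATIN SMALL LETTER B}"
short_u = u"\u0062"
long_u = u"\U00000062"
```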

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences marked as "(Unicode only)" in the table above fall into the category of unrecognized escapes for non-Unicode string literals.

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
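The raw-string cases from the paragraph above, as a small sketch. The variable names are illustrative, and the invalid literal is passed to the built-in `compile()` as text so that the example itself still parses.

```python
# r"\n" is two characters: a backslash and a lowercase "n".
raw_n = r"\n"

# The quote can be escaped, but the backslash stays in the string.
raw_quote = r"\""

# Backslash-newline is kept as two characters, not treated as a
# line continuation: "a", backslash, newline, "b".
raw_newline = r"a\
b"

# r"\" cannot be written: the backslash escapes the closing quote.
try:
    compile('r"\\"', "<example>", "eval")
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False
```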

When an "r" or "R" prefix is used in conjunction with a "u" or "U" prefix, then the \uXXXX escape sequence is processed while all other backslashes are left in the string. For example, the string literal ur"\u0062\n" consists of three Unicode characters: 'LATIN SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a preceding backslash; however, both remain in the string. As a result, \uXXXX escape sequences are only recognized when there are an odd number of backslashes.
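Not every interpreter accepts the `ur` prefix, so the sketch below does not use it. It decodes raw byte strings with the standard `raw_unicode_escape` codec instead, on the assumption that the codec applies the same rule as a `ur` literal. That equivalence is an assumption of this example, not a statement from the text above.

```python
# Assumption: the "raw_unicode_escape" codec mirrors the ur"..." rule.

# One backslash before "u0062" (odd count): the escape is processed.
# Result: "b", backslash, "n".
decoded = br"\u0062\n".decode("raw_unicode_escape")

# Two backslashes (even count): nothing is processed, both remain.
even = br"\\u0062".decode("raw_unicode_escape")

# Three backslashes (odd count): the first two remain, the third
# introduces an escape that is processed into "b".
odd = br"\\\u0062".decode("raw_unicode_escape")
```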