String literals are described by the following lexical definitions:
    stringliteral   ::= [stringprefix](shortstring | longstring)
    stringprefix    ::= "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
    shortstring     ::= "'" shortstringitem* "'"
                        | '"' shortstringitem* '"'
    longstring      ::= "'''" longstringitem* "'''"
                        | '"""' longstringitem* '"""'
    shortstringitem ::= shortstringchar | escapeseq
    longstringitem  ::= longstringchar | escapeseq
    shortstringchar ::= <any ASCII character except "\" or newline or the quote>
    longstringchar  ::= <any ASCII character except "\">
    escapeseq       ::= "\" <any ASCII character>
One syntactic restriction not indicated by these productions is that whitespace is not allowed between the stringprefix and the rest of the string literal.
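The whitespace restriction can be checked with the built-in `compile()`. The snippet below is a minimal sketch; the helper name `is_valid_literal` is hypothetical and not part of the reference.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Prefix directly attached to the opening quote: accepted.
attached = is_valid_literal('r"abc"')

# With a space, `r` is read as an ordinary name followed by a string,
# which is not a valid expression.
separated = is_valid_literal('r "abc"')
```

With the space present, the prefix is no longer part of the literal at all, which is why the parser rejects the expression rather than silently ignoring the space.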
In plain English: String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter "r" or "R"; such strings are called raw strings and use different rules for interpreting backslash escape sequences. A prefix of "u" or "U" makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. The two prefix characters may be combined; in this case, "u" must appear before "r".
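As a short illustration of the quoting rules just described, the following sketch shows that the choice of quote character does not affect the resulting value. The variable names are illustrative only.

```python
# Single and double quotes produce the same string value.
single = 'spam'
double = "spam"

# Either quote character may appear unescaped inside the other kind...
apostrophe = "it's"
speech = 'say "hi"'

# ...or escaped with a backslash inside its own kind.
escaped_apostrophe = 'it\'s'
escaped_speech = "say \"hi\""
```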
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A "quote" is the character used to open the string, i.e. either ' or ".)
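A small sketch of the triple-quoted behaviour follows; the sample text and variable names are made up for illustration.

```python
# Newlines and both kinds of quote are kept exactly as typed.
verse = """Roses are "red",
violets are 'blue'."""
lines = verse.split("\n")

# Two unescaped quotes in a row are fine; only three in a row
# would terminate the literal.
pair = '''a''b'''
```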
Unless an "r" or "R" prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:
Escape Sequence | Meaning | Notes |
---|---|---|
\newline | Ignored | |
\\ | Backslash (\) | |
\' | Single quote (') | |
\" | Double quote (") | |
\a | ASCII Bell (BEL) | |
\b | ASCII Backspace (BS) | |
\f | ASCII Formfeed (FF) | |
\n | ASCII Linefeed (LF) | |
\N{name} | Character named name in the Unicode database (Unicode only) | |
\r | ASCII Carriage Return (CR) | |
\t | ASCII Horizontal Tab (TAB) | |
\uxxxx | Character with 16-bit hex value xxxx (Unicode only) | (1) |
\Uxxxxxxxx | Character with 32-bit hex value xxxxxxxx (Unicode only) | (2) |
\v | ASCII Vertical Tab (VT) | |
\ooo | ASCII character with octal value ooo | (3) |
\xhh | ASCII character with hex value hh | (4) |
Notes:
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences marked as "(Unicode only)" in the table above fall into the category of unrecognized escapes for non-Unicode string literals.
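The sketch below exercises a few recognized escapes and one unrecognized escape. It assumes a Python 3 interpreter, where recent releases emit a warning for unrecognized escapes; for that reason the unrecognized case is evaluated from a source string with warnings silenced. The variable names are illustrative only.

```python
import warnings

bell = "\a"        # one character, code 7 (BEL)
octal_a = "\101"   # octal 101 is 65, i.e. "A"
hex_a = "\x41"     # hex 41 is 65, i.e. "A"

# Backslash-newline is ignored, so the literal continues on the next line.
joined = "ab\
cd"

# "\q" is not a recognized escape: the backslash stays in the string.
# Assumption: the interpreter only warns here; the filter hides the warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    unknown = eval(r'"\q"')
```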
When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
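The raw-string rules can be checked directly. This is a minimal sketch; the helper `compiles` is a hypothetical name introduced only for the example.

```python
newline_raw = r"\n"   # two characters: backslash, "n"
quote_raw = r"\""     # two characters: backslash, double quote

# In a raw string, backslash-newline is kept as two characters
# instead of acting as a line continuation.
continued = r"""a\
b"""


def compiles(source):
    """Return True if `source` compiles as a single expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Source text r"\" : the backslash escapes the closing quote.
single_trailing = compiles('r"\\"')
# Source text r"\\" : an even number of trailing backslashes is fine.
double_trailing = compiles('r"\\\\"')
```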
When an "r" or "R" prefix is used in conjunction with a "u" or "U" prefix, then the \uXXXX escape sequence is processed while all other backslashes are left in the string. For example, the string literal ur"\u0062\n" consists of three Unicode characters: 'LATIN SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a preceding backslash; however, both remain in the string. As a result, \uXXXX escape sequences are only recognized when there are an odd number of backslashes.
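Python 3 no longer accepts the combined "ur" prefix, so the rule cannot be demonstrated there with a literal. As a rough sketch, the standard `raw_unicode_escape` codec is used below as a stand-in, on the assumption that it applies the same odd-backslash rule to \uXXXX sequences. The variable names are illustrative only.

```python
# Equivalent of ur"\u0062\n": \u0062 is decoded, the other backslash is kept.
three_chars = br"\u0062\n".decode("raw_unicode_escape")

# Two backslashes before "u" (an even count): nothing is decoded.
not_decoded = br"\\u0062".decode("raw_unicode_escape")
```

The first result is the three characters "b", backslash, "n"; the second keeps all seven characters of its input.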