1.1 记法 Notation

The descriptions of lexical analysis and syntax use a modified BNF grammar notation. This uses the following style of definition:

在描述词法分析和句法时, 我们使用一种修改过的BNF语法记法, 其定义方式如下:

name:           lc_letter (lc_letter | "_")*
lc_letter:      "a"..."z"

The first line says that a name is an lc_letter followed by a sequence of zero or more lc_letters and underscores. An lc_letter in turn is any of the single characters "a" through "z". (This rule is actually adhered to for the names defined in lexical and grammar rules in this document.)

第一行说明name为lc_letter后跟随零个以上(包括零个)lc_letter或下划线的序列. lc_letter是"a"至"z"中任意一个字符.(实际上, 这个"名字"的定义贯穿于本文档的整个词法和语法规则中)
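The two rules above can be mirrored with a regular expression. The following is a minimal sketch, not part of the notation itself: the helper `is_name` and the constant `NAME` are names assumed here for illustration only.

```python
import re

# lc_letter: "a"..."z"               -> the character class [a-z]
# name: lc_letter (lc_letter | "_")* -> one lc_letter, then zero or more
#                                       lc_letters or underscores
NAME = re.compile(r"[a-z][a-z_]*")


def is_name(text: str) -> bool:
    """Return True if the whole string matches the `name` rule (hypothetical helper)."""
    return NAME.fullmatch(text) is not None
```

Under this sketch, `is_name("lc_letter")` holds, while `is_name("_x")` and `is_name("Name")` do not, because a name must start with a lowercase letter.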

Each rule begins with a name (which is the name defined by the rule) and a colon. A vertical bar (|) is used to separate alternatives; it is the least binding operator in this notation. A star (*) means zero or more repetitions of the preceding item; likewise, a plus (+) means one or more repetitions, and a phrase enclosed in square brackets ([ ]) means zero or one occurrences (in other words, the enclosed phrase is optional). The * and + operators bind as tightly as possible; parentheses are used for grouping. Literal strings are enclosed in quotes. White space is only meaningful to separate tokens. Rules are normally contained on a single line; rules with many alternatives may be formatted alternatively with each line after the first beginning with a vertical bar.

每个规则以一个名字(为所定义的规则的名字)和一个冒号为开始. 竖线(|)用于分隔可选项.这是记法中结合性最弱的符号.星号(*)意味着前一项的零次或多次的重复; 同样, 加号(+)意味着一次或多次的重复. 在方括号([])中的内容意味着它可以出现零次或一次(也就是说它是可选的).星号和加号与前面的项尽可能地紧密的结合, 小括号用于分组.字符串的字面值用引号括住.空白字符仅仅在分隔语言符号(token)时有用.通常规则被包含在一行之中, 有很多可选项的规则可能会被格式化成多行的形式, 后续行都以一个竖线开始.
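To make the binding rules concrete, here is a rough sketch that translates a few rules into Python regular expressions, whose operators happen to bind in the same way. The four rule names and the helper `matches` are hypothetical and do not appear in the grammar of this document.

```python
import re

# Hypothetical rules, written in the notation described above:
#   pair_or_c:  "a" "b" | "c"   -> ("a" "b") | "c", since | binds least tightly
#   a_then_bs:  "a" "b"*        -> "a" ("b"*), since * applies to the preceding item only
#   grouped:    ("a" "b")*      -> parentheses make * apply to the whole group
#   optional:   ["-"] "a"+      -> an optional "-" followed by one or more "a"
RULES = {
    "pair_or_c": re.compile(r"ab|c"),
    "a_then_bs": re.compile(r"ab*"),
    "grouped": re.compile(r"(?:ab)*"),
    "optional": re.compile(r"-?a+"),
}


def matches(rule: str, text: str) -> bool:
    """Return True if the whole text matches the named rule (illustrative helper)."""
    return RULES[rule].fullmatch(text) is not None
```

For instance, `matches("a_then_bs", "abbb")` holds but `matches("a_then_bs", "abab")` does not, whereas `matches("grouped", "abab")` does, which shows the effect of the parentheses.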

In lexical definitions (as the example above), two more conventions are used: Two literal characters separated by three dots mean a choice of any single character in the given (inclusive) range of ASCII characters. A phrase between angular brackets (<...>) gives an informal description of the symbol defined; e.g., this could be used to describe the notion of "control character" if needed.

在词法定义中(如上例), 还使用了另外两个约定: 以三个句点分隔的两个字符字面值表示在给定的(包含两端的)ASCII字符范围内任选一个字符. 尖括号(<...>)中的短语给出了对所定义符号的非正式描述, 例如, 必要时可以用它来描述"控制字符"的概念.
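The three-dot range convention can be illustrated by expanding a range into its individual alternatives. This is only a sketch; `expand_range` is a hypothetical helper, not something the notation defines.

```python
def expand_range(first: str, last: str) -> list[str]:
    """Expand first...last into every single character in the inclusive range."""
    if len(first) != 1 or len(last) != 1:
        raise ValueError("both endpoints must be single characters")
    if ord(first) > ord(last):
        raise ValueError("range endpoints are out of order")
    # Both endpoints are included, hence the +1 on the upper bound.
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


# "a"..."z" is shorthand for "a" | "b" | ... | "z".
lc_letters = expand_range("a", "z")
```

Here `lc_letters` holds the 26 characters from "a" to "z", each of which is one alternative of the `lc_letter` rule.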

Even though the notation used is almost the same, there is a big difference between the meaning of lexical and syntactic definitions: a lexical definition operates on the individual characters of the input source, while a syntax definition operates on the stream of tokens generated by the lexical analysis. All uses of BNF in the next chapter ("Lexical Analysis") are lexical definitions; uses in subsequent chapters are syntactic definitions.

尽管所用的记法几乎相同, 词法定义和句法定义在含义上却有很大的不同: 词法定义作用于输入源中的一个个字符, 而句法定义作用于词法分析所生成的语言符号流. 下一章("词法分析")中使用的BNF都是词法定义, 之后各章中使用的都是句法定义.
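The difference between the two levels can be sketched in a few lines. The token kinds (`NAME`, `NUMBER`, `OP`), the rule `assignment: NAME "=" NUMBER`, and both functions below are assumptions made for this illustration; they are not the actual tokens or grammar defined in later chapters.

```python
import re

# Lexical level: the patterns are matched against individual characters.
TOKEN = re.compile(r"\s*(?:(?P<NAME>[a-z][a-z_]*)|(?P<NUMBER>[0-9]+)|(?P<OP>=))")


def tokenize(source: str) -> list[tuple[str, str]]:
    """Turn a string of characters into a list of (kind, text) tokens."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


# Syntactic level: the hypothetical rule  assignment: NAME "=" NUMBER
# looks only at the token stream, never at single characters.
def is_assignment(tokens: list[tuple[str, str]]) -> bool:
    return [kind for kind, _ in tokens] == ["NAME", "OP", "NUMBER"]
```

In this sketch, `"total = 42"` and `"total=42"` differ as character sequences but produce the same token stream, so the syntactic check treats them identically.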