current position:Home>Script language compilation front-end principle notes

Script language compilation front-end principle notes

2022-09-23 07:55:35Book Recalls Jiangnan

一、编译器的前端技术

1. 编译原理中的“前端(Front End)”Refers to the compiler on the analysis of the program code and understand the process.It's usually just about the syntax of the language,跟目标机器无关.而与之对应的“后端(Back End)”is the process of generating object code,Related to the target machine.编译过程如下所示:

可以看到,编译器的“前端”Technology is divided into morphology analysis、语法分析和语义分析三个部分.通常,The compiler's first job is called a lexical analysis(Lexical Analysis).Like to read the article,Articles are made up of words,Process as well,It's just that it's not called a word here, it's called“词法记号”即Token.举个例子,The following code if you want to read it,what to do first:

#include <stdio.h>
int main(int argc, char* argv[]){
    int age = 45;
    if (age >= 17+8+20) {
        printf("Hello old man!\\n");
    }
    else{
        printf("Hello young man!\\n");
    }
    return 0;
}

This will identifyif、else、int这样的关键字,main、printf、age这样的标识符,+、-、=这样的操作符号,还有花括号、圆括号、semicolon and other symbols,以及数字字面量、字符串字面量等,这些都是Token.So how to write a program to recognizeToken呢?可以看到,Spaces and punctuation are often used to separate words in English content,方便读者阅读和理解.But in a computer program,Just splitting by spaces and punctuation won't work,比如“age >= 45”应该分成“age”、“>=”和“45”这三个Token,But in code they can be linked together,Don't have to have a space.

其实,Can be set some rules to distinguish between each of the differentToken,例如:

(1)识别age这样的标识符.它以字母开头,后面可以是字母或数字,直到遇到第一个既不是字母又不是数字的字符时结束.

(2)识别>=这样的操作符.当扫描到一个>Characters will pay attention to the,它可能是一个GT(Greater Than,大于)操作符.但由于GE(Greater Equal,大于等于)也是以>开头的,所以再往下再看一位,如果是=,那么这个Token就是GE,否则就是GT.

(3)识别45这样的数字字面量.When scanning a numeric character,就开始把它看做数字,直到遇到非数字的字符.

These rules can be generated by the lexical analyzer tool to generate,比如Lex.这些规则用“正则文法(Regular Grammar)”表达,符合正则文法的表达式称为“正则表达式”.生成工具可以读入正则表达式,生成一种叫“有限自动机(Finite-state Automaton,FSA,or Finite Automaton)”的算法,来完成具体的词法分析工作.

有限自动机是有限个状态的自动机器,For example, the lexical analyzer analyzes the string of the entire program,当遇到不同的字符时,会驱使它迁移到不同的状态,For example, the lexical analyzer is scanningage时,处于“标识符”状态,等它遇到一个>the symbol switches to“比较操作符”的状态.词法分析过程,就是这样一个个状态迁移的过程,如下图所示:

2. The compiler is the next phase of the work语法分析.词法分析是识别一个个的单词,而语法分析就是在词法分析的基础上识别出程序的语法结构,This structure is a tree structure,is easy for computers to understand and execute.以自然语言为例.Natural languages ​​have well-defined grammatical structures,比如,“I like you who are smart and brave”This sentence contains a“主、谓、宾”三个部分.主语是“我”,谓语是“喜欢”,The object part is“Clever and brave of you”,The object part can be split into two parts,“smart and brave”is the attributive part,用来修饰“你”.Attributive section can be divided into again“聪明”和“勇敢”The smallest two unit.

这样拆下来,Will construct a tree,Each subtree inside has a certain structure,而这个结构要符合语法.比如,Chinese is used“主谓宾”的结构,Japanese is used“subject predicate”的结构,如下图所示:

Programs also have well-defined syntactic structures,Its syntax analysis process is to construct such a tree,这棵树叫做抽象语法树(Abstract Syntax Tree,AST).树的每个节点(子树)是一个语法单元,这个单元的构成规则就叫“语法”.每个节点还可以有下级节点.Then visually see what does the tree look like,在MacComputers use the following command:

clang -cc1 -ast-dump hello.c

-ast-dump parameter to make it output AST,Rather than do routine compiled,Part of the output is shown below:

If you enter an expression that can be evaluated,例如“2+3*5”,Will get a similar chartAST:

形成ASTThe later benefit is that the computer is easy to handle.For example, this tree formed for expressions,从根节点遍历整棵树就可以获得表达式的值.If the loop statement again、判断语句、Assignment statements such as node to theAST上,And explain its execution,then actually implements a scripting language,而The process of execution scripting language,就是遍历AST的过程.

So how to write a program structureAST呢?The construction of an intuitive thinking is a top-down analysis.First construct the root node to represent the entire program,之后向下扫描Token串,构建它的子节点.当它看到一个int类型的Token时,知道这儿遇到了一个变量声明语句,于是建立一个“变量声明”节点;接着遇到age,建立一个子节点,这是第一个变量;之后遇到=,意味着这个变量有初始化值,那么建立一个初始化的子节点;最后,遇到“字面量”,其值是45.这样,一棵子树就扫描完毕了.程序退回到根节点,开始构建根节点的第二个子节点.这样递归地扫描,直到构建起一棵完整的树,如下图所示:

This algorithm is commonly used递归下降算法(Recursive Descent Parsing).递归下降算法是一种自顶向下的算法,与之对应的,还有自底向上的算法,这个算法会先将最下面的叶子节点识别出来,然后再组装上一级节点.

3. The above finished lexical analysis、语法分析,The compiler do next job is to语义分析.说白了,Semantic analysis is to make the real intention of computer understand program,remove some ambiguities.Semantic analysis is less complex than natural language processing,因为计算机语言的语义一般可以表达为一些规则,只要检查是否符合这些规则就行了.比如:

(1)某个表达式的计算结果是什么数据类型?如果有数据类型不匹配的情况,是否要做自动转换?

(2)如果在一个代码块的内部和外部有相同名称的变量,At the time of execution exactly what?就像“I like you who are smart and brave”中的“你”,到底指的是谁,需要明确.

(3)在同一个作用域内,不允许有两个名称相同的变量,这是唯一性检查.Can't just declare a variablea,紧接着又声明同样名称的一个变量a.

语义分析工作的某些成果,会作为属性标注在抽象语法树上,比如在age这个标识符节点和45这个字面量节点上,都会标识它的数据类型是int型的.在这个树上还可以标记很多属性,有些属性是在之前的两个阶段就被标注上了,比如所处的源代码行号,这一行的第几个字符,When the compiler error so,就可以比较清楚地了解出错的位置.做了这些属性标注以后,编译器在后面就可以依据这些信息生成目标代码了,This is mentioned in the backend section of compilation techniques.

因此总结一下,词法分析是把程序分割成一个个Token的过程,可以通过构造有限自动机来实现.语法分析是把程序的结构识别出来,并形成一棵便于由计算机处理的抽象语法树,可以用递归下降的算法来实现.语义分析是消除语义模糊,生成一些属性信息,让计算机能够依据这些信息生成目标代码.

二、正则文法和有限自动机

4. 上面提到Lexical analysis work is a long string to identify the words one by one,这一个个单词就是Token.而且词法分析的工作是一边读取一边识别字符串的,不是把字符串都读到内存再识别.Then a string is a series of characters formed,怎么把它断开成一个个的Token呢?分割的依据是什么呢?In fact, the process of manually building a lexical analyzer,就是写出正则表达式,画出有限自动机的图形,然后根据图形直观地写出解析代码的过程.To describe the identifier before、比较操作符和数字字面量这三种Token的词法规则:

(1)标识符:第一个字符必须是字母,后面的字符可以是字母或数字.

(2)比较操作符:如>和>=.

(3)数字字面量:全部由数字构成.

Is on the basis of such a rule to construct the finite automaton.Lexical analysis program to meetage、>=和45时,会分别识别成标识符、比较操作符和数字字面量.The finite automaton looks like this:

one at a timetokenWill return to original state after1.The circle in the picture above has a single line and a double line.双线的意思是这个状态已经是一个合法的 Token 了,单线的意思是这个状态还是临时状态.按照这5种状态迁移过程,It is easy to program,start with the status1开始,在遇到不同的字符时,分别进入2、3、5三个状态:

DfaState newState = DfaState.Initial;
if (isAlpha(ch)) {              //第一个字符是字母
    newState = DfaState.Id; //进入Id状态
    token.type = TokenType.Identifier;
    tokenText.append(ch);
} else if (isDigit(ch)) {       //第一个字符是数字
    newState = DfaState.IntLiteral;
    token.type = TokenType.IntLiteral;
    tokenText.append(ch);
} else if (ch == '>') {         //第一个字符是>
    newState = DfaState.GT;
    token.type = TokenType.GT;
    tokenText.append(ch);
}

上面的代码中,用Java中的枚举(enum)Type defines some enumeration values to represent different status among themToken是自定义的一个数据结构,它有两个主要的属性:一个是“type”,就是Token的类型,它用的也是一个枚举类型的值;一个是“text”,也就是这个Token的文本值.Then the processing enters2、3、5三个状态之后的状态迁移过程:

case Initial:
    state = initToken(ch);          //To determine the subsequent states
    break;
case Id:
    if (isAlpha(ch) || isDigit(ch)) {
        tokenText.append(ch);       //Keep the identifier state
    } else {
        state = initToken(ch); //Exit the identifier state,并保存Token
    }
    break;
case GT:
    if (ch == '=') {
        token.type = TokenType.GE;  //转换成GE
        state = DfaState.GE;
        tokenText.append(ch);
    } else {
        state = initToken(ch);      //退出GT状态,并保存Token
    }
    break;
case GE:
    state = initToken(ch);        //退出当前状态,并保存Token
    break;
case IntLiteral:
    if (isDigit(ch)) {
        tokenText.append(ch);    //remain in numeric literal state
    } else {
        state = initToken(ch);    //退出当前状态,并保存Token
    }
    break;

运行这个示例程序,Can successfully resolve similar“age >= 45”这样的程序语句.程序的输出如下,其中第一列是Token的类型,第二列是Token的文本值:

Identifier   age
GE           >=  
IntLiteral   45  

上面的例子虽然简单,但其实已经讲清楚了词法原理,就是According to the built finite automaton,在不同的状态中迁移,从而解析出Token,只要再扩展这个有限自动机,增加里面的状态和迁移路线,就可以逐步实现一个完整的词法分析器了.

5. The above example involved4种Token,Expressed as a regular expression as follows:

Id :        [a-zA-Z_] ([a-zA-Z_] | [0-9])*
IntLiteral: [0-9]+
GT :        '>'
GE :        '>='

First to explain the several rules used in some of the symbols:

比如解析“int age = 40”这个语句,In this statement, for example to study the problems encountered in lexical analysis:The conflict between multiple rules.如果把这个语句涉及的词法规则用正则表达式写出来,是下面这个样子:

Int:        'int'
Id :        [a-zA-Z_] ([a-zA-Z_] | [0-9])*
Assignment : '='

这时候,may find such a problem:intThis keyword is very similar to an identifier,都是以字母开头,后面跟着其他字母,换句话说int这个字符串,既符合标识符的规则,又符合int这个关键字的规则,这两个规则发生了重叠.这样就起冲突了,当然int这个关键字的规则,比标识符的规则优先级高,General common identifiers with these keywords are not allowed to wish.

关键字vocabulary as a grammatical element in language design,例如表示数据类型的int、char,表示程序结构的while、if,表述特殊数据取值的null、NAN 等.In addition to keywords, there are some words called保留字,保留字在当前的语言设计中还没用到,But keep it because it will be used in the future.

When naming your variables、class name,不可以用到跟关键字和保留字相同的字符串.So in the lexer,如何把关键字和保留字跟标识符区分开呢?以“int age = 40”为例,The appearance of the automatic machine limited to the following,借此解决关键字和标识符的冲突:

The corresponding code also change,The first piece of code to change:

if (isAlpha(ch)) {
    if (ch == 'i') {
        newState = DfaState.Id_int1;  //对字符i特殊处理
    } else {
        newState = DfaState.Id;
    }
    ...  //后续代码
}

In the second piece of code, add the following statement:

case Id_int1:
    if (ch == 'n') {
        state = DfaState.Id_int2;
        tokenText.append(ch);
    }
    else if (isDigit(ch) || isAlpha(ch)){
        state = DfaState.Id;    //切换回Id状态
        tokenText.append(ch);
    }
    else {
        state = initToken(ch);
    }
    break;
case Id_int2:
    if (ch == 't') {
        state = DfaState.Id_int3;
        tokenText.append(ch);
    }
    else if (isDigit(ch) || isAlpha(ch)){
        state = DfaState.Id;    //切换回Id状态
        tokenText.append(ch);
    }
    else {
        state = initToken(ch);
    }
    break;
case Id_int3:
    if (isBlank(ch)) {
        token.type = TokenType.Int;
        state = initToken(ch);
    }
    else{
        state = DfaState.Id;    //切换回Id状态
        tokenText.append(ch);
    }
break;

接着,运行上面的示例代码,就会输出下面的信息:

Int               int
Identifier        age
Assignment        =  
IntLiteral        45  

And when trying to parse“intA = 10”程序时,会把intAParsed into an identifier.输出如下:

Identifier    intA
Assignment    =  
IntLiteral    10  

6. 解析完“int age = 40”之后,Then add some rules according to the above method,So it can handle arithmetic expression,例如“2+3*5”. Increase the lexical rules is as follows:

Plus :  '+'
Minus : '-'
Star :  '*' 
Slash : '/'

Then modify the finite automata and code,就能解析“2+3*5”了,会得到下面的输出:

IntLiteral  2
Plus        +  
IntLiteral  3  
Star        *  
IntLiteral  5  

可以看到,In order to achieve a lexical analyzer,First you need to write the regular expression for each lexical,And draw the finite automaton,As long as expressed in the code after the state transition process is ok.

三、语法分析

7. 语法分析的结果是生成AST.Algorithms are divided into top-down and bottom-up algorithms,The recursive descent approach is a common top-down algorithm.例如“int age = 45”这个语句,The following is a diagram of the syntax analysis algorithm:

First put the rules of the variable declaration statement,用形式化的方法表达一下.它的左边是一个非终结符(Non-terminal).右边是它的产生式(Production Rule).在语法解析过程中,左边会被右边替代,如果替代之后还有非终结符,那么继续这个替代过程,直到最后全部都是终结符(Terminal)也就是Token.only terminator才可以成为AST的叶子节点,这个过程也叫做推导(Derivation)过程:

intDeclaration : Int Identifier ('=' additiveExpression)?;

可以看到,intA declaration of a type variable needs to have aInt型的Token,加一个变量标识符,后面跟一个可选的赋值表达式.Translate the above grammar into program statements,伪代码如下:

//伪代码
MatchIntDeclare(){
  MatchToken(Int);        //匹配Int关键字
  MatchIdentifier();       //匹配标识符
  MatchToken(equal);       //匹配等号
  MatchExpression();       //匹配表达式
}

The actual code is inSimpleCalculator.java类的IntDeclare()方法中,如下所示:

SimpleASTNode node = null;
Token token = tokens.peek();    //预读
if (token != null && token.getType() == TokenType.Int) {   //匹配Int
    token = tokens.read();      //消耗掉int
    if (tokens.peek().getType() == TokenType.Identifier) { //匹配标识符
        token = tokens.read();  //消耗掉标识符
        //创建当前节点,And the variable names are written toASTNodes in the text value,
        //It is also possible to create a new variable subnode here.
        node = new SimpleASTNode(ASTNodeType.IntDeclaration, token.getText());
        token = tokens.peek();  //预读
        if (token != null && token.getType() == TokenType.Assignment) {
            tokens.read();      //消耗掉等号
            SimpleASTNode child = additive(tokens);  //匹配一个表达式
            if (child == null) {
                throw new Exception("invalide variable initialization, expecting an expression");
            }
            else{
                node.addChild(child);
            }
        }
    } else {
        throw new Exception("variable name expected");
    }
}

上面代码的意思是,When parsing a variable declaration statement, look at the firstToken是不是int,If it is that create aAST节点,记下int后面的变量名称,然后再看后面是不是跟了初始化部分,也就是等号加一个表达式,Have to check the equal sign,So then to match an expression.

Usually every part of production make a child node,比如变量声明语句会建立四个子节点,分别是int关键字、标识符、等号和表达式.后面的工具就是这样严格生成AST的.But the above example is simplified here,只生成了一个子节点,就是表达式子节点,变量名称记到ASTNode的文本值里去了,其他两个子节点没有提供额外的信息,就直接丢弃了.

从上面的代码中看到,程序是从一个Token的流中顺序读取.代码中的peek()方法是预读,只是读取下一个Token但并不把它从Token流中移除.在代码中用peek()方法可以预先看一下下一个Token是否是等号,从而知道后面跟着的是不是一个表达式,而read()方法会从Token流中移除,下一个Token变成了当前的Token.

The analytical variable declarations written statements and expressions of algorithm function.在语法分析的时候,调用这些函数跟后面的Token串做模式匹配,Match the returns aAST节点,否则就返回null.如果中间发现跟语法规则不符,就报编译错误.在这个过程中,上级文法嵌套下级文法,上级的算法调用下级的算法,表现在生成AST中,上级算法生成上级节点,下级算法生成下级节点,这就是“下降”的含义.

分析上面的伪代码和程序语句,可以看到程序结构基本上是跟文法规则同构的,This is a fall in the recursive algorithm(Recursive Descent Parsing)的优点,非常直观.Then continue to run the sample program,输出AST:

Programm Calculator
    IntDeclaration age
        AssignmentExp =
            IntLiteral 45

8. 解析完变量声明语句,理解了“下降”的含义之后,来看看如何用上下文无关文法(Context-free Grammar,CFGDescribe an arithmetic expression.More complex situations are encountered when parsing arithmetic expressions,This regular grammar is not enough,We must use a context-free grammar to express.Arithmetic expressions to include method of addition and subtraction, and two kinds of arithmetic,加法和乘法运算有不同的优先级,The rules are divided into two levels:第一级是加法规则,第二级是乘法规则.把乘法规则作为加法规则的子规则,这样在解析形成AST时,乘法节点就一定是加法节点的子节点,从而被优先计算.如下所示:

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression Plus multiplicativeExpression
    ;

multiplicativeExpression
    :   IntLiteral
    |   multiplicativeExpression Star IntLiteral
    ;

Can be realized through the grammar of nested support for operational priorities.这样在解析“2 + 3 * 5”这个算术表达式时会形成类似下面的AST:

如果要计算表达式的值,只需要对根节点求值就可以了.为了完成对根节点的求值,需要对下级节点递归求值,所以先完成“3 * 5 = 15”,然后再计算“2 + 15 = 17”,When the analytic arithmetic expressions so,便能拿加法规则去匹配.Multiplication rules are nested in addition rules,通过文法的嵌套实现了计算的优先级.

应该注意的是,加法规则中还递归地又引用了加法规则,By this recursive definition it is possible to expand、形成所有各种可能的算术表达式.比如“2+3*5”的推导过程如下:

-->additiveExpression + multiplicativeExpression
-->multiplicativeExpression + multiplicativeExpression
-->IntLiteral + multiplicativeExpression
-->IntLiteral + multiplicativeExpression * IntLiteral 
-->IntLiteral + IntLiteral * IntLiteral

这种文法已经没有办法改写成正则文法了,它比正则文法的表达能力更强,叫做“上下文无关文法”.正则文法是上下文无关文法的一个子集,The difference is the context-free grammar allows recursive call,而正则文法不允许.上下文无关的意思是,In any case grammar derivation rules are the same,Such as the variable declaration statement,May use an arithmetic expression for variable initialization,而在其他地方可能也会用到算术表达式,No matter where arithmetic expression grammar are all the same,都允许用加法和乘法,计算优先级也不变.

Contextual situation need to handle the situation and some,但那不是语法分析阶段负责的,而是放在语义分析stage to deal with.

9. The above algorithm is only used“下降”,没有涉及“递归”,Now take a look at how to use the recursive algorithm of recursive grammar translation.先按照前面说的把文法直观地翻译成算法,But there is an infinite number of calls,来看个例子:

additiveExpression
    :   IntLiteral
    |   additiveExpression Plus IntLiteral
    ;

在解析“2 + 3”such a simplest addition expression,Intuitively it translated into algorithm results in the following situation:首先匹配是不是整型字面量,发现不是;然后匹配是不是加法表达式,这里是递归调用,will repeat the above two steps endlessly.“additiveExpression Plus multiplicativeExpression”The first part of the rules of the grammar is recursive references,这种情况叫做左递归.通过上面的分析,可以知道左递归是递归下降算法无法处理的,这是递归下降算法最大的问题.怎么解决呢?把“additiveExpression”Switch to the plus sign after the interview a try:

additiveExpression
    :   multiplicativeExpression
    |   multiplicativeExpression Plus additiveExpression
    ;

接着改写成算法,这个算法确实不会出现无限调用的问题:

private SimpleASTNode additive(TokenReader tokens) throws Exception {
    SimpleASTNode child1 = multiplicative();  //计算第一个子节点
    SimpleASTNode node = child1;  //If there is no second child node,就返回这个
    Token token = tokens.peek();
    if (child1 != null && token != null) {
        if (token.getType() == TokenType.Plus) {
            token = tokens.read();
            SimpleASTNode child2 = additive(); //Recursively parsing the second node
            if (child2 != null) {
                node = new SimpleASTNode(ASTNodeType.AdditiveExp, token.getText());
                node.addChild(child1);
                node.addChild(child2);
            } else {
                throw new Exception("invalid additive expression, expecting the right part.");
            }
        }
    }
    return node;
}

The above algorithm is mean,先尝试能否匹配乘法表达式,If not then the node is surely not addition,因为加法表达式的两个产生式都必须首先匹配乘法表达式,In this case to returnnull就可以了,The match without success.如果乘法表达式匹配成功,那就再尝试匹配加号右边的部分,也就是去递归地匹配加法表达式.如果匹配成功,就构造一个加法的ASTNode返回.

同样的,乘法的文法规则也可以做类似的改写:

multiplicativeExpression
    :   IntLiteral
    |   IntLiteral Star multiplicativeExpression
;

现在貌似解决了左递归问题,运行这个算法解析“2+3*5”,得到下面的AST:

Programm Calculator
    AdditiveExp +
        IntLiteral 2
        MulticativeExp *
            IntLiteral 3
            IntLiteral 5

可如果让这个程序解析“2+3+4”呢?如下所示:

Programm Calculator
    AdditiveExp +
        IntLiteral 2
        AdditiveExp +
            IntLiteral 3
            IntLiteral 4

It can be seen that the calculation order is wrong.连续相加的表达式要从左向右计算,这是加法运算的结合性规则.But according to the generatedAST变成从右向左了,先计算了“3+4”然后才跟“2”相加.为什么产生上面的问题呢?Because modified the grammar,把文法中加号左右两边的部分调换了一下.造成的影响是什么呢?可以推导一下“2+3+4”的解析过程:

(1)首先调用乘法表达式匹配函数multiplicative()成功,返回了一个字面量节点2.

(2)接着看看右边是否能递归地匹配加法表达式.

(3)匹配的结果,真的返回了一个加法表达式“3+4”,这个变成了第二个子节点.That's the error,这样的匹配顺序“3+4”一定会成为子节点,在求值时被优先计算.

所以,The previous methods there is no perfect way to solve the left recursion,因为它改变了加法运算的结合性规则.It will be mentioned later that the method both solves the left recursion problem,without generating errors in the order of calculations.

10. In fact, to implement an expression calculation,只需要基于AST做求值运算.这个计算过程比较简单,Only need to do to this tree深度优先遍历就好了.Depth-first traversal is also a recursive algorithm.以上文中“2 + 3 * 5”的AST为例看一下:

(1)对表达式的求值,等价于对AST根节点求值.

(2)首先求左边子节点,算出是2.

(3)接着对右边子节点求值,这时候需要递归计算下一层.After return is calculated by15(3*5).

(4)把左右节点相加,计算出根节点的值17.

还是以“2+3*5”为例,它的求值过程输出如下,You can see that the entire tree is traversed during the evaluation process:

    Calculating: AdditiveExp          //计算根节点
        Calculating: IntLiteral      //计算第一个子节点
        Result: 2                     //结果是2
        Calculating: MulticativeExp   //递归计算第二个子节点
            Calculating: IntLiteral
            Result: 3
            Calculating: IntLiteral
            Result: 5
        Result: 15                //忽略递归的细节,得到结果是15
    Result: 17                    //根节点的值是17

11. in the grammar rules for binary expressions,if the first element of the production is itself,Then the program will recurse infinitely,这种情况就叫做左递归.Such as production of additive expression“加法表达式 + 乘法表达式”,is left recursive.grammar rules are上下文无关文法表示的,Context-free grammars consist of a set of replacement rules(production)组成的,Such as arithmetic expression grammar rules can be expressed as the form below:

add -> mul | add + mul
mul -> pri | mul * pri
pri -> Id | Num | (add)

按照上面的产生式,add可以替换成mul或者add + mul,这样的替换过程又叫做“推导”.以“2+3*5”和“2+3+4”这两个算术表达式为例,Their derivation process, respectively, as shown in the figure below:

Can clearly see how these two expressions are generated.And the tree formed during the analysis,其实就是AST,It's just that the handwritten algorithm is generatingAST时,Often do some simplified,Omit some unnecessary nodes in the middle,比如“add-add-mul-pri-Num”这一条分支,Actual handwriting can be simplified into“add-Num”.其实简化ASTIt is also a means of optimizing the compilation process,if not simplified,The effect is as shown in the picture above.

So what are the leaf nodes of the two trees in the above figure??Num、+和*都是终结符,Terminal symbols are generated in lexical analysisToken,And those non-leaf nodes are non-terminals.The derivation process of the grammar,Is the process of the non-terminal constantly replaced,make the final result without nonterminals,only terminator.在实际应用中,Grammar rules are often written in the form:

add ::= mul | add + mul
mul ::= pri | mul * pri
pri ::= Id | Num | (add)

这种写法叫做“巴科斯范式(BNF)”,Antlr和YaccThis notation on both tools,sometimes later“::=”简写成“:”.还有一个术语叫做扩展巴科斯范式 (EBNF).它跟普通BNF表达式最大的区别,Is similar to regular expressions are used inside some of the writing.For example, the following rule applies*号,This section can be repeated0到多次

add -> mul (+ mul)*

Actually this kind of writing with standardBNF写法是等价的,Because a repeat item,is equivalent to derivation by recursion.Another inference can be drawn from here:Context-free grammars include regular grammars,Can do more than regular grammar.

12. To see how the priority of rules to ensure that expression and language.The above is derived from the addition rule to the multiplication rule,这种方式保证了AST中的乘法节点一定会在加法节点的下层,也就保证了乘法计算优先于加法计算.So you should operate the relationship(>、=、<)In addition the top,逻辑运算(and、or)On the relationship between operation of the upper.的确是如此,To write it out as shown below:

exp -> or | or = exp   
or -> and | or || and
and -> equal | and && equal
equal -> rel | equal == rel | equal != rel
rel -> add | rel > add | rel < add | rel >= add | rel <= add
add -> mul | add + mul | add - mul 
mul -> pri | mul * pri | mul / pri

The priority expressed here from low to high is:赋值运算、逻辑运算(or)、逻辑运算(and)、相等比较(equal)、大小比较(rel)、加法运算(add)、乘法运算(mul)和基础表达式(pri).而且Priority is to be able to change,Such as usually calculated by parentheses to change in the grammar priority.But how to express this idiom rules?In fact, at the lowest level,That is, the base expression with the highest precedence(pri)这里,The brackets expression wrapped,Recursively referencing expressions does the trick.这样Whenever parentheses are encountered while parsing an expression,Then you know this is the top priority,This changes the priority:

pri -> Id | Literal | (exp)

In the use of a language,If you don't know the exact priority of various operations,In addition to access to regular data,There is a new skillRead the grammar rules file for the language,These rules may be useBNF或EBNFThe writing of writing.

13. Let's discuss the issue of associativity.written above“2+3+4”The parsing grammar for this arithmetic expression is wrong,先计算了“3+4”然后才和“2”相加,Calculate order from right to left,The correct one should be from left to right.什么是结合性呢?The same priority operator calculation from left to right or from right to left calculation is called associativity.Common arithmetic operations such as addition, subtraction, multiplication and division are left-associative,“.”Symbol is also left the combination of.Have the right combination of example?The assignment operator is a typical example of right associative,比如“x = y = 10”.

To review the above“2+3+4”The cause of the error calculation order.Wrong right before use recursive grammar parsing the expression form of a simplified version ofAST如下:

根据这个AST做计算会出现计算顺序的错误.However, if the recursive term is written on the left,就不会出现这种结合性的错误.So it is concluded that a law:对于左结合的运算符,递归项要放在左边;而右结合的运算符,递归项放在右边.所以能看到,When writing the rules for addition expressions, it is written like this:

add -> mul | add + mul

14. then most binary operations are left associative,Do you have to face the problem of left recursion??The left-recursive grammar can be rewritten by,解决这个问题.前面提到Recursive descent algorithm cannot handle left recursion,But not all of the algorithms can't handle the left recursion,For some other algorithms left recursion is fine,比如LR算法.要消除左递归,In addition expression rules, for example,The original grammar was“add -> add + mul | mul”,Now rewrite it as:

add -> mul add'
add' -> + mul add' | ε

文法中,ε(读作 epsilon)means empty set.Next, use the rules just rewritten to deduce it again“2+3+4”这个表达式,Got the result on the left in the figure below:

The parse tree on the left is the result of the derivation.问题是由于 add’Recursive rule is right,If you use the standard recursive descent algorithm,Will be the same as before operator combined with sexual mistakes again.expectedASTThe tree is pictured above right,The combination is the right of it.那有没有解决办法呢?Carefully analyze the derivation process of the above grammar rules.Only the first step is according to theadd规则推导,After all is according to theadd’Derivation rules until the end.如果用EBNF方式表达,The above two rules can be combined into one:

add -> mul (+ mul)*

As such is written algorithm to optimize the benefits of the thinking of.对于(+ mul)*这部分,In fact, can be written as a cycle,Rather than a recursive call again and again.伪代码如下:

mul();
while(next token is +){
  mul()
  createAddNode
}

In the research of recursive functions,有一个概念叫做尾递归,The last sentence of a tail recursive function is to recursively call itself.The compiler typically tail recursion is converted into a loop statement,The principle used is the same as the pseudo code above.Relative to the recursive call,Loop statements have lower overhead on system resources,So the tail recursion into loop statement is also a kind of compiler optimization technology.Now already know how to write this left recursion algorithm,大概是下面的样子:

private SimpleASTNode additive(TokenReader tokens) throws Exception {
    SimpleASTNode child1 = multiplicative(tokens);  //应用add规则
    SimpleASTNode node = child1;
    if (child1 != null) {
        while (true) {                              //循环应用add'
            Token token = tokens.peek();
            if (token != null && (token.getType() == TokenType.Plus || token.getType() == TokenType.Minus)) {
                token = tokens.read();              //read plus sign
                SimpleASTNode child2 = multiplicative(tokens);  //Calculate subordinate nodes
                node = new SimpleASTNode(ASTNodeType.Additive, token.getText());
                node.addChild(child1);              //注意,The new node is at the top level,To ensure that the right combination
                node.addChild(child2);
                child1 = node;
            } else {
                break;
            }
        }
    }
    return node;
}

修改完后,Run the parser analysis again“2+3+4+5”,会得到正确的AST:

Programm Calculator
    AdditiveExp +
        AdditiveExp +
            AdditiveExp +
                IntLiteral 2
                IntLiteral 3
            IntLiteral 4
        IntLiteral 5

这样,The left recursion problem solved.

15. 首先,一门SQLThis kind of script language is to support the statement,比如变量声明语句、Assignment statements, and so on.A single expression,Can also be regarded as statements,叫做“表达式语句”.在终端里输入2+3;,will be echoed back5来,This is how the expression is executed as a statement,According to the grammar is just behind the expression in a semicolon more.C语言和JavaA semicolon is used to mark the end of a statement,这里也可以这样写,With extended backus paradigm(EBNF)Write the following syntax rules:

programm: statement+;  

statement
: intDeclaration
| expressionStatement
| assignmentStatement
;

变量声明语句以int开头,followed by the identifier,Then there is an optional initialization part,That is, an equals sign and an expression,最后add a semicolon,如下所示:

intDeclaration : 'int' Id ( '=' additiveExpression)? ';';

表达式语句The current example only supports addition expressions,Other expressions such as conditional expressions can be added in the future,It also put the semicolon:

expressionStatement : additiveExpression ';';

赋值语句is an identifier followed by an equals sign and an expression,add a semicolon:

assignmentStatement : Identifier '=' additiveExpression ';';

To use variables in expressions,还需要把primaryExpression改写,In addition to contain integer literal,Also include identifiers and expressions enclosed in parentheses:

primaryExpression : Identifier| IntLiteral | '(' additiveExpression ')';

这样,Sent the grammatical features of want to achieve,All terms and regulations are expressed.

16. Implementation of above formula calculator supports only the numeric literals arithmetic,If can use variables will be more useful in the expression,Such as to perform the following two sentences:

int age = 45;
age + 10 * 2;

The two statements in grammatical feature contains a variable declaration、给变量赋值,and referencing variables in expressions.In order to give the variable assignment,Must be in a scripting language interpreter in opening a store,Records of different variables and their values,如下所示:

private HashMap<String, Integer> variables = new HashMap<String, Integer>();

This simply USES aHashMapAs a variable storage area.in variable declaration statements and assignment statements,can modify the data in this variable storage area,And obtain a variable's value can use the following code:

if (variables.containsKey(varName)) {
    Integer value = variables.get(varName);  //获取变量值
    if (value != null) {
        result = value;                      //设置返回值
    } else {                                 //有这个变量,没有值
        throw new Exception("variable " + varName + " has not been set any value");
    }
}
else{ //没有这个变量.
    throw new Exception("unknown variable: " + varName);
}

Through such a simple storage mechanism,Can support variable.Of course the storage mechanism is too simple,Mentioned scope behind such a simple storage mechanism is not enough.But for now, use it like this and then improve it.The next step is to parse the assignment statement,例如“age = age + 10 * 2;”,如下所示:

private SimpleASTNode assignmentStatement(TokenReader tokens) throws Exception {
    SimpleASTNode node = null;
    Token token = tokens.peek();    //预读,See if the following identifier
    if (token != null && token.getType() == TokenType.Identifier) {
        token = tokens.read();      //read identifier
        node = new SimpleASTNode(ASTNodeType.AssignmentStmt, token.getText());
        token = tokens.peek();      //预读,Look at the following is the equal sign
        if (token != null && token.getType() == TokenType.Assignment) {
            tokens.read();          //取出等号
            SimpleASTNode child = additive(tokens);
            if (child == null) {    //出错,There is no valid expression to the right of the equal sign
                throw new Exception("invalide assignment statement, expecting an expression");
            }
            else{
                node.addChild(child);   //添加子节点
                token = tokens.peek();  //预读,See if the semicolon
                if (token != null && token.getType() == TokenType.SemiColon) {
                    tokens.read();      //Consume the semicolon

                } else {            //报错,缺少分号
                    throw new Exception("invalid statement, expecting semicolon");
                }
            }
        }
        else {
            tokens.unread();    //回溯,Spit out previously digested identifiers
            node = null;
        }
    }
    return node;
}

上面代码的逻辑是:

(1)Since you want to match an assignment statement,So should first take a look at firstTokenIs it an identifier,If not then returnnull,匹配失败.

(2)如果第一个Tokenis indeed an identifier,just consume it,Then see if it's followed by an equals sign.

(3)if not equals,It proved that this is not an assignment statement,maybe an expression or something,Then you need to roll back what was just consumed.Token,就像什么都没有发生过一样,并且返回null.The method called is backunread().

(4)If it is followed by an equals sign,Then continue to see if there is an expression behind,If the expression followed by a semicolon.如果不是,就报错.

This completes the parsing of the assignment statement.

17. In the process of the design rules of grammar,There are a trap,The trap to better understand a characteristics of recursive descent approach:回溯.Understanding this feature can be more clear understanding recursive reduction algorithm implementation process,To go to find a way to optimize it.考虑一下age = 45;这个语句,When using algorithms to do pattern matching,Some special happens.看一下对statement语句的定义:

statement
: intDeclaration
| expressionStatement
| assignmentStatement
;

首先会尝试intDeclaration,但是age = 45;statement does not start withint开头的,So this attempt will returnnull.Then try toexpressionStatement,Take a look at the following algorithm:

private SimpleASTNode expressionStatement() throws Exception {
        int pos = tokens.getPosition();  //Note down the initial position
        SimpleASTNode node = additive(); //match the addition rule
        if (node != null) {
            Token token = tokens.peek();
            if (token != null && token.getType() == TokenType.SemiColon) {   //Requirements must end with a semicolon
                tokens.read();
            } else {
                node = null;
                tokens.setPosition(pos); // 回溯
            }
        }
        return node;
    }

出现了什么情况呢?age = 45;Statement the most on the left is an identifier.根据语法规则,Identifier is a legitimateaddtiveExpresion,因此additive()Function returns a non-null value.Next, a semicolon should be scanned later.,But the identifier is followed by an equals sign,This demonstrates that the pattern matching failure.Failed algorithm mustTokenThe stream pointer is dialed back to its original position,就像一切都没发生过一样.

因为不知道addtive()How many steps down did the function try,because it can be a very complex expression,Consume a lot of aToken,所以The location where the algorithm starts must be noted,and fall back to this position on failure.After trying a rule unsuccessfully,Revert to the original and try another rule,这个现象就叫做“回溯”.

Because there are may need to go back,So the recursive descent algorithm sometimes does something useless.在assignmentStatement的算法中,就通过unread()backtracked aToken,而在expressionStatement中,Because not sure you want to go back a few steps,So we have to write down the initial position in advance.匹配expressionStatement失败后,Algorithm to matchassignmentStatement,The successful.

Process of testing and back,is a typical feature of recursive descent algorithm.Recursive descent approach is simple,But it is through the test and back,But always can match the correct grammar,这就是它的强大之处.当然,The disadvantage is that backtracking will reduce the efficiency a little,But it can be improved and optimized on this basis,Implement recursive descent with predictive analytics,And the recursive prediction analysis.With a clear understanding of the recursive descent algorithm,When going to learn other parsing algorithms,Also will understand faster.

Then let's talk about another problem involved in backtracking.:When to go back,When to prompt for syntax errors?While reading the sample code above,You should find some error handling code in there,并抛出了异常.As in assignment statement,If there is no matching success equals sign behind a addition expression,Just think the grammar is wrong.Because in the syntax of the above example,Behind the equals sign only with expression,There is no other possibility,如下所示:

token = tokens.read();          //read equal sign
node = additive();    //Match a addition expression
if (node == null) {
    //There must be another expression to the right of the equal sign  
    throw new Exception("invalide assignment expression, expecting an additive expression");
}

When the match is unsuccessful in the algorithm,That should be back,should try other possibilities,Why it's an error here?In fact, the final result of the two methods is the same.Prompt syntax error,It means knowing that there are no other possible matching options.,Don't need to waste time to go back.Just as in the above grammar,Behind the equals sign necessarily with expression,Otherwise it must be a grammar mistakes,Even if no syntax error is reported here,Such as test after all other options,Still need to quote grammar mistakes.所以说,The grammatical errors in advance,In fact is a kind of optimization algorithm.

18. Scripting languages are generally provides a command line window to input lines statement,Explain and execute it immediately and get the output,比如Node.js、Python、SQL等都提供了这样的界面.这个输入、执行、打印的循环过程就叫做REPL(Read-Eval-Print Loop).在SimpleScript.java 中,Also implements a simpleREPL,It's basically reading code line by line from the terminal,When faced with a semicolon interpretive execution,代码如下:

SimpleParser parser = new SimpleParser();
SimpleScript script = new SimpleScript();
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));   //从终端获取输入

String scriptText = "";
System.out.print("\n>");   //提示符

while (true) {             //无限循环
    try {
        String line = reader.readLine().trim(); //读入一行
        if (line.equals("exit();")) {   //Hard-coded exit condition
            System.out.println("good bye!");
            break;
        }
        scriptText += line + "\n";
        if (line.endsWith(";")) { //If there is no meet the semicolon,will read one more line
            ASTNode tree = parser.parse(scriptText); //语法解析
            if (verbose) {
                parser.dumpAST(tree, "");
            }
          
            script.evaluate(tree, ""); //对AST求值,并打印

            System.out.print("\n>");   //显示一个提示符

            scriptText = "";
        }

    } catch (Exception e) { //如果发现语法错误,报错,然后可以继续执行
        System.out.println(e.getLocalizedMessage());
        System.out.print("\n>");   //提示符
        scriptText = "";
    } 
}

运行java SimpleScript,You can try various statements in the terminal.If it is the correct statement,System will immediately back to the result.If the statement is wrong,REPLAlso can feedback error message back,And be able to continue to deal with the following statement.

四、The compiler front-end tool:Antlr

19. The above examples are simplified for ease of understanding,在实际应用中,A complete compiler will achieve a lot of other work in terms of lexical and grammar,如下图所示:

Antlr是一个开源工具,Support according to the rules file generated lexical analyzer and syntax analyzer,它自身是用Java实现的,Can support a wider range of target languages ​​includingJava、C#、JavaScript、Python、Go、C++、Swift.Antlr通过解析规则文件来生成编译器,规则文件以.g4结尾,词法规则和语法规则可以放在同一个文件里,不过为了清晰起见,Here they are divided into two files,First written in a file词法规则.创建一个Hello.g4File is used to save the lexical rules,Then write some lexical rules used before:

lexer grammar Hello;  //lexerkeyword means this is a lexical rule file,名称是Hello,要与文件名相同

//关键字
If :               'if';
Int :              'int';

//字面量
IntLiteral:        [0-9]+;
StringLiteral:      '"' .*? '"' ;  //字符串字面量

//操作符
AssignmentOP:       '=' ;    
RelationalOP:       '>'|'>='|'<' |'<=' ;    
Star:               '*';
Plus:               '+';
Sharp:              '#';
SemiColon:          ';';
Dot:                '.';
Comm:               ',';
LeftBracket :       '[';
RightBracket:       ']';
LeftBrace:          '{';
RightBrace:         '}';
LeftParen:          '(';
RightParen:         ')';

//标识符
Id :                [a-zA-Z_] ([a-zA-Z_] | [0-9])*;

//空白字符,抛弃
Whitespace:         [ \t]+ -> skip;
Newline:            ( '\r' '\n'?|'\n')-> skip;

Can be very intuitive to see,Every lexical rule starts with a capital letter,这是AntlrTo the provisions of the lexical rules,The grammar rules start with a lowercase letter,where each rule is written with a regular expression.The compiler lexical rules,在终端中输入命令:

antlr Hello.g4

这个命令是Antlr编译规则文件,并生成Hello.javafile and two other auxiliary files.Then use the following commands to compileHello.java:

javac *.java

结果会生成Hello.class文件,This is the lexical analyzer generated.Then write a script file,Let the generated lexer parse it:

int age = 45;
if (age >= 17+8+20){
  printf("Hello old man!");
}

The above script save ashello.play文件,Then enter the following command in the terminal:

grun Hello tokens -tokens hello.play

grunCommand is actually invoked the generated lexical analyzer that is just nowHello类,打印出对hello.play词法分析的结果,如下所示:

从结果中看到,Lexical analyzer to eachToken都识别了,Also recorded their position within the code、文本值、类别.上面这些都是Token的属性.以第二行[@1, 4:6=‘age’,< Id >,1:4]为例,其中@1是Tokenserial number of,表明这是1号Token;4:6是Tokenstart and end position in the character stream;age是文本值,Id是其Token类别;最后的 1:4 表示这个 Token In the source code in the first 1 行、第 4 列.

20. has now letAntlrran smoothly,Next, make the lexical rules more perfect、more rigorous,Can refer to mature rule files.从Antlrselected from some of the sample rules filesJavaas a reference.Let's take a look at the rules for string literals written earlier:

StringLiteral:      '"' .*? '"' ;  //字符串字面量

This version is quite simplified,That is, the double quotes can contain any character.In practice, however, can this not good,Because even the escape function is not provided.For some invisible characters such as carriage return,To provide escape function such as“\n”.同时,If the string itself have double quotes also will it escape,如“\”.Unicode也要转义,The last escape character itself also needs to be escaped,如“\\”.This is a contentJavaComplete rules for string literals in the language,This rule is very careful,Taking all kinds of escaping into account:

STRING_LITERAL:     '"' (~["\\\r\n] | EscapeSequence)* '"';

fragment EscapeSequence
    : '\\' [btnfr"'\\]
    | '\\' ([0-3]? [0-7])? [0-7]
    | '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit
    ;

fragment HexDigit
    : [0-9a-fA-F]
    ;  

In the rules file,fragmentRefers to a syntax fragment,is to make the rules definition clearer,It does not generateToken,只有StringLiteralrules will be generatedToken.

in the rules file from the previous exercise,把>=、>、<Are classified as relational operators,As the sameToken,而+、*All alone as another kind ofToken.So what can merge into a kind of,Which ones need to be listed separately??其实It mainly depends on the needs of the grammar,That is, in the grammar rules file,can appear in the same rule.They have no difference at the grammatical level,There is only a difference at the semantic level.比如,Addition and subtraction are different operations though,But they can also appear in the same grammatical rule,The characteristics of them in operation,包括优先级和结合性,Multiplication and division can appear in multiplication rules at the same time.

Combine plus and minus signs into one category,It is possible to combine multiplication and division into one class.把这4operators each as a separate class,也是可以的.但是,Cannot treat plus sign and multiply sign as the same class,because they have different precedence in arithmetic operations,Must appear in different grammar rules.I encountered a problem when doing lexical analysis earlier,Analysis of the lexical conflict problem,That is, the rules for identifiers and keywords overlap..

Antlr是怎么解决这个问题的呢?很简单,It introduced the concept of priority.在Antlrin the rules file,The more the rule declared earlier, the higher the priority.所以,Put the key rulesIDin front of the rules,When the algorithm is executed, it will first check whether it is a keyword,Then we will check forID,也就是标识符.

21. 接下来,试着AntlrGenerate a parser,Replacing the initial handwritten parser.This time the file name is calledPlayScript.g4,playscriptFor the scripting language on the name of the,Here's file at the beginning:

grammar PlayScript;
import CommonLexer;   //import lexical definitions

/*The following content is added to the generatedJavasource file header,如包名称,import语句等.*/
@header {
package antlrtest;
}

Then put the syntax definition done above into it.AntlrThere is an internal mechanism to automatically handle left recursion problems,You can safely write the grammar rules as follows:

expression
    :   assignmentExpression
    |   expression ',' assignmentExpression
    ;

assignmentExpression
    :   additiveExpression
    |   Identifier assignmentOperator additiveExpression
    ;

assignmentOperator
    :   '='
    |   '*='
    |  '/='
    |   '%='
    |   '+='
    |   '-='
    ;

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression '+' multiplicativeExpression
    |   additiveExpression '-' multiplicativeExpression
    ;

multiplicativeExpression
    :   primaryExpression
    |   multiplicativeExpression '*' primaryExpression
    |   multiplicativeExpression '/' primaryExpression
    |   multiplicativeExpression '%' primaryExpression
    ;

继续运行下面的命令,生成语法分析器

antlr PlayScript.g4
javac antlrtest/*.java

然后测试一下生成的语法分析器:

grun antlrtest.PlayScript expression -gui

这个命令的意思是:测试PlayScript这个类的expression方法,也就是解析表达式的方法,结果用图形化界面显示.Enter the following in the console interface:

age + 10 * 2  + 10
^D

其中^D是Ctrl+D,Equivalent to the terminal type aEOF字符,At the end of the file sign(Windows要使用^Z).Of course, you can also put these statements in the file in advance,The file name as a command parameters.The parser will then analyze these grammars,And pop up a window to displayAST:

有了Antlr的支持,Can put main energy on writing on the lexical and grammatical rules,提升了工作效率,比如SparkSQL的SQLAnalysis is also usedAntlr4.

22. AntlrCan automatic processing the left recursion question,So when writing expression,Can be boldly written as left recursive to save time.But it still need to write a rule for each operation,After the logical operation is finished, the addition operation needs to be written.,The addition operation is finished, the multiplication operation is written,This enables support for priority,还是有些麻烦.其实,Can all the operations with a grammatical rules to cover,Then use the most concise way to support the expression of precedence and associativity.在如下的PlayScript.g4Grammar rules file,Only a small piece of code will be finished all the expression rules describe:

expression
    : primary
    | expression bop='.'
      ( IDENTIFIER
      | functionCall
      | THIS
      )
    | expression '[' expression ']'
    | functionCall
    | expression postfix=('++' | '--')
    | prefix=('+'|'-'|'++'|'--') expression
    | prefix=('~'|'!') expression
    | expression bop=('*'|'/'|'%') expression  
    | expression bop=('+'|'-') expression 
    | expression ('<' '<' | '>' '>' '>' | '>' '>') expression
    | expression bop=('<=' | '>=' | '>' | '<') expression
    | expression bop=INSTANCEOF typeType
    | expression bop=('==' | '!=') expression
    | expression bop='&' expression
    | expression bop='^' expression
    | expression bop='|' expression
    | expression bop='&&' expression
    | expression bop='||' expression
    | expression bop='?' expression ':' expression
    | <assoc=right> expression
      bop=('=' | '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '^=' | '>>=' | '>>>=' | '<<=' | '%=')
      expression
;

This file includes almost all expression rules need,including dot notation expressions that are barely mentioned、Increasing and decreasing the expression、数组表达式、An arithmetic expression rules, etc,已经很完善了.It is how to support the priority?原来,Priority is determined through on the right side of the order of the different production.in a standard context-free grammar,The order of production is irrelevant,But in the concrete algorithm,productions are tried in a determined order.

same grammar,According to the different order to derive this,得到的AST可能是不同的.This is unacceptable from the point of view of grammar theory,But it is acceptable from a practical point of view.比如LL文法和LR文法的概念,Refers to the grammar inLL算法或LRAlgorithm is under normal work.For example, the grammar that did the addition operation earlier,is the one with the recursive term on the right,Can cause associative errors in recursive descent algorithms,但是如果用LR算法,就完全没有这个问题,生成的AST完全正确,如下所示:

additiveExpression
    :   IntLiteral
    |   IntLiteral Plus additiveExpression
    ;

再来看看AntlrHow to achieve associativity according to this grammar rule.在语法文件中,AntlrFor the assignment expression<assoc=right>attribute annotation,Specifies that assignment expressions are right-associative.If not marked, it is left associative.Through the above that the simplified algorithm,ASTsuccessfully simplified,There will be no addition node、Various nodes such as multiply nodes,But the unified expression forexpression节点.then if both are the same expression node,How to distinguish them in the parser??How do you know which node addition or multiplication is do?

很简单,Search the current node can have an operatorToken.For example, if there is an or operationToken(“||”),Or is to be logical,而且语法里面的bop=、postfix=、prefix=这些属性,As some of the operatorsToken的别名,Will also be a expression node attributes.By querying the values of these properties,Can quickly determine the current operation type.

23. 到目前为止,Completed the syntax of expressions,It is safe to use in the scripting language various expression,Focus on perfecting all kinds of sentences(Statement)The syntax of the work.例如PlayScript.g4The rules of file statement:

statement
    : blockLabel=block
    | IF parExpression statement (ELSE statement)?
    | FOR '(' forControl ')' statement
    | WHILE parExpression statement
    | DO statement WHILE parExpression ';'
    | SWITCH parExpression '{' switchBlockStatementGroup* switchLabel* '}'
    | RETURN expression? ';'
    | BREAK IDENTIFIER? ';'
    | SEMI
    | statementExpression=expression ';'
    ;

same as expression,一个statementRules can cover all kinds of common sentences,包括if语句、for循环语句、while循环语句、switch语句、return语句等等.Expression is a semicolon after it is also a kind of statement,叫做表达式语句.From the perspective of the difficulty of syntax analysis,The statements above grammar than expression grammar simple,左递归、The issues of priority and associativity do not arise here.在C和Java等语言中,ifStatement of the rules of grammar is such a:

statement : 
          ...
          | IF parExpression statement (ELSE statement)? 
          ...
          ;
parExpression : '(' expression ')';

这里用了IF和ELSE这两个关键字,Also reuse has defined statement and expression rules.Statements and expression rules once the design is completed,can be reused by other grammar rules.但是ifSentences are also unsettling,比如会涉及到二义性文法问题,So then borrowif语句,Analyze the phenomenon of ambiguous grammar,例如嵌套ifStatements and Suspensionelse的情况,比如下面这段代码:

if (a > b)
if (c > d)
doSomething();
else
doOtherthings();

The code above intentionally cancelled code indentation,So can you seeelsewith whichifwhat about pairing?Once grammar rules are not written well enough,Is likely to form ambiguity,Also is to use the same grammatical rules can be deduced two different sentences,Or to generate two differentAST,This grammar is called ambiguous grammar,比如下面这种写法:

stmt -> if expr stmt
      | if expr stmt else stmt
      | other

按照这个语法规则,To adopt the first production or is derived using the second production first,会得到不同的AST.the one on the leftAST中else跟第二个if配对;the one on the rightAST中,else跟第一个if配对,如下图所示:

Most senior language while parsing this sample code will produce on the left side of theAST,即elsethe nearestif配对.Is there a way to write no grammatical ambiguity?如下所示:

stmt -> fullyMatchedStmt | partlyMatchedStmt
fullyMatchedStmt -> if expr fullyMatchedStmt else fullyMatchedStmt
                   | other
partlyMatchedStmt -> if expr stmt
                   | if expr fullyMatchedStmt else partlyMatchedStmt

按照上面的语法规则,Only the only way of,Also can only generate a uniqueAST,如下所示:

其中,解析第一个ifStatement can only be applied whenpartlyMatchedStmt规则,Parse the second if else语句时,只能适用fullyMatchedStmt规则.因此,Rewrite the rules of grammar to solve the problem of ambiguous grammar.Say it back to the frontAntlr定义的语法,The grammar doesn't seem to be complicated,How can we ensure that there is no ambiguity problem??因为AntlrParsing grammar is used whenLL算法.LLThe algorithm is a depth-first algorithm,So in the resolution to the firststatement时,Will build the next levelif节点,The next level of node in theelseclause parsed out.如果Antlr不用LL算法,就会产生二义性.因此Grammars often work with parsing algorithms.

Then forforStatements an example,如下所示:

for (int i = 0; i < 10; i++){
  println(i);
}

Related rules of grammar is as follows:

statement : 
         ...
          | FOR '(' forControl ')' statement
         ...
          ;

forControl 
          : forInit? ';' expression? ';' forUpdate=expressionList?
          ;

forInit 
          : variableDeclarators 
          | expressionList 
          ;

expressionList
          : expression (',' expression)*
          ;

As you can see from the grammar rules above,for Statements in the final analysis is a statement、Expression and variable declaration form.代码中的 for 语句,After parsingAST如下:

熟悉了forAfter the statement syntax,提一下语句块(block),在if语句和forStatement would use it,Block syntax constitute the following:

block
    : '{' blockStatements '}'
    ;

blockStatements
    : blockStatement*
    ;

blockStatement
    : variableDeclarators ';'     //变量声明
    | statement
    | functionDeclaration         //函数声明
    | classDeclaration            //类声明
    ;

24. Now has a pretty good grammar system,In addition to on the back of function、Outside the syntax of class,Have almost finishedplayscriptAll the grammar of the design work.Next, upgrade the script interpreter,Let it to be able to support more grammar,同时通过使用VisitorPatterns improve code structure.

In the above hand-written scripting language interpreter,用了一个evaluate()The method traverses the entire tree top-down.As more and more grammars to deal with,The amount of code in this method will increase,不便于维护.而The visitor pattern for eachAST节点,There will be a single method to handle,To make the code more clear,也更便于维护.AntlrCan help generate aVisitor模式的框架,可以在命令行输入:

antlr -visitor PlayScript.g4

-visitor参数告诉AntlrTo generate the following two interfaces and classes:

public interface PlayScriptVisitor<T> extends ParseTreeVisitor<T> {...}

public class PlayScriptBaseVisitor<T> extends AbstractParseTreeVisitor<T> implements PlayScriptVisitor<T> {...}

在PlayScriptBaseVisitor中,可以看到很多visitXXX()这样的方法,每一种ASTNode corresponds to a way,例如:

@Override public T visitPrimitiveType(PlayScriptParser.PrimitiveTypeContext ctx) {...}

其中泛型< T >Refers to the type of data returned when visiting each node.ASTNodes may return various types of data,所以这里可以让Visitor统一返回Object类型,Can be applied to all kinds of things.这样Visitor就是下面的样子(Generics adoptObject):

public class MyVisitor extends PlayScriptBaseVisitor<Object>{
  ...
}

这样在visitExpression()Methods can write code that evaluates various expressions,For example, the code for addition and subtraction is as follows:

public Object visitExpression(ExpressionContext ctx) {
        Object rtn = null;
        //二元表达式
        if (ctx.bop != null && ctx.expression().size() >= 2) {
            Object left = visitExpression(ctx.expression(0));
            Object right = visitExpression(ctx.expression(1));
            ...
            Type type = cr.node2Type.get(ctx);//Data types are the result of semantic analysis

            switch (ctx.bop.getType()) {
            case PlayScriptParser.ADD:        //加法运算
                rtn = add(leftObject, rightObject, type);
                break;
            case PlayScriptParser.SUB:        //减法运算
                rtn = minus(leftObject, rightObject, type);
                break;
            ...   
            }
        }
        ...
}

其中ExpressionContext就是ASTThe expression node,叫做ContextMean can take out all the context information in the node,包括父节点、子节点等.Each child node with the name of the syntax of the name is the same,For example, the grammar rules for addition and subtraction are as follows:

expression bop=('+'|'-') expression

那么可以用ExpressionContextThe following methods to access the children:

ctx.expression();     //返回一个列表,里面有两个成员,are the child nodes on the left and right sides, respectively
ctx.expression(0);    //运算符左边的表达式,是另一个ExpressionContext对象
ctx.expression(1);    //运算符右边的表达式
ctx.bop();            //一个Token对象,其类型是PlayScriptParser.ADD或SUB
ctx.ADD();            //访问ADD终结符,As additive operation,This method returns a null value
ctx.MINUS();         //访问MINUS终结符

When doing additive operation can recursively to the lower nodes evaluated,like in the codevisitExpression(ctx.expression(0)).同样If you want to run the entire script,只需要visitThe root node.So you can use this way,为每个ASTA node implementationvisit方法,Thus refactoring the previous handwritten interpreter.In addition to realize the expression evaluation,还可以为if、forStatement to write the evaluation logic.以forStatements, for example code is as follows:

// Initialization part performs a
if (forControl.forInit() != null) {
    rtn = visitForInit(forControl.forInit());
}

while (true) {
    Boolean condition = true; // If there is no conditional part,Means that have been circulating
    if (forControl.expression() != null) {
        condition = (Boolean) visitExpression(forControl.expression());
    }

    if (condition) {
        // 执行for的语句体
        rtn = visitStatement(ctx.statement(0));

        // 执行forUpdate,通常是“i++”这样的语句.The order of execution can't go wrong.
        if (forControl.forUpdate != null) {
            visitExpressionList(forControl.forUpdate);
        }
    } else {
        break;
    }
}

这里需要注意forThe execution of each part of the statement of the rules,比如:

(1)forInitSome can only perform a;

(2)每次循环都要执行一次forControl,See if continue to cycle;

(3)接着执行forStatement in the body;

(4)最后执行 forUpdate 部分,通常是“i++”这样的语句.

五、作用域和生存期:Implementation scope and function

25. 上面已经用Antlr重构了脚本解释器,With the help of tools, more advanced functions can be realized,比如函数功能、面向对象功能.There are some challenges to overcome along the way,比如:

(1)If you want to realize function function,To upgrade the variable management mechanism;

(2)引入作用域机制,来保证变量的引用指向正确的变量定义;

(3)提升变量存储机制,You can't simply throw a variable and its value into aHashMap里,To manage its lifetime to reduce memory usage.

作用域(Scope)Refers to the variables in a computer language、函数、类等起作用的范围,For example, is to use the following codeC语言写的,在全局以及函数fun中分别声明了a和b两个变量,然后在代码里对这些变量做了赋值操作:

/*
scope.c
测试作用域.
 */
#include <stdio.h>

int a = 1;

void fun()
{
    a = 2;
    //b = 3;   //出错,不知道b是谁
    int a = 3; //Is it allowed to declare a variable with the same name??
    int b = a; //这里的a是哪个?
    printf("in fun: a=%d b=%d \n", a, b);
}

int b = 4; //bThe scope of the start from here

int main(int argc, char **argv){
    printf("main--1: a=%d b=%d \n", a, b);

    fun();
    printf("main--2: a=%d b=%d \n", a, b);

    //用本地变量覆盖全局变量
    int a = 5;
    int b = 5;
    printf("main--3: a=%d b=%d \n", a, b);

    //test block scope
    if (a > 0){
        int b = 3; //允许在块里覆盖外面的变量
        printf("main--4: a=%d b=%d \n", a, b);
    }
    else{
        int b = 4; //跟if块里的b是两个不同的变量
        printf("main--5: a=%d b=%d \n", a, b);
    }

    printf("main--6: a=%d b=%d \n", a, b);
}

This code is compiled to run,结果是:

main--1: a=1 b=4 
in fun: a=3 b=3 
main--2: a=2 b=4 
main--3: a=5 b=5 
main--4: a=5 b=3 
main--6: a=5 b=5

Therefore, it can be concluded that the rule:

(1)变量的作用域有大有小,外部变量在函数内可以访问,The local variables in the function can only be accessed locally.

(2)The scope of the variable starts after the declaration.

(3)在函数里,You can declare with external variables of the same name,这个时候就覆盖了外部变量.

Intuitively the image below shows the sample code in the scope of each variable:

CIn the language and块作用域的概念,statement enclosed in curly braces,if和else后面就跟着这样的语句块.块作用域的特征跟函数作用域的特征相似,都可以访问外部变量,也可以用本地变量覆盖掉外部变量.But the language in this respect the design of the mechanism is different,比如下面这段用JavaWrite the code to use aif语句块,并且在if、else部分和外部分别声明了一个变量 c:

/**
 * Scope.java
 * 测试Java的作用域
 */
public class ScopeTest{

    public static void main(String args[]){
        int a = 1;
        int b = 2;

        if (a > 0){
            //int b = 3; //不允许声明与外部变量同名的变量
            int c = 3;
        }
        else{
            int c = 4;   //允许声明另一个c,各有各的作用域
        }
        
        int c = 5;  //这里也可以声明一个新的c
    }
}

可以看到Java的块作用域跟C语言的块作用域是不同的,它不允许块作用域里的变量覆盖外部变量.那么JavaScript呢?Take a look at this testJavaScriptThe scope of the code:

/**
 * Scope.js
 * 测试JavaScript的作用域
 */
var a = 5;
var b = 5;
console.log("1: a=%d b=%d", a, b);

if (a > 0) {
    a = 4;
    console.log("2: a=%d b=%d", a, b);
    var b = 3; //看似声明了一个新变量,其实还是引用的外部变量
    console.log("3: a=%d b=%d", a, b);
}
else {
    var b = 4;
    console.log("4: a=%d b=%d", a, b);
}

console.log("5: a=%d b=%d", a, b);

for (var b = 0; b< 2; b++){  //这里是否能声明一个新变量,用于for循环?
    console.log("6-%d: a=%d b=%d",b, a, b);
}

console.log("7: a=%d b=%d", a, b);

This code is compiled to run,结果是:

1: a=5 b=5
2: a=4 b=5
3: a=4 b=3
5: a=4 b=3
6-0: a=4 b=0
6-1: a=4 b=1
7: a=4 b=2

可以看到,JavaScript是没有块作用域的.And in the blockforStatement attempts to redefine variableb,语法上是允许的,But the same variable is used each time.从上面的例子来看,The internal mechanism of the syntax that looks similar is different,这种不同其实是语义差别的一个例子.Mentioned at the beginning of a lot of content has now belongs to the category of the semantic,对作用域的分析就是语义分析的任务之一.

26. After understanding what scope is,understand the closely related生存期(Extent).It is the time period during which the variable can be accessed,From allocating memory for it to recover its memory between the time.In the previous examples program,The lifetime of the variable is the same as the scope,Out of scope lifetime ends,The memory occupied by the variable is also freed.This is standard feature of the local variable,These local variables are managed by the stack.但也有一些情况,The lifetime of the variable is inconsistent with the syntactic scope,For example in the heap to apply for the memory of,It will still exist after exiting the scope.

例如下面这段C语言的示例代码:

/*
extent.c
test lifetime.
 */
#include <stdio.h>
#include <stdlib.h>

int * fun(){
    int * b = (int*)malloc(1*sizeof(int)); //在堆中申请内存
    *b = 2;  //assign the address2
   
    return b;
}

int main(int argc, char **argv){
    int * p = fun();
    *p = 3;

    printf("after called fun: b=%lu *b=%d \n", (unsigned long)p, *p);
 
    free(p);
}

其中funThe function returns a pointer to an integer.The function after the local variableb就消失了,The pointer to the memory(&b)took it back,其中*b是取b的地址,This address points to a small space on the stack,因为bIt was applied in the stack.An address is stored in this small space on the stack,Point to the application in the heap memory.This memory is also used to actually store the values.2的空间,has not been recovered,必须手动使用free()function to recover.

类似的情况在Java里也有.JavaInstances of objects are spawned on the heap by default.下面的示例代码中,A reference to an object was returned from a method,You can continue to modify the contents of the object based on this reference,This proves that the memory of this object has not been freed:

/**
 * Extent2.java
 * 测试Javalifetime characteristics of
 */
public class Extent2{
 
    StringBuffer myMethod(){
        StringBuffer b = new StringBuffer(); //Generated in the heap object instance
        b.append("Hello ");
        System.out.println(System.identityHashCode(b)); //打印内存地址
        return b;  //返回对象引用,本质是一个内存地址
    }

    public static void main(String args[]){
        Extent2 extent2 = new Extent2();
        StringBuffer c = extent2.myMethod(); //获得对象引用
        System.out.println(c);
        c.append("World!");         //修改内存中的内容
        System.out.println(c);

        //跟在myMethod()The value of the printed in the same
        System.out.println(System.identityHashCode(c));
    }
}

因为JavaThe memory used by objects beyond the application memory is the scope of the,So there is no way to automatically recover.因此JavaAutomatic memory management mechanism is used,也就是GC技术.Scope and the concept of survival is more basic computer language,Because they correspond to the runtime memory management basic mechanism.Although the design features of each language are different,But the mechanics at runtime are very similar,For example, stack and heap are used for memory management..

27. To understand the scope and the principle of survival,Let's design the scope mechanism first,Then simulates a stack.在之前的PlayScriptscript implementation,When handling variable assignment variables are simply a hash table,Use the variable name to reference,就像下面这样:

public class SimpleScript {
    private HashMap<String, Integer> variables = new HashMap<String, Integer>();
    ...
}

But if the variable exists in multiple scopes,这样做就不行了.When you design a data structure to distinguish the various variable scope.Analysis of the previous code,You can see that the scope is a tree structure,比如Scope.cThe scope is as follows:

Object-oriented language is not the same,it's not a tree it's a wood,Each class corresponds to a tree,So it has no global variables.在playscript语言中,The following object structure is designed to representScope:

//Occurring in the process of compiling variable、函数、类、块,are called symbols
public abstract class Symbol {
    //符号的名称
    protected String name = null;

    //owning scope
    protected Scope enclosingScope = null;

    //可见性,比如public还是private
    protected int visibility = 0;

    //Symbol关联的AST节点
    protected ParserRuleContext ctx = null;
}

//作用域
public abstract class Scope extends Symbol{
    // 该Scope中的成员,包括变量、方法、类等.
    protected List<Symbol> symbols = new LinkedList<Symbol>();
}

//块作用域
public class BlockScope extends Scope{
    ...
}

//函数作用域
public class Function extends Scope implements FunctionType{
    ...  
}

//类作用域
public class Class extends Scope implements Type{
    ...
}

It divides the three scopes,Respectively is a piece of scope(Block)、函数作用域(Function)And the class scope(Class).在解释执行playscript的ASTWhen you need to build a tree structure of the active domain,The scope of the analysis process is part of the semantic analysis.That is not aAST,You can run it immediately,Semantic analysis is also required before running,Such as the scope for analysis,Can do correct reference for each variable,so that the program can be executed correctly.

Solve the problem of the scope of the later,Let's take a look at how to solve the problem of lifetime.还是看Scope.c的代码,随着代码的执行,The lifetime performance of each variable is as follows:

(1)进入程序,Effective global variables one by one;

(2)进入main函数,mainThe variables in the function sequence effect;

(3)进入fun函数,funThe variables in the function sequence effect;

(4)退出fun函数,funThe variables in the function of failure;

(5)进入if语句块,ifThe variables in the block sequence effect;

(6)退出if语句块,ifIn the block variables failure;

(7)退出main函数,mainThe variables in the function of failure;

(8)退出程序,全局变量失效.

通过下面这张图,Can intuitively see running stack in the process of change:

Code execution process of entry and exit each scope,可以用栈来实现.Each into a scope,push a data structure onto the stack,这个数据结构叫做栈桢(Stack Frame).The stack frame is able to save the current scope of all the value of the local variable,When out of scope,The stack frame is pop up,The inside of the variable is failure.Can see stack mechanism can effectively use of memory,When variables are beyond the scope, useless,Can be discarded from memory.在ASTEvaluator.javaCan be used in the following data structure to represent the stack and the stack frame,其中的PlayObject通过一个HashMapTo preserve the value of each variable:

private Stack<StackFrame> stack = new Stack<StackFrame>();

public class StackFrame {
    //该frame所对应的scope
    Scope scope = null;

    //enclosingScope所对应的frame
    StackFrame parentFrame = null;

    //The actual variable location
    PlayObject object = null;
}

public class PlayObject {
    //成员变量
    protected Map<Variable, Object> fields = new HashMap<Variable, Object>();
}

Currently here is only a conceptual imitation of the stack frame,当用Java语言实现时,PlayObject对象是存放在堆里的,JavaAll objects are stored in the heap,Only basic data types such asintand object references are placed on the stack.Although just imitate,does not prevent the concept of building stack frames,In the back-end technology part,It can realize the real stack frame.

要注意的是,The structure and the stackScopeThe tree structure is not consistent.That is to say, the upper level stack frame in the stack,不一定是Scope的父节点.To access at the next higher levelScope中的变量数据,to follow the stackparentFrame去找.The above figure shows this situation,在调用funThere are three stack frames in the function stack:global stack frame、main()function stack frame andfun()function stack frame,其中main()function stack frameparentFrame和fun()function stack frameparentFrameglobal stack frame.

The scope and stack are already done,After this, many functions can be realized,比如让if语句和forLooping statements using block scope and local variables.以for语句为例,visitThe method first generates a stack frame for it,并加入到栈中,运行完毕之后,pop from the stack:

BlockScope scope = (BlockScope) cr.node2Scope.get(ctx);  //获得Scope
StackFrame frame = new StackFrame(scope);  //创建一个栈桢
pushStack(frame);    //加入栈中

...

//运行完毕,弹出栈
stack.pop();

When you need to get the value of a variable in your code,First of all, in the current frame to find,If you can't find it, go to the frame corresponding to the upper-level scope to find it.,如下所示:

StackFrame f = stack.peek();       //Get the frame at the top of the stack
PlayObject valueContainer = null;
while (f != null) {
    //To see if the variable belongs to the current stack frame
    if (f.scope.containsSymbol(variable)){ 
        valueContainer = f.object;
        break;
    }
    //从上一级scopeCorresponding to the stack frame to find  
    f = f.parentFrame;
}

28. 进一步的,Can realize the function of support.First look at the related with the function of grammar,如下所示:

//函数声明
functionDeclaration
    : typeTypeOrVoid? IDENTIFIER formalParameters ('[' ']')*
      functionBody
    ;
//函数体
functionBody
    : block
    | ';'
    ;
//类型或void
typeTypeOrVoid
    : typeType
    | VOID
    ;
//All parameters of the function
formalParameters
    : '(' formalParameterList? ')'
    ;
//参数列表
formalParameterList
    : formalParameter (',' formalParameter)* (',' lastFormalParameter)?
    | lastFormalParameter
    ;
//单个参数
formalParameter
    : variableModifier* typeType variableDeclaratorId
    ;
//Variable parameter number case,最后一个参数
lastFormalParameter
    : variableModifier* typeType '...' variableDeclaratorId
    ;
//函数调用    
functionCall
    : IDENTIFIER '(' expressionList? ')'
    | THIS '(' expressionList? ')'
    | SUPER '(' expressionList? ')'
    ;

在函数里,Consider an additional factor:参数.在函数内部,Parameters with the average of the local variables when use the same,At runtime they also act like local variables,save in stack frame.An object can be designed to represent the definition of a function,It includes the parameter list and the type of the return value:

public class Function extends Scope implements FunctionType{
    // 参数
    protected List<Variable> parameters = new LinkedList<Variable>();

    //返回值
    protected Type returnType = null;
    
    ...
}

在调用函数时,A three-step actually do:

(1)build a stack frame;

(2)Calculating the value of all parameters,and put it on the stack frame;

(3)execute function body in function declaration.

相关代码如下所示:

//函数声明的AST节点
FunctionDeclarationContext functionCode = (FunctionDeclarationContext) function.ctx;

//Create a stack frame
functionObject = new FunctionObject(function);
StackFrame functionFrame = new StackFrame(functionObject);

// Calculate the value of the actual parameter
List<Object> paramValues = new LinkedList<Object>();
if (ctx.expressionList() != null) {
    for (ExpressionContext exp : ctx.expressionList().expression()) {
        Object value = visitExpression(exp);
        if (value instanceof LValue) {
            value = ((LValue) value).getValue();
        }
        paramValues.add(value);
    }
}

//According to the parameter name,Variable is added in the stack frame
if (functionCode.formalParameters().formalParameterList() != null) {
    for (int i = 0; i < functionCode.formalParameters().formalParameterList().formalParameter().size(); i++) {
        FormalParameterContext param = functionCode.formalParameters().formalParameterList().formalParameter(i);
        LValue lValue = (LValue) visitVariableDeclaratorId(param.variableDeclaratorId());
        lValue.setValue(paramValues.get(i));
    }
}

// call method body
rtn = visitFunctionDeclaration(functionCode);

// 运行完毕,弹出栈
stack.pop();

Analysis of scope is a job of semantic analysis.Antlr Can do a lot of lexical analysis and syntax analysis,But the semantic analysis work needs to be done by users themselves.

六、面向对象:Implement the data and methods of packaging

29. An important feature of object-oriented is encapsulation.也就是说,Objects can encapsulate data and to the operation of the data together,构成一个不可分割的整体,尽可能地隐藏内部的细节,Only keep some interfaces to communicate with the outside world.Only the outside of the object by these interfaces and objects interact,No need to know the internal details of the object.This can reduce the system coupling,Implement internal mechanisms to hide,Don't worry about the impact on the outside.Then from the perspective of semantic,利用类型、作用域、The Concept of Lifetime Analysis of Object-Oriented Encapsulation Features.

类型处理is an important work in semantic analysis.Early computer language support only some basic data types,Such as all kinds of type integer and floating point,Types like strings that are inseparable from programming,Tend to be on the basic data types encapsulation and abstract away.From scope for,As a type of it is usually in the range is visible to the entire application,Can use it to declare variables.一些像Javalanguages ​​can also limit the scope of use of certain types,For example, only in a certain namespace,Or within a class.

What is the scope of object members??对象的属性(Refers to a class member variable)can be accessed inside the entire object,No matter in which position statement.也就是说The scope of object properties is internal to the entire object,方法也是一样.This is not the same as function and a piece of the local variable,They have a requirement on declaration order,像C和JavaSuch languages ​​must declare a variable before using it.

From the lifetime for,Object member variables of survival generally is the same with the object's lifetime,When creating objects to all member variables as initial,In all member variables when the destruction of objects are destroyed along with.Of course, if a member has references to memory from the heap application,These memory need to manually release,or freed by the garbage collection mechanism.But there are also some members that are not bound to the object,But with the type of binding,比如Java中的静态成员.The difference between static members and ordinary members,It's just that the scope and lifetime are different,Its scope is all object instances of the type,被所有实例共享.Survival was there before any object instance creation,On the final object is destroyed before will not go away.

The next step is to support the definition of classes in the language,在PlayScript.g4中,The grammar rules for classes can be defined like this:

classDeclaration
    : CLASS IDENTIFIER
      (EXTENDS typeType)?
      (IMPLEMENTS typeList)?
      classBody
    ;

classBody
    : '{' classBodyDeclaration* '}'
    ;

classBodyDeclaration
    : ';'
    | memberDeclaration
    ;

memberDeclaration
    : functionDeclaration
    | fieldDeclaration
    ;

functionDeclaration
    : typeTypeOrVoid IDENTIFIER formalParameters ('[' ']')*
      (THROWS qualifiedNameList)?
      functionBody
    ;

Let's briefly talk about this grammar rule:

(1)类声明以class关键字开头,There is an identifier type name,Followed by the class body.

(2)The members of the class are declared in the body of the class.在简化情况下,You can only focus on the properties and methods of the class.Here, the method of the class is also called deliberatelyfunction而不是method,I want to make some unified design of object methods and functions.

(3)Function declarations now play the role of class methods.

(4)Member variables of the class of the same statement and the variable declaration on grammar.

With the above written grammarplayscript脚本的效果如下:

/*
ClassTest.play A simple object-oriented features.
*/
class Mammal{
  //类属性
  string name = "";

  //构造方法
  Mammal(string str){
    name = str;
  }

  //方法
  void speak(){
    println("mammal " + name +" speaking...");
  }
}

Mammal mammal = Mammal("dog"); //playscriptThe structure of the special way,不需要new关键字
mammal.speak();                          //访问对象方法
println("mammal.name = " + mammal.name); //访问对象的属性

//没有构造方法,Create with default constructor
class Bird{
  int speed = 50;    //initialized in the default constructor

  void fly(){
    println("bird flying...");
  }
}

Bird bird = Bird();              //Using the default constructor
println("bird.speed : " + bird.speed + "km/h");
bird.fly();

接下来让playscriptThe interpreter to handle these code,怎么处理呢?After lexical analysis and syntax analysis,playscript会In semantic analysis phase scanningAST,Identify all types of custom,In order to refer to these types in other places to declare variables.Because the type of statement can be anywhere in the code,So it is better to separate a traversal identification and record type.Then when declaring variables,You can refer to the type.Semantic analysis of another job,Is to be the resolution of variable type.当声明“Bird bird = Bird();”时,需要知道BirdWhere is the definition of object,to access its members correctly.

When doing semantic analysis,To store the type definition in a data structure,来实现一下:

public class Class extends Scope implements Type{
    ...
}

public abstract class Scope extends Symbol{
    // 该Scope中的成员,包括变量、方法、类等.
    protected List<Symbol> symbols = new LinkedList<Symbol>(
}

public interface Type {
    public String getName();    //类型名称

    public Scope getEnclosingScope();
}

在这个设计中,可以看到Class就是一个Scope,ScopeIn it, various members can be stored,Now you can direct reuse,Used to store the class attributes and methods,Draw a class diagram as follows:

There are several classes,比如Symbol、Variable、Scope、Function和BlockScope,They are the symbol system of the main members.When do the lexical analysis parsing out a lot of identifier,These identifiers appear in different grammar rules,包括变量声明、表达式,And as the name of the class、method name etc..在语义分析阶段,Identify these identifiers one by one,For example, is a local variable or a method name, etc.

变量、The names of the classes and functions are called符号,Such as in the example programMammal、Bird、mammal、bird、name、speed等.Compilation process is an important work is to establish the symbol table,It helps to further compile or execute the program,The symbol table uses the above classes to store information,save its name in the symbol table、类型、作用域等信息.For classes and functions also have corresponding place to save class variables、方法、参数、返回值等信息.

30. After parsing the semantic information,To see how the runtime execution has the characteristics of the object-oriented program,Such as how to instantiate an object?How to manage object data in memory?and how to access the properties and methods of the object?First create the object through the constructor.In the following syntax is uselessnewThis keyword represents the creation of an object,But omitted direct call a class with the same name as the function,This is a unique design here,示例代码如下:

Mammal mammal = Mammal("dog"); //playscriptThe structure of the special way,不需要new关键字
Bird bird = Bird();            //Using the default constructor

But in semantic checks,In the current scope is certainly can't find such a function,因为The initialization method of a class is defined inside the class,只要检查一下Mammal和BirdIs it a class name?.再进一步,MammalClasses do have a method to constructMammal(),而BirdThe class does not actually have an explicitly defined constructor,这里借鉴了Java的初始化机制,Is to provide the default initialization method,In the default initialization method,Members will perform object statement of initialization work.So the above code in the callBird(),In fact is called the default initialization method.with or without explicitly declared constructors,Statement object initialization part will be when a member variable.对于Bird类,Actually executes the above“int speed = 50;”这个语句.

When doing semantic analysis,The following code can detect a function call is actually the constructor of a class,Or is the default constructor:

// See if the construction of a class function,With the same name as the find aclass
Class theClass = at.lookupClass(scope, idName);
if (theClass != null) {
    function = theClass.findConstructor(paramTypes);
    if (function != null) {
        at.symbolOfNode.put(ctx, function);
    }
    //If it is a method with the same name as the class,并且没有参数,Then the default constructor is
    else if (ctx.expressionList() == null){
        at.symbolOfNode.put(ctx, theClass); // TODO 直接赋予class
    }
    else{
        at.log("unknown class constructor: " + ctx.getText(), ctx);
    }

    at.typeOfNode.put(ctx, theClass); // This function call returns an object
}

当然,Class constructors are different from ordinary functions,例如Constructors are not allowed to define return values,Because it is the return value of the object must be an instance of this class.

31. After the object is default-initialized,Then call the explicitly defined constructor,In this way, the entire object instantiation process can be perfected.不过,You can save the data of ordinary local variables on the stack,What about how to preserve the data of object?其实,You can also treat object data like other data,保存在栈里,如下图所示:

C语言的结构体struct和C++语言的对象,Can be stored in the stack in the.Save the objects in the stack is directly declare and instantiate,而不是用new关键字来创建的.如果用new关键字来创建,In fact, a block of memory is allocated in the heap,And assigned to a pointer variable,如下图所示:

When the object is stored in the heap,There can be multiple variables all referencing the same object,Such as the variables in the grapha和变量bYou can refer to the same objectobject1.Member variables of a class can also refer to other objects,比如object1Class members of the referenceobject2对象.The object's lifetime can go beyond that created it stack frame of survival.

You can compare the advantages and disadvantages of these two approaches.If the object stored in the stack in the,Then it's survival and the scope is the same,Can be automatically created and destroyed,Therefore no additional memory management is required,The disadvantage is that objects can't exist and sharing for a long time.Objects created in the heap can be shared,But it increases the burden of memory management.所以在C和C++语言中,Be careful to manage memory allocated from the heap,Free this memory when appropriate.在Java和其他一些语言中,The garbage collection mechanism is,That is, when an object is no longer referenced,Collected from the memory.

Actually here can helpJavaTo optimize the memory management.Such as on the analysis of the code,If you find an object to create and use are limited in a certain scope block,is not shared with other scopes,Then the object's lifetime is in line with the current stack frame,Can apply for in the stack memory,rather than in the heap,It can avoid the late work of garbage collection.

After analyzing the memory management of the object,回到之前playscript的实现.在playscript的Java版本里,可以用一个ClassObjectTo save the object data,而ClassObject是PlayObject的子类.前面已经讲过PlayObject,It is used by stack frames to hold local variables,可以通过传入VariableTo access the object attribute values:

//类的实例
public class ClassObject extends PlayObject{
     //类型
    protected Class type = null;
    ... 
}

//保存对象数据
public class PlayObject {
    //成员变量
    protected Map<Variable, Object> fields = new HashMap<Variable, Object>();

    public Object getValue(Variable variable){
        Object rtn = fields.get(variable);
        return rtn;
    }

    public void setValue(Variable variable, Object value){
        fields.put(variable, value);
    }
}

When you need to access an object at runtime,也会用ClassObjectto make a stack frame,So that it can be like a visit to a local variable access attributes of the object.Without the need for access to the object is removed from the stack,if no other object references this object,那么它会被JavaThe garbage collection mechanism of recycling.

32. Finally, access the properties and methods of the object.Generally with the dot operator to access the object properties and methods,如下所示:

mammal.speak();                          //访问对象方法
println("mammal.name = " + mammal.name); //访问对象的属性
Reference is also a kind of expression of the attributes and methods,语法定义如下:
expression
    : ...
    | expression bop='.'
      ( IDENTIFIER       //对象属性
      | functionCall     //对象方法
      )
     ...
     ;

Here the operation of the dot notation can be directed toScalacascading like that,比如:

obj1.obj2.field1;
obj1.getObject2().field1;

另外,Object members can also set visibility.That is to say, some members can only be used inside the object.,By some external access.这个怎么实现呢?It's only a matter of semantics,是When doing semantic checks at compile time,Do not allow private members to be accessed externally,It is ok that the compiler error,In other respects it is no different.

七、闭包

33. 在JavaScript中,After returning an inner function with an outer function,The inner function can always access the local variables in the outer function.It stands to reason that the outer function has exited at this time,The variables in it should also be invalidated.可闭包very persistent,Even if has withdrawn from the outer function,But the inner function can also continue to visit the outer function declared in the variable.But the closure is very useful,For authors of library it can hide the internal implementation details.来测试一下JavaScript的闭包特性,如下所示:

/**
 * clojure.js
*/
var a = 0;

var fun1 = function(){
    var b = 0;                // 函数内的局部变量

    var inner = function(){   // 内部的一个函数
        a = a+1;
        b = b+1;
        return b;             // Returns the internal members
    }

    return inner;             // 返回一个函数
}

console.log("outside:  a=%d", a);

var fun2 = fun1();                            // 生成闭包
for (var i = 0; i< 2; i++){
    console.log("fun2: b=%d a=%d",fun2(), a); //通过fun2()来访问b
}

var fun3 = fun1();                            // Generate a second closure
for (var i = 0; i< 2; i++){
    console.log("fun3: b=%d a=%d",fun3(), a); // b等于1,重新开始
}

在Node.jsThe result of running the above code in the environment is as follows:

outside:  a=0
fun2: b=1 a=1
fun2: b=2 a=2
fun3: b=1 a=3
fun3: b=2 a=4

Two conclusions can be drawn from this result:

(1)Inner functions can access their own local variables、外层函数的变量b和全局变量a.

(2)The inner function as the return value assigned to other variables later,The outer function is over,But the inner function can still access the original outer function variableb,Can also access the global variablea.

Standing in the point of view of the outer function,When this function has withdrawn from,变量bShould fail,Why can also continue to visit?但是如果站在innerThe point of view of the function,声明innerTell it can access functionb,not becauseinnerOther functions assigned to the,innerThe original correct statement in the function cannot be used.,如下图所示:

其实As long as can function as a value passed around,There will be a scope mismatch,Such inherent contradictions are determined during language design.,因此Closure is a function in order to can continue to run in this situation provided a solution.This scheme has some good characteristics,For example, the data used by the hidden function.

这里补充一下静态作用域(Static Scope),如果一门语言的作用域是静态作用域,Then sign the reference relationship between clear can be determined according to the code at compile time is,在运行时不会变.某个函数是在哪声明的,就具有它所在位置的作用域.它能够访问哪些变量,That is with the variable binding,在运行时就一直能访问这些变量.看下面的代码,For static scope,无论在哪里调用foo()函数,访问的变量i都是全局变量:

int i = 1;
void foo(){
  println(i); // 访问全局变量
}

foo();        // 访问全局变量

void bar(){
  int i = 2; 
  foo();      // 在这里调用foo(),Access is still a global variable
}

Most languages ​​today use static scoping.用Antlr自定义的playscriptA language is also formed at compile time into aScope的树,Variable reference is also is done at compile time resolution no longer change,So static scope is also used.反过来讲,如果在bar()里调用foo()时,foo()访问的是bar()函数中的本地变量i,This language is used动态作用域(Dynamic Scope),That is variable reference with variable declaration is not die bound at compile time.在运行时,It is in the running environment dynamically for a variable of the same name.在macOS或Linux中用的bash脚本语言,Is the dynamic scope of.

Static scope can be determined by program code,在编译时就能完全确定,所以又叫做词法作用域(Lexcical Scope).But the lexical like do not said the lexical analysis of lexical,这里跟LexicalThe corresponding vocabulary can be thought of asRuntime,一个是编译时,一个是运行时.Describe closures using the concept of static scope,可以这样说:Because language is the static scope,The variables it can access need to be accessible at all times,To do this, it is necessary to extend the lifetime of some variables.

34. There is another condition for the generation of the closure,is to make the function a first-class citizen,例如ScalaThe functional language.在JavaScript和Python等语言里,Functions can be used like numbers,Such as to the variable assignment、作为参数传递给其他函数,作为函数返回值等等.This feature is very useful,Such as it can handle array collection.给数组的map方法传入一个回调函数,The result will generate a new array.The whole process is very simple,There are no repetitive loop statement,This is one of the reasons why many people advocate functional programming:

var newArray = ["1","2","3"].map(
      fucntion(value,index,array){
          return parseInt(value,10)
      })

那么在自定义的playscript中,How do you treat a function as a first-class citizen??Requires support function as base type,This makes it possible to declare variables with this type.How to declare a variable of function type??在JavaScriptThe language of the dynamic type,You can assign a function to any variable,like the previous codeinner函数作为返回值,被赋给了fun2和fun3两个变量.然而在GoIn languages ​​such as languages ​​that require strict type matching,就比较复杂了,如下所示:

type funcType func(int) int // Go语言,声明了一个函数类型funcType
var myFun funType          // A variable is declared with this function type

It has stricter requirements on the prototype of the function:函数必须有一个int型的参数,返回值也必须是int型的.CThe declaration of function pointers in the language is also stricter,在下面代码中myFunPointer to point to a function,This function also is to have aint类型的参数,返回值也是 int:

int (*myFun) (int);        //C语言,声明一个函数指针

playscriptThis stricter declaration can also be used,Because it wants to achieve a statically typed language:

function int (int) myFun;  //playscriptdeclare a function variable in

Write the above syntax for describing function types asAntlr的规则如下:

functionType
    : FUNCTION typeTypeOrVoid '(' typeList? ')'
    ;

typeList
    : typeType (',' typeType)*
    ;

在playscript中,用FuntionTypeInterface representing a function type,The return value type can be obtained through this interface、Parameter types both information:

package play;
import java.util.List;
/**
 * 函数类型
 */
public interface FunctionType extends Type {
    public Type getReturnType();        //返回值类型
    public List<Type> getParamTypes();  //参数类型
}

Try the actual use effect how to,用AntlrSyntax parsing the following:

function int(long, float) fun2 = fun1();

它的意思是:调用fun1()Function returns another function,这个函数有两个参数,返回值是int型的.用grun显示一下AST,Can see that it has taken thefunctionTypeCorrectly parse out:

Now just finished design grammar,还要实现runtimeRuntime functionality,Make functions really pass around like numbers,As the test code below,它把foo()作为值赋给了bar():

/*
FirstClassFunction.play 函数作为一等公民.
Is a function can value,赋给别的变量.
支持函数类型,即FunctionType.
*/
int foo(int a){
    println("in foo, a = " + a);
    return a;
}

int bar (function int(int) fun){
    int b = fun(6);
    println("in bar, b = " + b);
    return b;
}

function int(int) a = foo;  //Function as a variable initialization value
a(4);

function int(int) b;        
b = foo;                    //Function is used to assignment statement
b(5);

bar(foo);                   //函数做为参数

运行结果如下:

in foo, a = 4
in foo, a = 5
in foo, a = 6
in bar, b = 6

运行这段代码,You will find that it implements assignment with a function,而To implement the function of the key,Is to do a good job in the semantic analysis.For example, the compiler must be able to recognize the assignment statement in thefoo是一个函数,Rather than a traditional values.在调用a()和b()时,It also wants to call correctlyfoo()的代码,而不是报“找不到a()函数的定义”这样的错误.

35. After implementing the function of the first-class citizen function,Enter one of the most important ring:Implement the closure function.Design test cases first,The mentioned the firstJavaScript例子用playscriptThe syntax of the rewrite it again,To test the closure function:

/**
 * closure.play
 * Test closure features
 */
int a = 0;

function int() fun1(){        //函数的返回值是一个函数
    int b = 0;                //函数内的局部变量

    int inner(){              //内部的一个函数
        a = a+1;
        b = b+1;
        return b;             //Returns the internal members
    }

    return inner;             //返回一个函数
}

function int() fun2 = fun1();  
for (int i = 0; i< 3; i++){
    println("b = " + fun2() + ", a = "+a);
}

function int() fun3 = fun1();  
for (int i = 0; i< 3; i++){
    println("b = " + fun3() + ", a = "+a);
}

Code running effect andJavaScriptIs the same version of the program:

b = 1, a = 1
b = 2, a = 2
b = 3, a = 3
b = 1, a = 4
b = 2, a = 5
b = 3, a = 6

这段代码的AST如下图所示,Can visually see the outer function and the inner function relationship:

Now the test cases ready,To realize the closure mechanism.前面提到,Closure of the internal contradictions is a runtime environment and defines the scope of the conflicts between.那么The internal environment variables need to pack to closure function,it can access these variables at any time.在ASTDo the graphical analysis on,看看给fun2what happens when this variable is assigned:

Simply describe the tofun2Assignment of the execution process:

(1)先执行fun1()函数,内部的inner()The function is returned to the caller as a return value.Then the program can access the two layers of scope,The nearest layer isfun1(),里面有变量b;There is another layer,There is a global variablea.This is the last chance to package environment variables,否则退出fun1()函数以后,变量b就消失了.

(2)Then the internal function along with packaging good environment variable value,创建一个FunctionObject对象,作为fun1()的返回值,给到调用者.

(3)给fun2这个变量赋值.

(4)调用fun2()函数.函数执行时,have a private closure environment accessibleb的值,This environment is created in the second stepFunctionObject对象.

This finally realizes the function of the closure.在这个过程中,to record in advanceinner()Function which reference the external variables,For these variables packaging.This is when the semantic analysis of program,可以参考ClosureAnalyzer.java中的代码:

 /**
     * Evaluate closure variables for a function,That is, the external environment variable it refers to.
     * 算法:Do all its variable reference,Remove the variables declared within,The rest is outside of the.
     * @param function
     * @return
     */
private Set<Variable> calcClosureVariables(Function function){
    Set<Variable> refered = variablesReferedByScope(function);
    Set<Variable> declared = variablesDeclaredUnderScope(function);
    refered.removeAll(declared);
    return refered;
}

下面是ASTEvaluator.javaThe code snippet that wraps environment variables into closures in,It gets data from the current stack:

/**
 * Get the value of an environment variable for a closure
 * @param function The closure of the associated function.This function will access some environment variables.
 * @param valueContainer  A container for storing the values ​​of environment variables
 */
private void getClosureValues(Function function, PlayObject valueContainer){
    if (function.closureVariables != null) {
        for (Variable var : function.closureVariables) {
            // Now also can take from the stack,After exit function is alright
            LValue lValue = getLValue(var); 
            Object value = lValue.getValue();
            valueContainer.fields.put(var, value);
        }
    }
}

Now has achieved the closure mechanism,Function also turned into a first class citizens.Inadvertently also support to some extent函数式编程(functional programming).A typical feature of functional programming is that高阶函数(High-order function)功能,Higher-order functions is such a,It is able to accept any other function as a parameter,javascript中数组的map方法与Scala中的map和foreach就是高阶函数.其实,Closure is to function in a static scope variable access by survival spin,form a piece of data that can be accessed by this function alone.Because this data can only be accessed by the closure function,So it also has the ability to encapsulate the information、The characteristics of hidden internal details.

The concept of it and encapsulation is a bit like,Encapsulation is to encapsulate data and operations on data together,is one of the concepts of object-oriented programming.A closure can be thought of as an object.反过来看,An object can also be seen as a closure.Object's properties by method can also be regarded as the exclusive environment variable,Its survival also must ensure that can be the method has been normal access.

八、语义分析

36. You can get a syntax tree when doing syntax analysis,And what can be done based on this tree is a matter of semantics.比如,+The meaning of the sign is to add two numbers together,and usually also performs default type conversions,So if you want to distinguish the differences between different languages,Not only see the syntax of the language.比如Java语言和JavaScriptThe syntax of the code block is the same, all use curly braces,But on the semantics are different,one with block scope and one without.Compared with the lexical and grammatical design and processing,Semantic design and analysis are much more complex.所以先来看类型系统这个方面.

A type system is a collection of all types in a language,Operation of these types of rules,And how the interaction between the type(For example, can one type be converted to another type?).If you want to build a perfect type system,Need from two aspects::

(1)According to the requirements in the field of,Design system are characteristic of their.

(2)Support for type checking in the compiler、Type derivation and type conversion.

First, from the perspective of designing type system characteristics.At this level the machine code,In fact, no data type can be distinguished.,都是0101,Why do high-level languages ​​add a type of mechanism??Type is for a set of numerical,and a set of operations on top of this set of values.比如对于数字类型,Addition, subtraction, multiplication and division arithmetic operations can be performed on it,For not a string.所以,A semantic type is given by the high-level language,The type of this mechanism,Is equivalent to set the rules,It is possible to check whether the operation imposed on the data is legitimate.So the biggest benefit of the type system,Is can reduce the calculation error probability through type checking.

根据Whether type checking is done at compile time or runtime,Can be divided into two classes of the computer language:

(1)静态类型语言(Almost all of the type checking is performed at compile time).

(2)动态类型语言(Type checking is done at runtime).

Associated with static type and dynamic concept,And strong type and weak type.强类型语言中,The type of a variable cannot be changed once declared,弱类型语言中,Variable types can be changed at runtime.二者的本质区别是,Strongly typed languages ​​do not allow illegal operations,Because that can be detected,Weakly typed language from mechanism cannot prohibit illegal operation,所以是不安全的.比如一个表达式a*b,如果a或b不是数值,那就没有意义了,Weakly typed languages ​​may not detect such problems.因此,What types of static and dynamic type to check the problem of,Strong type and weak type said is even check check not to come out,Or can't check the problem,两者容易混淆.

这里的自定义playscriptIs statically typed and strongly typed,So almost all kinds of type checking are done.both support objects,Native primitive data types are also supported,The processing characteristics of these two types are different.

37. Next, let's talk about how to do type checking、Type derivation and type conversion.See it first if you write a compiler,A what are the problems in the analysis of the type.In this expression, for example,There will be different results in different situations:

a = b + 10

(1)如果b是一个浮点型,b+10Is the result of the floating point.如果b是字符串型的,Some languages are allowed to perform+number operation,The actual result is a concatenation of strings.The analysis process is类型推导(Type Inference).

(2)When the value on the right is calculated and assigned toa时,To check whether the two sides of the type matching.这个过程就是类型检查(Type Checking).

(3)如果a的类型是浮点型,And send it to the right is the integer,Then it is usually for the default类型转换(Type Conversion).

类型的推导、Check and transformation is three working,However, the technical means used are similar,所以放在一起讲,Let's look at the type of deducing.earlier aboveplayscript实现中,is to assume that the types on both sides of the operator are integers,And to do the casting.This is of course insufficient in practical applications,So I can at compile time to determine the type of the expression to.Such as the code below is inRefResolve.java中,推导表达式的类型:

case PlayScriptParser.ADD:
    if (type1 == PrimitiveType.String || 
        type2 == PrimitiveType.String){
        type = PrimitiveType.String;
    }
    else if (type1 instanceof PrimitiveType && 
             type2 instanceof PrimitiveType){
        //类型“向上”对齐,比如一个int和一个float,取float
        type = PrimitiveType.getUpperType(type1,type2);
    }else{
        at.log("operand should be PrimitiveType for additive operation", ctx);
    }
    break;

这段代码提到,If the operand has one side on both sides, the data type isString类型的,The entire expression isString类型的.If it is other basic types of,It is necessary to perform type conversion according to certain rules,And determine the type of operation result.比如+No. On one side,double类型的,另一边是int类型的,那就要把inttype conversion intodouble型的,Finally the calculation results isdouble类型的.To do the type of derivation after,simplifies the run-time calculation,No need to do type judgment at runtime:

private Object add(Object obj1, Object obj2, Type targetType) {
    Object rtn = null;
    if (targetType == PrimitiveType.String) {
        rtn = String.valueOf(obj1) + 
              String.valueOf(obj2);
    } else if (targetType == PrimitiveType.Integer) {
        rtn = ((Number)obj1).intValue() + 
              ((Number)obj2).intValue();
    } else if (targetType == PrimitiveType.Float) {
        rtn = ((Number)obj1).floatValue()+
              ((Number)obj2).floatValue(); 
    } 
    ...
    return rtn;
}

Through this type derivation example,can also lead toS属性(Synthesized Attribute).If an attribute can be inferred from subordinate nodes,This property is calledS属性,Literally means comprehensive properties,就是ASTFrom the lower attribute induction、Comprehensive out at the corresponding level properties.更准确地说,is determined by subordinate nodes and itself,如下图所示:

与S属性相对应的是I属性(Inherited Attribute),Also is the inheritance property,即ASTThe attributes of a node in the、兄弟节点和它自身来决定的,比如:

变量a的类型是int,这个很直观,Because variable declaration statement has been pointed outa的类型,But the type is not deduced from the subordinate node,but derived from sibling nodes.在PlayScript.g4中,The relevant syntax for variable declaration is as follows:

variableDeclarators
    : typeType variableDeclarator (',' variableDeclarator)*
    ;

variableDeclarator
    : variableDeclaratorId ('=' variableInitializer)?
    ;

variableDeclaratorId
    : IDENTIFIER ('[' ']')*
    ;

typeType
    : (classOrInterfaceType| functionType | primitiveType) ('[' ']')*
;

把int a;Such a simple variable declaration statement parses intoAST,To form a tree, there are two branches of the tree,如下所示:

The left branch of this tree can deduce types from bottom to top,So the type attribute isS属性.While the right branch must from the root node(也就是 variableDeclarators)Inherited down type attribute,所以对a 这个节点来说,It is the type attribute of theI属性.Many modern languages ​​will support automatic type deduction,例如GoThe language has two ways of declaring variables:

var a int = 10  //第一种
a := 10         //第二种

第一种方式,aThe type of is explicitly declared;第二种方式,aThe type of is deduced from the expression on the right.从生成AST 中,It can be seen that they have experienced a synthesis from bottom to top,The process of inheritance from top to bottom:

38. Said these types derived,再看看类型检查.Type checking mainly occurs in several scenarios:

(1)赋值语句(Checks that the types on the left and right of an assignment match).

(2)变量声明语句(Because the variable declaration statement will also have an initialization part,So also need to type).

(3)函数传参(调用函数的时候,The incoming parameters must meet the requirements of the formal parameters).

(4)函数返回值(When returning a value from a function,To comply with the provisions of the function return value).

Type checking and a feature:以赋值语句为例,The type on the left isI属性,From the statement to get;The type on the right isS属性,Come out in the bottom-up comprehensive.When the types on the left and right meet,Check whether the two matching,The assigned variable must satisfy the type requirement on the left.If the horse natural no problem,If you don't match exactly does not necessarily an error right away,But to see if the type conversion can be done.比如,The general language in the treatment of mixed integer and floating point arithmetic,Can automatically transform.像JavaScript和SQLEven when arithmetic,Automatically convert the string into digital.在MySQLRun in the following statement,会得到3,它自动将’2’转换成了数字:

select 1 + '2';

This process is actually a risk,It's like opening a backdoor in a strongly typed language,Bypasses or partially bypasses the type checking capabilities of the compiler.In the scene where the parent class is converted into a child class,The best the compiler can check whether there inheritance relationships between these two classes,If no inheritance relationships,This, of course, can check out the mistake,To stop this kind of transformation.But a base class may have many subclasses,Specific this transformation is right,Errors can only be detected at runtime.

39. Lexical analysis and syntax analysis phase,processing is context-free.but only with context-free processing,It is impossible to complete a powerful language.Such as the first to declare variables with,This is a typical context-sensitive situation,Certainly this situation cannot be expressed in a context-free grammar,So the parsing stage cannot handle this problem,Can only be processed during the semantic analysis phase.The essence of semantic analysis,就是To carry out processing of contextual situation.前面讲到的作用域,is a context-sensitive situation,Because if the scope is different,Can use the variable are also different.The type system is also a context-sensitive case,Type and type checking should be derived based on the context of relatedAST节点.

First, let's talk about the scene of reference resolution.In the program using variables、函数、class and other symbols,Need to know who they mean,to correspond to where they are defined.In the following example when using variablesa时,Need to know that it is a global variablea,还是fun()函数中的本地变量a.因为May have the same name in different scope of a variable,So have to find the right one.这个过程,Can call reference resolution.如下所示:

/*
scope.c
测试作用域
 */
#include <stdio.h>

int a = 1;

void fun()
{
    a = 2;     //This refers to the global variablea
    int a = 3; //Declare a local variable
    int b = a; //这个aRefers to a local variable
    printf("in func: a=%d b=%d \n", a, b);
}

Function reference digestion is more complex than a variable references to eliminate some.It is not only to the comparison function name,They compare parameters and return values(Can call the function prototype,Or function type).Earlier when promoting a function to first-class citizen,mention function type(FunctionType)的概念.两个函数的类型相同,需要返回值、参数个数、The type of each parameter can be matched.

在面向对象语言中,Function reference resolution is also complicated.when a parameter requires an object,It is also possible to provide an instance of its subclass in the program,That is a subclass can be used in all of the parent class need,例如下面的代码:

class MyClass1{}      //父类
class MyClass2 extends MyClass1{}  //子类

MyClass1 obj1;
MyClass2 obj2;

function fun(MyClass1 obj){}       //Parameters need to be the parent class instance

fun(obj2);   //Provide a subclass instance

在C++中,The resolution of references and more complicated.It will also consider whether an argument can be automatic conversion of forming and the required type,Such as a needdouble类型的地方,pass it aint也是可以的.命名空间It is also a factor to consider when doing citation digestion..像Java、C++All support namespace.If a namespace is introduced earlier in the code,You can directly refer to the symbols inside,Otherwise need to refer to a namespace,如下所示:

play.PlayScriptCompiler.Compile()   //Java语言
play::PlayScriptCompiler.Compile()  //C++语言

While doing reference resolution may yield several results:

(1)Parsing out the accurate reference relationship.

(2)重复定义(In a new symbol,Found that this symbol has already been defined).

(3)引用失败(Can't find the definition of a symbol).

(4)If two different name has the same name in the space symbol,Programmers need to explicitly specify.

40. During the development of a compiler or interpreter,There are bound to be problems with lvalues ​​and rvalues.比如在playscript的ASTEvaluate.java中,在visitPrimaryNode variables can be evaluated.If it is the statement belowa则没有问题,把aThe value of the variable out good:

a + 3;

But if it is an assignment statement,a在等号的左边,怎么对aWhat about evaluating?如下所示:

a = 3;

假设aVariables of the original value is4,If you still take out its value, it becomes3=4,This will become meaningless.So this time can'ta的值取出来,should be taken outa的地址,或者说a的引用,And then with the assignment operation3This value is written toa的内存地址,Then say out is thata的左值(L-value).An lvalue is in at the earliestC语言中提出的,Usually appears on the left side of the expression,Such as the left side of the assignment statement.Left to take the address of the variable as a(Or variable reference),After getting the address, you can write the new value into it.

Corresponding to the left value is右值(R-value),The right value is called,而不是地址.在上面这两种情况下,变量a在ASTIs corresponding to the same node,也就是primary节点.This node is evaluated when the return value left or right value?This should use context to analysis and processing,如果这个primaryNodes exist in the following situations,It needs to take the left value:

(1)On the left side of an assignment expression;

(2)Variables in variable declaration statements with initialization;

(3)When assigning a value to a function parameter;

(4)一元操作符:++和–.

(5)Other need to change the operation of variable content.

Of course, not all expression,Can generate a qualified lvalue.That is to say, it appears on the left-hand side of an assignment statement.,Must be able to obtain an lvalue expressions.比如一个变量、A class attribute can be.But if it is a constant,或者2+3Such expressions on the left side of the assignment of symbols,那就不行.所以,Judging expressions can produce a qualified lvalue and semantics is a job.

40. Handle contextual situation,经常用属性计算的方法.Attribute calculation is to do context analysis,Or an algorithm for semantic analysis.According to the attribute evaluation perspective,Various semantic analysis problems that have been dealt with before,都可以看做是对ASTA calculation is made of a single attribute node.Such as lvalue scenario for oprimary节点,It needs to calculate properties include:

(1)It is the variable definitions of which(This is reference to the definition of the variable Symbol).

(2)它的类型是什么.

(3)它的作用域是什么.

(4)This node is evaluated when,Whether I should return the left value、Whether correctly returns an lvalue.

(5)它的值是什么.

A properties calculation point of view,Evaluate an expression or run a script,只是去计算AST节点的Value属性.Attribute calculation requires attribute grammar.在词法、语法分析阶段,Regular grammar and context-free grammar are mentioned respectively,In semantic analysis phase is to understand属性文法(Attribute Grammar).It is on the basis of context-free grammar did some enhancement,To calculate the attribute value.Below is a context-free grammar expression of addition and multiplication example:

add → add + mul
add → mul
mul → mul * primary
mul → primary
primary → "(" add ")"
primary → integer

然后对valueThe property grammar for property evaluation is as follows:

add1 → add1 + mul [ add1.value = add2.value + mul.value ]
add → mul [ add.value = mul.value ]
mul1 → mul2 * primary [ mul1.value = mul2.value * primary.value ]
mul → primary [ mul.value = primary.value ]
primary → "(" add ")" [ primary.value =  add.value ]
primary → integer [ primary.value = strToInt(integer.str) ]

Using the attribute grammar rules can be defined,Then use the tool to automatically realize the calculation of the attribute.例如解析表达式2+3时得到一个AST,How do you know when it's doing an addition??Because you can on the basis of the grammatical rules set attribute grammar,in the process of parsing the grammar or formingAST之后,Attribute calculation can be done according to the rules of attribute grammar.Therefore the characteristics of the attribute evaluation is:It will be based on the rules of grammar,Add some rules related to semantic processing.所以,The definition of this semantic rule is also called语法制导的定义(Syntax directed definition,SDD),If into computing action,就叫做语法制导的翻译(Syntax directed translation,SDT).

The properties, it can be together with the syntax analysis process,Can also be carried out with syntax analysis done again.These two stages are not necessarily completely separated,Even sometimes do some attribute evaluation in the syntax analysis,Then the calculation results are fed back to the logic of parsing,Helps parsing perform better.So when parsing grammar,How do attribute evaluation at the same time?The process of parsing grammar is to build up step by stepAST的过程.在这个过程中,To calculate a node properties rely on other nodes may be created.Such as the fall in recursive algorithm,After a node is established,All of its child nodes must be set up,所以SAttributes can be calculated.同时,Because parsing is done from left to right,The sibling nodes on its left are also established..

If a property except may depend on the child nodes to calculate the,Only rely on the left of the brothers node,Don't rely on the right of the,This property is calledL属性.它比SThe range of attribute more,Contains the partI属性.Due to the common syntax analysis algorithm is from left to right are,So it is very suitable for parsing grammar,一边计算L属性.比如C和JavaAnalyze the types of language,都可以用LProperties of computing to realize.Because of the type of the two languages or come from down to up comprehensive,属于S属性.Variable declarations are either doing,Be determined by the variables in the declarations,Types of nodes on the left side of the variable,如下所示:

2+3;    //Expression type is int
float a;  //a的类型是浮点型

41. 但Go和ScalaThe type declaration for is placed after the variable,This means that the type node must be on the right,那就不符合Lattribute grammar:

var a int = 10

但没关系,It is not necessary to all properties in syntax analysis phase calculated,Wait until after the syntax analysis ofAST遍历一下就好了.At this time all nodes have the,Calculate attribute it is.In the above customplayscriptIn the language adopted this strategy.In order to let the algorithm more clear,playscriptSplit the semantic analysis process into several tasks,对ASTdid multiple traversals:

(1)第1遍:The type and scope resolution.put custom class、The tree of functions and scopes are analyzed.这么做的好处是,Can be used in the first statement in the.比如声明一个Mammal对象,而MammalThe class definition comes later;在定义一个类时,For a class of members also use before the statement would happen,The type resolution scan it again,can easily support this feature.When writing attribute evaluation algorithm,The order of computation is probably the most important issue.Because some properties calculation may rely on other nodes attributes to calculate.上面讨论的S属性、I属性和L属性,Are considering calculation order.As used in the former statement in this situation,Special treatment should more.

(2)第 2 遍:type of digestion.Eliminate all occurrences of references to types,比如变量声明、函数参数声明、class inheritance, etc..After digestion later on Mammal m;这样的语句,clearly knowm的类型.这实际上是对IProperty type calculation.

(3)第 3 遍:Cited Digestion andSThe type of the property is derived.At this time of all the variables、函数调用,All definitions associated with it,and completes all type calculations.

(4)第 4 遍:做类型检查.For example, when the types on the left and right sides of an assignment statement are incompatible,就可以报错.

(5)第 5 遍:Do some semantic legitimacy checks.比如breakCan only appear in the loop,If a function declares a return value,就一定要有return语句等等.

The result of semantic analysis kept inAnnotatedTree.java类里,means a syntax tree with attributes annotated.注意,These properties on the data structure,并不一定是AST节点的属性,可以借助Map等数据结构存储,Just on the concept of these properties or mark on the tree node.AnnotatedTree类的结构如下:

public class AnnotatedTree {
    // AST
    protected ParseTree ast = null;

    // All types of parsing out,包括类和函数
    protected List<Type> types = new LinkedList<Type>();

    // AST节点对应的Symbol
    protected Map<ParserRuleContext, Symbol> symbolOfNode = new HashMap<ParserRuleContext, Symbol>();

    // AST节点对应的Scope,如for、A function call will start a newScope
    protected Map<ParserRuleContext, Scope> node2Scope = new HashMap<ParserRuleContext, Scope>();

    // The inferred type for each node
    protected Map<ParserRuleContext, Type> typeOfNode = new HashMap<ParserRuleContext, Type>();
    
    // 命名空间,Scope of the root node
    NameSpace nameSpace = null;  

    ...  
}

因此,Semantic analysis is the essence of handling of contextual situation.Understand the reference resolution、An lvalue and rvalue scene,Can increase intuitive understanding of semantic analysis.Grasp the calculation and attribute grammar,You can use more formal、Clearer algorithms to accomplish the task of semantic analysis.This stage of semantic analysis is very important.Because lexical and grammar have very fixed routines,The realization of the even can draw.但The core of language design is semantic,In particular, to make the semantics fit the problem being solved.Among various programming language is the essential difference between semantic aspect,Lexical differences are minimal,Grammar comes next,The most semantic difference.

九、继承和多态:Dynamic Characteristics of Object-Oriented Runtime

42. Inheritance and polymorphism introduce new concepts to the type system,就是子类型.The type of contact is often in front of the parallel relationship,One is an integer and the other is a string,都是平等的.And now a type can be another type of subtype,Like a sheep and belong to mammals.这会导致Not all types can be accurately calculated at compile time,Calls to methods and properties to do the resolution of completely correct(或者说绑定).This part of the work should be left to the runtime to do,So object-oriented programming will also have very good advantage,Because it can lead to polymorphism.

To gain a deep understanding of object-oriented features,You must learn the elements of the subtype and run-time mechanism.Next, understand inheritance and polymorphism from the perspective of type system,Then see what semantic analysis needs to be done at compile time,Another look at the runtime features of inheritance and polymorphism.Inheritance means subclass of a class,Automatically have the properties and methods of the parent class,Unless the parent class declaration for private.所以Inherit the power of,reuse.

Polymorphism means different subclasses of the same class,Different actions are performed when the same method is called.This is because each subclasses can override out a method of the parent,To provide a different implementation.The following code demonstrates the features of inheritance and polymorphism,a的speak()方法和bspeak() The method will print out the call of the cow and the call of the sheep respectively,调用的是子类的方法,而不是父类的方法:

/**
mammal.play Demonstrate the object-oriented programming:继承和多态.
*/
class Mammal{
    int weight = 20;  
    boolean canSpeak(){
        return true;
    }

    void speak(){
        println("mammal speaking...");
    }
}

class Cow extends Mammal{
    void speak(){
        println("moo~~ moo~~");
    }
}

class Sheep extends Mammal{
    void speak(){
        println("mee~~ mee~~");
        println("My weight is: " + weight); //weightThe scope of the cover subclass
    }
}

//Assign instance of subclass to variable of superclass
Mammal a = Cow();
Mammal b = Sheep();

//canSpeak()方法是继承的
println("a.canSpeak() : " + a.canSpeak());
println("b.canSpeak() : " + b.canSpeak());

//The following two calls will be different,In the run-time dynamic binding method
a.speak();  //Print the cattle call
b.speak();  //print sheep call

所以Polymorphism of the power of,Is that although each subclass is different,But still can be in accordance with the unified way to use them,做到求同存异.面向对象编程时,You can create different subclasses of a class,We can realize the function of some personalized;Write programs at a higher level of abstraction,Don't go to tube concrete difference.If the above conclusion abstract into general type theory,就是子类型(subtype).it can relax type checking,Leading to polymorphic.Can be roughly of the inheritance of the object-oriented as subtyping a reflection,The result of this is the ability to replace the parent class with a subclass,Leading to polymorphic.There are two subtypes way:

(1)一种就是像Java和C++Need to explicitly declare what class inherits,Or what implements the interface.这种叫做名义子类型(Nominal Subtyping).

(2)另一种是结构化子类型(Structural Subtyping),aka duck type(Duck Type).Is a class does not need to explicitly say what type of person he is,as long as it implements all methods of a type,It belongs to the type.

43. After understanding the inheritance and polymorphism,看看在Compile time how to make semantic analysis on characteristics of inheritance, and polymorphism of.First from the point of view of type processing,To identify the new type:Mammal、Cow和Sheep,You can then use them to declare variables.The second to set up the correct scope,从作用域的角度来看,一个类的属性(或者说成员变量),It is possible to specify whether it can be accessed by subclasses.以Java为例,除了声明为privateother than the properties of,Other attributes are visible in the subclass.So the scope of the parent class attribute,Can be in the form of a tree covered various subclasses:

The third is to do type reference resolution for variables and functions.Also is to analyzea和b这两个变量的类型.注意,In the previous code is to useMammalTo declare the two variables of.According to the algorithm derived from the type,a和b都是Mammal,这是个IThe process of attribute evaluation.也就是说在编译期,Unable to know the variable assigned object which is exact subtypes,Just know that when you declare variables they are of mammal type,As to cattle or sheep,就不清楚了.

But in the previous codea和b的确是以Cow和Sheepobject created,Can't use type is?The language does have automatic type inference,但是有限制条件.比如,The strong typing mechanism requires that the type of the variable is determined once,In the operation process can't change,所以要让a和bBe able to point to other objects,And the type unchanged.从这个角度出发,a和bThe type of only the parent classMammal.所以说,Compile time unable to know the real type variable,May only know its parent type.This can lead to can't correctly willspeak()Method to do reference digestion.The correct resolution is to point to theCow和Sheep的speak方法,And this can only be solved at runtime.

所以接下来,Discuss how toImplement dynamic binding of methods at runtime.At run time to knowa和bWhich object do these two variables refer to?,The object holds the real type information.具体来说playscript中,ClassObject的typeAttribute will point to the rightClass,This type information when creating objects to be correct assignment,如下图所示:

When calling properties and methods of a class,Can be gained according to the runtime、Determined type information for dynamic binding.The following code is from this level,The realization of the step by step to find a method,If both this class and the parent class have this method,Then this class will overwrite the parent class.,这样就实现了多态

protected Function getFunction(String name, List<Type> paramTypes){
    //During the search this way
    Function rtn = super.getFunction(name, paramTypes);  //TODO 是否要检查visibility

    //If in the corresponding can't find,Then the parent class of recursive search
    if (rtn == null && parentClass != null){
        rtn = parentClass.getFunction(name,paramTypes);
    }

    return rtn;
}

If the current class does not implement this method,It can directly reuse the implementation in the parent class of a certain level,This is actually the principle of the inheritance mechanism at runtime..这里延伸一下,Type information can be obtained at runtime,这种机制就叫做运行时类型信息(Run Time Type Information, RTTI).C++、JavaHas this mechanism,比如Java的instanceof操作,Can detect whether an object instance of a class or subclass.

Assembly language is a type of,Therefore, after the general high-level language is compiled into the target language,These high-level semantics is lost.If you want to get type information at runtime,need special implementationRTTI的功能,This costs additional storage overhead and computational overhead.As in the customplayscript中,要在ClassObjectSpecifically take out a field to storetype信息.

44. Now you know how to get type information at runtime,Implementation method of dynamic binding.Then look at run-time object initialization mechanism step by step,即Inheritance cases object instantiated.在存在继承关系的情况下,When creating objects not only to initialize your property variable at this level,It is also necessary to initialize the attribute variables of the parent classes at all levels..such as instantiatingCow时,还要对Mammal的成员变量weight做初始化.所以要修改playscriptThe object instantiation code,Starting from the top of the ancestors of the ancestors of all layers of initialization:

//From the parent class to subclass execution layer upon layer the default initialization method,That is, the initialization method with no parameters
protected ClassObject createAndInitClassObject(Class theClass) {
    ClassObject obj = new ClassObject();
    obj.type = theClass;

    Stack<Class> ancestorChain = new Stack<Class>();

    // Execute the default initialization method from top to bottom
    ancestorChain.push(theClass);
    while (theClass.getParentClass() != null) {
        ancestorChain.push(theClass.getParentClass());
        theClass = theClass.getParentClass();
    }

    // To perform the default initialization method
    StackFrame frame = new StackFrame(obj);
    pushStack(frame);
    while (ancestorChain.size() > 0) {
        Class c = ancestorChain.pop();
        defaultObjectInit(c, obj);
    }
    popStack();

    return obj;
}

In the process of initialization step by step,To perform default member variable initialization first,That is, the initialization part of the variable declaration,Then call the constructor of this level.If you do not explicitly specify which constructor,will execute the constructor without parameters.不过有时候,Subclasses can optionally call the superclass one construction method,就像JavaCan in the constructor throughsuper()To explicitly call a specific constructor of the parent class.

Next, deepen the pairing through an example programthis和super机制的理解.比如在下面的ThisSuperTest.Java代码中,Mammal和它的子类Cow都有speak()方法.如果要创建一个Cow对象,会调用Mammal的构造方法Mammal(int weight),And in this constructor callthis.speak()方法,是Mammal的还是Cow的呢:

package play;

public class ThisSuperTest {

    public static void main(String args[]){
        //创建Cow对象的时候,会在Mammal的构造方法里调用this.reportWeight(),What will be displayed here
        Cow cow = new Cow();

        System.out.println();

        //这里调用,会显示什么
        cow.speak();
    }
}

class Mammal{
    int weight;

    Mammal(){
        System.out.println("Mammal() called");
        this.weight = 100;
    }

    Mammal(int weight){
        this();   //call another constructor of its own
        System.out.println("Mammal(int weight) called");
        this.weight = weight;

        //Access properties here,是自己的weight
        System.out.println("this.weight in Mammal : " + this.weight);

        //这里的speak()who is calling,What will show numerical
        this.speak();
    }

    void speak(){
        System.out.println("Mammal's weight is : " + this.weight);
    }
}


class Cow extends Mammal{
    int weight = 300;

    Cow(){
        super(200);   //调用父类的构造函数
    }

    void speak(){
        System.out.println("Cow's weight is : " + this.weight);
        System.out.println("super.weight is : " + super.weight);
    }
}

运行结果如下:

Mammal() called
Mammal(int weight) called
this.weight in Mammal : 200
Cow's weight is : 0
super.weight is : 200

Cow's weight is : 300
super.weight is : 200

答案是Cow的speak()方法,而不是Mammal的.The code is not callthis.speak()吗?怎么这个this不是 Mammal,Turned into a subclass of itCow呢?In fact, in this code,this用在了三个地方:

(1)this.weightIs access to their member variables,Because the scope of member variables is the class itself,以及子类.

(2)this()is another constructor that calls itself,Because it is a method to construct,Do their own initialization must be.换句话说,Constructors do not have polymorphism problems.

(3)this.speak()Is the method of calling a regular,The polymorphism would still work.Run time according to the actual type of objects,来绑定到Cow的speak()方法上.

只不过,在Mammal的构造方法中调用this.speak()时,虽然访问的是Cow的speak()方法,打印的是Cow中定义的weightA member variable value is0,Rather than a member variable declarations“int weight = 300;”的300,为什么呢?要想知道这个答案,Need to understand multiple inheritance cases object initialization process.Mammal的构造方法中调用speak()时,CowThe initialization process has not yet started,所以“int weight = 300;”还没有执行,Cow的weightAttributes or the default value0.

讨论完this,super就比较简单了,its semanticsthisSimple won't appear ambiguity.superThe call is also divided into three kinds of situation:

(1)super.weight.This is for calling the parent class or higher ancestorweight属性,而不是Cowat this levelweight属性.Not necessarily is a direct parent,In the class can also be a grandfather.Overriding relationship according to variable scope,只要是比CowThis high level will do.

(2)super(200).This is to call the superclass constructor,Must be a direct of the parent class.

(3)super.speak().Logic is the same as the access attributes,is calling the parent class or higher ancestorspeak()方法.

copyright notice
author[Book Recalls Jiangnan],Please bring the original link to reprint, thank you.
https://en.qdmana.com/2022/266/202209230622345012.html

Random recommended