├── .gitignore
├── LICENSE
├── README.md
├── compiler-course-outline.md
├── compiler-knowledge-point.md
└── lang-spec
    ├── decaf-with-class-spec.md
    ├── decaf-without-calss-spec-mit.pdf
    ├── monkey-spec.md
    └── tiger-spec.md

/.gitignore:
--------------------------------------------------------------------------------
# Prerequisites
*.d

# Object files
*.o
*.ko
*.obj
*.elf

# Linker output
*.ilk
*.map
*.exp

# Precompiled Headers
*.gch
*.pch

# Libraries
*.lib
*.a
*.la
*.lo

# Shared objects (inc. Windows DLLs)
*.dll
*.so
*.so.*
*.dylib

# Executables
*.exe
*.out
*.app
*.i*86
*.x86_64
*.hex

# Debug files
*.dSYM/
*.su
*.idb
*.pdb

# Kernel Module Compile Results
*.mod*
*.cmd
.tmp_versions/
modules.order
Module.symvers
Mkfile.old
dkms.conf

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Creative Commons Legal Code

CC0 1.0 Universal

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
HEREUNDER.

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator
and subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for
the purpose of contributing to a commons of creative, cultural and
scientific works ("Commons") that the public can reliably and without fear
of later claims of infringement build upon, modify, incorporate in other
works, reuse and redistribute as freely as possible in any form whatsoever
and for any purposes, including without limitation commercial purposes.
These owners may contribute to the Commons to promote the ideal of a free
culture and the further production of creative, cultural and scientific
works, or to gain reputation or greater distribution for their Work in
part through the use and efforts of others.

For these and/or other purposes and motivations, and without any
expectation of additional consideration or compensation, the person
associating CC0 with a Work (the "Affirmer"), to the extent that he or she
is an owner of Copyright and Related Rights in the Work, voluntarily
elects to apply CC0 to the Work and publicly distribute the Work under its
terms, with knowledge of his or her Copyright and Related Rights in the
Work and the meaning and intended legal effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not
limited to, the following:

i. the right to reproduce, adapt, distribute, perform, display,
communicate, and translate a Work;
ii. moral rights retained by the original author(s) and/or performer(s);
iii. publicity and privacy rights pertaining to a person's image or
likeness depicted in a Work;
iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;
v. rights protecting the extraction, dissemination, use and reuse of data
in a Work;
vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation
thereof, including any amended or successor version of such
directive); and
vii. other similar, equivalent or corresponding rights throughout the
world based on applicable law or treaty, and any national
implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention
of, applicable law, Affirmer hereby overtly, fully, permanently,
irrevocably and unconditionally waives, abandons, and surrenders all of
Affirmer's Copyright and Related Rights and associated claims and causes
of action, whether now known or unknown (including existing as well as
future claims and causes of action), in the Work (i) in all territories
worldwide, (ii) for the maximum duration provided by applicable law or
treaty (including future time extensions), (iii) in any current or future
medium and for any number of copies, and (iv) for any purpose whatsoever,
including without limitation commercial, advertising or promotional
purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
member of the public at large and to the detriment of Affirmer's heirs and
successors, fully intending that such Waiver shall not be subject to
revocation, rescission, cancellation, termination, or any other legal or
equitable action to disrupt the quiet enjoyment of the Work by the public
as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason
be judged legally invalid or ineffective under applicable law, then the
Waiver shall be preserved to the maximum extent permitted taking into
account Affirmer's express Statement of Purpose. In addition, to the
extent the Waiver is so judged Affirmer hereby grants to each affected
person a royalty-free, non transferable, non sublicensable, non exclusive,
irrevocable and unconditional license to exercise Affirmer's Copyright and
Related Rights in the Work (i) in all territories worldwide, (ii) for the
maximum duration provided by applicable law or treaty (including future
time extensions), (iii) in any current or future medium and for any number
of copies, and (iv) for any purpose whatsoever, including without
limitation commercial, advertising or promotional purposes (the
"License"). The License shall be deemed effective as of the date CC0 was
applied by Affirmer to the Work. Should any part of the License for any
reason be judged legally invalid or ineffective under applicable law, such
partial invalidity or ineffectiveness shall not invalidate the remainder
of the License, and in such case Affirmer hereby affirms that he or she
will not (i) exercise any of his or her remaining Copyright and Related
Rights in the Work or (ii) assert any associated claims and causes of
action with respect to the Work, in either case contrary to Affirmer's
express Statement of Purpose.

4. Limitations and Disclaimers.

a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.
b. Affirmer offers the Work as-is and makes no representations or
warranties of any kind concerning the Work, express, implied,
statutory or otherwise, including without limitation warranties of
title, merchantability, fitness for a particular purpose, non
infringement, or the absence of latent or other defects, accuracy, or
the present or absence of errors, whether or not discoverable, all to
the greatest extent permissible under applicable law.
c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without
limitation any person's Copyright and Related Rights in the Work.
Further, Affirmer disclaims responsibility for obtaining any necessary
consents, permissions or other rights required for any use of the
Work.
d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to
this CC0 or use of the Work.
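
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# compiler-lectures

--------------------------------------------------------------------------------
/compiler-course-outline.md:
--------------------------------------------------------------------------------
# Teaching Calendar (Syllabus, 32 Class Hours)
- [Ver1](https://github.com/learncompiler/compiler-lectures/blob/edc78f650320a18d3d479b4ee87bc4c74defe04e/compiler-course-outline.md)
- Ver2

## Course Information
- Course title: 编译原理 (Principles and Practice of Compiler Construction)
- Course number: 30240382
- Course type: core course for the major
- Credits: 2
- Class hours: 32
- Semester: fall semester of the third year
- Teaching format: mainly lectures, plus labs
- Language of instruction: Chinese
- Assessment: examination
- Grading: written homework + labs + final exam
- Textbook: no fixed textbook

### Course Overview
The course covers: an overview of compilers and compilation systems; the fundamentals of formal languages, grammars, and automata; lexical analysis; syntax analysis; the fundamentals of syntax-directed semantic processing; semantic analysis and intermediate code generation; symbol table organization; runtime storage organization; code optimization; and target code generation.

### Prerequisites
- Programming, Data Structures, Formal Languages and Automata
- Working knowledge of at least one programming language used in the labs (C/C++, Java, Scala, Rust, Go)

### Related Knowledge Areas in CC200X
CE-ALG/CS-AL, CE-CAO/CS-AR, CE-OPS/CS-OS, CE-PRF/CS-PF, CE-SWE/CS-SE, CE-DSC/CS-DS, CS-PL, CS-IS, SE-CMP, SE-FND, SE-MAA, SE-VAV, SE-QUA.

### References
- Aho, Sethi, Lam, and Ullman (the "Dragon Book"), Compilers: Principles, Techniques, and Tools. 1st edition (1986), 2nd edition (2007), Addison Wesley.
- Andrew W. Appel (the "Tiger Book"), Modern Compiler Implementation in C. Reprint edition, Posts & Telecom Press, 2005.

### Course Positioning
Compilers and compilation systems have played a major role in the history of computer science and technology, and they are core supporting software of every computer system. Compiler principles have long been an important course in computing programs at universities in China and abroad. The subject spans programming languages, system environments, and computer architecture, and its theoretical foundations are a model of how computer science theory connects with computer systems.
This is a core course of the computer science major. It teaches the basic principles and techniques of constructing compilers and compilation systems, laying a solid foundation for students' further study of computer systems and for future research or engineering work.

### Teaching Requirements
The goal of this course is for students to master, systematically, the design principles and implementation techniques of compilers and compilation systems. Students are expected to:
1. understand in depth the basic principles of compiler construction;
1. master the implementation techniques of common language mechanisms;
1. go through the main stages of developing a small compiler;
1. be able to learn and use specific compiler-construction tools;
1. apply the general methods and techniques learned to the design and implementation of similar software;
1. be able to combine this knowledge to develop software systems of nontrivial size.

### Teaching Features
This course has the following features:
1. It trains students in three aspects: principles, techniques, and tools. Lectures and after-class work complement each other: fundamental principles are taught mainly in lectures, implementation techniques are covered by a combination of lectures and labs, and students learn the relevant tools on their own outside class.
1. The lab project is to implement a compiler for a small object-oriented language. It is divided into several stages, but the stages are interrelated and form a single whole. The workload is deliberately substantial, aiming to stimulate students' interest and strengthen their hands-on skills.
1. It emphasizes the connections among the computer-systems courses, to help students build a big-picture view of the discipline.

# Course Content

## Lecture 1: Course Overview (2 class hours)
### 1.1 Basic concepts
### 1.2 Logical structure of a compiler
### 1.3 Organization of a compiler
### 1.4 Companion programs
### 1.5 Compiler-generation environment

## Lecture 2: Introduction to the Lab Project (4 class hours, interspersed throughout the course)
### 2.1 Overall structure of the project framework
### 2.2 Lab content
### 2.3 Lab environment
### 2.4 Lab schedule
### 2.5 Assessment scheme

## Lecture 3: Grammars, Regular Expressions, and Finite Automata: Fundamentals (1 class hour)
### 3.1 Concepts of formal languages
### 3.2 Context-free grammars and languages
### 3.3 Regular languages and their descriptions

## Lecture 4: Lexical Analysis (1 class hour)
### 4.1 Overview of lexical analysis
### 4.2 Design and implementation of a lexical analyzer
### 4.3 Automatic construction of lexical analyzers

## Lecture 5: Symbol Tables (1 class hour)
### 5.1 The role of the symbol table
### 5.2 Common symbol attributes
### 5.3 Operations on symbol tables
### 5.4 Organization of symbol tables
### 5.5 Symbol tables and scopes

## Lecture 6: Top-Down Parsing (2.5 class hours)
### 6.1 The idea of top-down parsing
### 6.2 Top-down predictive parsing
### 6.3 LL(1) parsing
### 6.4 Some grammar transformations
### 6.5 Error handling in LL(1) parsing

## Lecture 7: Bottom-Up Parsing (4.5 class hours)
### 7.1 The idea of bottom-up parsing
### 7.2 Shift-reduce parsing
### 7.3 LR parsing basics
### 7.4 The LR(0), SLR(1), LR(1), and LALR(1) family of parsing methods
### 7.5 Using ambiguous grammars in LR parsing
### 7.6 Error handling in LR parsing
### 7.7 Relationships among the grammar classes

## Lecture 8: Fundamentals of Syntax-Directed Semantic Processing (3 class hours)
### 8.1 Attribute grammars
### 8.2 Semantic processing based on attribute grammars
### 8.3 Translation schemes
### 8.4 Semantic processing based on translation schemes

## Lecture 9: Semantic Analysis (2.5 class hours)
### 9.1 Overview of semantic analysis
- Focus on designing a type checker
### 9.2 Common constructs
- Type checking, declarations, assignment statements and arithmetic expressions, array declarations and array element references, Boolean expressions, control statements, chaining and backpatching, procedure calls

## Lecture 10: Intermediate Code Generation (3 class hours)
### 10.1 Overview of intermediate code generation
- Organized around the implementation techniques of common language mechanisms
### 10.2 Common constructs
- Type checking, declarations, assignment statements and arithmetic expressions, array declarations and array element references, Boolean expressions, control statements, chaining and backpatching, procedure calls

## Lecture 11: Runtime Storage Organization (2.5 class hours)
### 11.1 Overview of runtime storage organization
### 11.2 Layout of program storage at runtime
### 11.3 Storage allocation strategies
### 11.4 Activation records
### 11.5 Procedure calls and parameter passing
### 11.6 Runtime organization of object-oriented programs

## Lecture 12: Code Optimization (2.5 class hours)
### 12.1 Basic blocks, flow graphs, and loops
### 12.2 Data-flow analysis basics (data-flow equations, typical data-flow analyses, UD chains, DU chains)
### 12.3 Local optimization based on the DAG representation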
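As a rough illustration of topic 12.3, the Java sketch below applies the DAG idea to a single basic block using a hash table of value numbers, a technique usually called local value numbering. Each value number plays the role of one DAG node: when an instruction would recompute a node that some variable still holds, the instruction is replaced with a copy. The `Instr` record, the `copy` pseudo-operation, and all other names are hypothetical and are not taken from the course lab framework. The sketch assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: DAG-style local optimization of one basic block via value numbering. */
public class LocalValueNumbering {
    /** Three-address instruction "dst = a op b"; op "copy" means "dst = a" (b unused). */
    public record Instr(String dst, String op, String a, String b) {
        @Override
        public String toString() {
            return op.equals("copy") ? dst + " = " + a : dst + " = " + a + " " + op + " " + b;
        }
    }

    private final Map<String, Integer> varToVn = new HashMap<>();  // current DAG node of each name
    private final Map<String, Integer> exprToVn = new HashMap<>(); // "op vnA vnB" -> existing DAG node
    private final Map<Integer, String> vnToVar = new HashMap<>();  // a name that was given each node
    private int nextVn = 0;

    /** Value number of a name; a name not yet seen in the block becomes a fresh leaf node. */
    private int vnOf(String var) {
        Integer vn = varToVn.get(var);
        if (vn == null) {
            vn = nextVn++;
            varToVn.put(var, vn);
            vnToVar.put(vn, var);
        }
        return vn;
    }

    /** Replaces recomputations of a value that some name still holds with copies. */
    public List<Instr> optimize(List<Instr> block) {
        List<Instr> out = new ArrayList<>();
        for (Instr in : block) {
            if (in.op().equals("copy")) {          // a copy just gives dst the same DAG node
                varToVn.put(in.dst(), vnOf(in.a()));
                out.add(in);
                continue;
            }
            String key = in.op() + " " + vnOf(in.a()) + " " + vnOf(in.b());
            Integer vn = exprToVn.get(key);
            String holder = (vn == null) ? null : vnToVar.get(vn);
            if (holder != null && vn.equals(varToVn.get(holder))) {
                // Same DAG node, and `holder` has not been overwritten since: reuse it.
                out.add(new Instr(in.dst(), "copy", holder, null));
            } else {
                vn = nextVn++;                     // a new interior DAG node
                exprToVn.put(key, vn);
                vnToVar.put(vn, in.dst());
                out.add(in);
            }
            varToVn.put(in.dst(), vn);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Instr> block = List.of(
                new Instr("t1", "+", "a", "b"),
                new Instr("t2", "+", "a", "b"),   // same DAG node as t1
                new Instr("a", "-", "t1", "c"),   // redefines a
                new Instr("t3", "+", "a", "b"));  // a has changed, so this is a new node
        new LocalValueNumbering().optimize(block).forEach(System.out::println);
    }
}
```

For this block the sketch prints `t1 = a + b`, `t2 = t1`, `a = t1 - c`, `t3 = a + b`. The check `vn.equals(varToVn.get(holder))` is what keeps the rewrite safe when a variable that held a value is later reassigned.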
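
## Lecture 13: Target Code Generation (2.5 class hours)
### 13.1 Target code generation techniques (code generation basics, a simple code generation algorithm, graph-coloring physical register allocation)
### 13.2 Brief overview of target code optimization techniques

# Course Lab
## Lab Title
Implementing a compiler for a simple object-oriented language

## Lab Objectives
Go through the main stages of developing a small compiler; master the basic methods of compiler design and the implementation techniques of common language mechanisms; be able to learn and use specific compiler-construction tools; and develop the ability to combine what has been learned to build software systems of nontrivial size.

## Lab Environment
An ordinary PC running Windows or Linux, a Java programming environment, the Lex & YACC tools, and MIPS SPIM (University of Wisconsin).

## Lab Content
Starting from a given lab framework, implement a compiler for a simple, strongly typed, single-inheritance object-oriented language. The framework has 5 stages; the course currently covers the first 4:
1. Use Lex and Yacc to implement lexical and syntax analysis, producing a high-level intermediate representation (the abstract syntax tree defined by the framework) in a single pass.
1. Traverse the abstract syntax tree to build symbol tables and perform static semantic analysis, producing an annotated abstract syntax tree.
1. Generate the TAC (three-address code) intermediate representation from the annotated abstract syntax tree; the framework can translate TAC into MIPS assembly that runs on the SPIM simulator.
1. Implement some simple data-flow analyses on the TAC.

### Innovative or Extension Labs
Students with spare capacity may choose to make meaningful improvements on top of the existing lab framework. The topic usually has to be agreed with the instructor or a TA first.

## Lab Evaluation
Each stage has 15 to 25 test programs (5 to 10 of them are not released), and submissions are checked for agreement with the reference output. The quality of the lab report accounts for 20% of the grade, and spot checks are made when necessary. Innovative or extension labs are evaluated on novelty, practicality, soundness, difficulty, and workload.

--------------------------------------------------------------------------------
/compiler-knowledge-point.md:
--------------------------------------------------------------------------------
# Compiler Principles

Each knowledge area should have a statement of its scope, including the composition of the area's knowledge (its knowledge units), the intended learning outcomes, and the methods for assessing each competency (which serve as the basis for lab design).

## Content Overview
The compiler principles knowledge area covers the following:

1. Overview of compilers: what a compiler is, the logical structure of a compiler, the organization of a compiler, and a compiler's companion programs.
2. Lexical analysis: regular grammars, regular expressions, *the equivalence of regular grammars and regular expressions (note: the proof of this equivalence is covered in neither "Formal Languages and Automata" nor "Compiler Principles")*, deterministic finite automata (DFA), nondeterministic finite automata (NFA), NFA with ε-transitions (ε-NFA), converting a regular expression into an equivalent ε-NFA, converting an NFA or ε-NFA into an equivalent DFA, DFA minimization, the equivalence of regular expressions and finite automata, and the lexer generator lex and its variants. (Remark: most of this material is already covered in "Formal Languages and Automata", so less lecture time can be spent on it; self-study material is provided for students who have not taken that course.)
3. Top-down parsing: the design idea of deterministic top-down parsing, deciding whether a grammar is LL(1), equivalence-preserving transformations of some non-LL(1) grammars into LL(1) grammars, recursive-descent LL(1) parsing, table-driven LL(1) parsing, and error handling in LL(1) parsing. *(Note: the basics of grammars are covered in "Formal Languages and Automata" and are not repeated in the parsing part of this course; self-study material is provided for students who have not taken that course.)*
4. Bottom-up parsing: the idea of bottom-up parsing, shift-reduce parsing, the design idea of bottom-up precedence parsing, *simple precedence parsing and operator-precedence parsing (not covered in depth in the current course)*, the basic idea of LR parsing, LR(0), SLR(1), LR(1), and LALR(1) parsing, and using ambiguous grammars in LR parsing. (Remark: bottom-up precedence parsing is currently not taught; it can serve as optional or after-class extension material.)
5. Syntax-directed semantic computation: semantic computation based on attribute grammars, semantic computation based on translation schemes, and the parser generator yacc and its variants.
6. Static semantic analysis and intermediate code generation: symbol tables, static semantic analysis, and intermediate code generation.
7. Runtime storage organization: the role, tasks, storage layout, and allocation strategies of runtime storage organization; activation records; procedure calls; storage allocation strategies for object-oriented languages; and storage organization for functional languages. (Remark: the mainstream textbooks, the Dragon Book and the Tiger Book, both cover storage organization for functional languages, and this topic is growing in importance.)
8. Intermediate code optimization: basic blocks, flow graphs and loops, data-flow equations, data-flow analysis, data-flow information about variable uses, and common code optimization techniques. (Remark: intermediate code optimization takes up a large share of both the Dragon Book and the Tiger Book. Because class hours are limited, the lectures may pick a few techniques as examples depending on the time available, while the lecture notes can cover the material more fully. This part can be planned together with the follow-up course "Compiler Principles: Special Topics Training". Since that course is an elective, it may also be worthwhile for this course to go beyond an overview of common optimization techniques and demonstrate the concrete implementation of a few simple but important optimization algorithms; exactly what to cover can be planned further.)
9. Target code generation: the basic process of target code generation, graph-coloring register allocation, and RISC-V-based back-end code generation. (Remark: RISC-V is currently an important part of the Computer Organization and Operating Systems courses, so it makes sense for the compiler back end to cover generating code for the base instruction set.)

## 1. Overview of Compilers

### Core Learning Outcomes

1. Understand what compilation means: be able to explain it in terms of both the process of compiling code and the structure of a compiler.
2. Understand how compilers compose: understand how single-source, single-target compilers (which can be drawn as T-diagrams) are combined, and through this understand the meaning and basic design ideas of "cross-compilation", "supporting a new language with an existing compiler", and "porting a compiler to a new machine".

### Assessment Methods

1. Can use compiler composition (combining T-diagrams) to explain how single-source, single-target compilers are combined to meet needs such as cross-compilation, supporting a new language with an existing compiler, and porting a compiler to a new machine.
2. Has a complete, overall understanding of how a compiler is structured and how it works.

## 2. Lexical Analysis

Lexical analysis covers regular grammars, regular expressions, the equivalence of regular grammars and regular expressions (this could alternatively be added to the "Formal Languages and Automata" course), deterministic finite automata (DFA), nondeterministic finite automata (NFA), NFA with ε-transitions (ε-NFA), converting a regular expression into an equivalent ε-NFA, converting an NFA or ε-NFA into an equivalent DFA, DFA minimization, the equivalence of regular expressions and finite automata, and the lexer generator lex and its variants.

### Core Learning Outcomes

1. Basic concepts of *regular grammars (note: not covered in depth in either "Formal Languages and Automata" or "Compiler Principles")*, regular expressions, and finite automata (DFA, NFA, and ε-NFA).
2. *The proof that regular grammars and regular expressions are equivalent (note: optional; this proof is not covered in depth in either course).*
3. The procedure and algorithm for converting a regular expression into an equivalent ε-NFA (note: optional).
4. The procedure and algorithm for converting an NFA or ε-NFA into an equivalent DFA (note: optional).
5. The procedure and algorithm for minimizing a DFA (note: optional).
6. The proof that regular expressions and finite automata are equivalent (note: optional).
7. How to use lex and its variants.

### Assessment Methods

1. Can derive a regular grammar from a regular expression.
2. Can derive a regular expression from a regular grammar.
3. Can prove the equivalence of regular grammars and regular expressions (this could alternatively be added to the "Formal Languages and Automata" course). (Note: optional.)
4. Can convert an NFA or ε-NFA into an equivalent DFA (note: optional).
5. Can minimize a DFA (note: optional).
6. Can complete the compiler lab on the lexical analyzer.

## 3. Top-Down Parsing

### Core Learning Outcomes

1. Recognizing LL(1) grammars: deciding whether a given grammar is LL(1).
2. Methods for transforming some non-LL(1) grammars into equivalent LL(1) grammars.
3. Recursive-descent LL(1) parsing.
4. Table-driven LL(1) parsing.
5. Error handling in LL(1) parsing.

### Assessment Methods

1. Can compute FIRST, FOLLOW, and SELECT sets.
2. Can left-factor some non-LL(1) grammars.
3. Can eliminate left recursion from some non-LL(1) grammars.
4. Given an LL(1) grammar, can perform recursive-descent LL(1) parsing.
5. Given an LL(1) grammar, can perform table-driven LL(1) parsing.
6. Understands panic-mode recovery and phrase-level recovery.
7. Can complete the compiler lab on an LL(1)-based parser.

## 4. Bottom-Up Parsing

### Core Learning Outcomes

1. Understand the design idea of bottom-up precedence parsing.
2. Master simple precedence parsing: the definition of precedence relations, the definition of simple precedence grammars, and the steps of the simple precedence parsing method.
3. Master operator-precedence parsing: intuitive operator-precedence parsing, the definition of operator-precedence grammars, constructing the operator-precedence relation table, and the operator-precedence parsing algorithm.
4. Understand the limitations of operator-precedence parsing.
5. Understand the basic idea of LR parsing and its key actions: shift, reduce, accept, and error.
6. Master LR(0) parsing: reducible prefixes, viable prefixes, handles, the finite automaton that recognizes viable prefixes, the general method for computing viable prefixes and reducible prefixes, and constructing the canonical collection of LR(0) item sets.
7. Master SLR(1) parsing: constructing the item-set collection for SLR(1).
8. Master LR(1) parsing: constructing the canonical collection of LR(1) item sets.
9. Master LALR(1) parsing: constructing the LALR(1) item-set collection.
10. Know how to resolve the conflicts caused by certain ambiguous grammars with additional disambiguating rules (such as precedence and associativity), so that such grammars can be used with LR parsing.

### Assessment Methods

1. Can describe the basic process of bottom-up precedence parsing.
2. Given certain unambiguous grammars, can perform simple precedence parsing.
3. Can complete the compiler lab on a parser based on simple precedence parsing.
4. Can complete the compiler lab on a parser based on LR(0), LALR(1), or similar methods.

## 5. Syntax-Directed Semantic Computation

### Core Learning Outcomes

1. Master semantic computation based on attribute grammars: the definition of attribute grammars, S-attributed and L-attributed grammars, and semantic computation based on S-attributed and on L-attributed grammars.
2. Master semantic computation based on translation schemes: the definition of translation schemes, semantic computation based on S-translation schemes, and top-down and bottom-up semantic computation based on L-translation schemes.
3. How to use yacc and its variants.

### Assessment Methods

1. Can perform semantic computation with an S-attributed grammar.
2. Can perform semantic computation with an L-attributed grammar.
3. Can perform semantic computation with an S-translation scheme.
4. Can perform semantic computation with an L-translation scheme.
5. Can use lex and yacc to process a small language (for example, four-function arithmetic).
6. Can complete the compiler lab on semantic computation.

## 6. Static Semantic Analysis and Intermediate Code Generation

### Core Learning Outcomes

1. Understand symbol tables: their role, common attributes and implementations, and scope and visibility.
2. Understand static semantic analysis: type checking.
3. Understand intermediate code generation: forms of intermediate representation, abstract syntax trees, and three-address code.

### Assessment Methods

1. Understands how symbol tables are created and operated on.
2. Can determine scope and visibility under a single-symbol-table organization.
3. Can determine scope and visibility under a multiple-symbol-table organization.
4. Can perform type checking with syntax-directed semantic computation.
5. Can generate abstract syntax trees with syntax-directed semantic computation.
6. Can generate three-address code with syntax-directed semantic computation.
7. Can complete the compiler labs on symbol tables and intermediate code generation.

## 7. Runtime Storage Organization

### Core Learning Outcomes

1. Know the role and tasks of storage organization.
2. Master the layout of a program's storage at runtime.
3. Master static, stack, and heap allocation strategies.
4. Master activation records: procedure activation records, access to nonlocal names with nested procedure definitions, and access to nonlocal names in nested blocks.
5. Master dynamic and static scoping rules.
6. Master code generation for procedure calls.
7. Understand storage allocation strategies for object-oriented languages.
8. Understand storage organization for functional languages.

### Assessment Methods

1. Can work out the activation records in the runtime layout of a given piece of code.
2. Can tell static, stack, and heap allocation strategies apart.
3. Can analyze the control links (dynamic links) and access links (static links) in the runtime layout of a given piece of code.
4. Given the features of a single-inheritance object-oriented language, can design its object storage organization in detail.
5. Given the features of a basic functional language, can propose a corresponding runtime storage design.
6. Can complete the compiler lab on runtime storage organization.

## 8. Intermediate Code Optimization

### Core Learning Outcomes

1. Master the basic concepts of basic blocks, flow graphs, and loops.
2. Master the concept of data-flow equations, their general framework, and the general solution procedure.
3. Master typical data-flow analyses: reaching-definitions analysis and live-variable analysis.
4. Master data-flow information about variable uses: UD chains, DU chains, next-use information for variables within a basic block, and liveness information for variables within a basic block.
5. Master code optimization techniques: know the basic ideas and typical examples of peephole, local, loop, global, and interprocedural optimization, and master the concrete implementation of some of these optimization algorithms.

### Assessment Methods

1. Can solve data-flow equations to perform reaching-definitions analysis and live-variable analysis.
2. Can obtain UD chains, DU chains, and next-use and liveness information for variables within a basic block by solving data-flow equations.
3. Can optimize code fragments: can design and implement some typical peephole, local, loop, and global optimizations.
4. Can complete the compiler lab on intermediate code optimization.

## 9. Target Code Generation

### Core Learning Outcomes

1. Master the basic process of target code generation.
2. Know the main phases of target code generation: instruction selection, register allocation, and instruction scheduling.
3. Understand the graph-coloring register allocation algorithm.

### Assessment Methods

1. Can perform register allocation with the graph-coloring algorithm.
2. Given an intermediate code format and a target code definition, can carry out simple code generation from intermediate code to target code by hand.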
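3. Can complete the compiler lab on (RISC-V) target code generation.

--------------------------------------------------------------------------------
/lang-spec/decaf-with-class-spec.md:
--------------------------------------------------------------------------------
# Decaf Language Specification

In this course we will write a compiler for the Decaf language. Decaf is a strongly typed, object-oriented language with single inheritance and object encapsulation. It is very similar to C/C++/Java, so you should find it easy to pick up; however, it is not identical to any of them. To keep the assignment manageable, the version of Decaf used here has been trimmed and simplified, but even so it remains expressive enough to write all kinds of elegant object-oriented programs.
This document gives the syntax and semantics of Decaf as used in this course; you will refer to it repeatedly while working on the course project.

## Syntax

### Lexical Specification

The following are Decaf's keywords. They are all reserved words (and case-sensitive):

```
bool break class else extends for if int new null return string
this void while static Print ReadInteger ReadLine instanceof
```

An **identifier** is a sequence of letters, digits, and underscores that starts with a letter. Decaf is case-sensitive: for example, `if` is a keyword but `IF` is an identifier, and `binky` and `Binky` are two different identifiers.

**Whitespace** (spaces, tabs, and newlines) serves only to separate tokens. Keywords and identifiers must be separated by whitespace or by a token that is neither a keyword nor an identifier. `ifintthis` is a single identifier rather than three keywords, whereas `if(23this` is scanned as four tokens.

A **Boolean constant** is either `true` or `false`; like keywords, these are reserved words.

An **integer constant** can be written in decimal or hexadecimal. A decimal integer is a sequence of decimal digits (`0-9`). A hexadecimal integer must begin with `0X` or `0x` (a zero, not the letter `O`), followed by a sequence of hexadecimal digits. Hexadecimal digits are the decimal digits plus the six letters `a` to `f` (in either case). Examples of valid integers: `8`, `012`, `0x0`, `0X12aE`.

A **string constant** is a sequence of printable ASCII characters enclosed in double quotes. A string constant may not contain a newline and cannot be split across lines; for example, the following is invalid:
```
"this is not a
valid string constant"
```
String constants support the following escape sequences: `\"` for a double quote, `\\` for a single backslash, `\t` for a tab, `\n` for a newline, and `\r` for a carriage return. Any other escape sequence is invalid. For example, `"\t"` is a string of length 1 whose only character is a tab, whereas `"\a"` is invalid.

The operators and punctuation characters of the language are:
```
+ - * / % < <= > >= = == != && || ! ; , . [ ] ( ) { }
```

A single-line comment starts with `//` and extends to the end of the line. Decaf has no multi-line comments.

### Grammar Specification

The reference grammar is given in an EBNF form using the following metasymbols:

- `*` means the preceding symbol may occur any number of times, including zero
- `+` means the preceding symbol occurs at least once
- `?` means the preceding symbol is optional, i.e. occurs at most once
- `|` separates alternative right-hand sides of a production; the alternatives are unordered
- `ε` denotes the empty string, i.e. no symbols at all

Terminals either appear directly as single-quoted strings or are denoted by identifiers that start with an uppercase letter; identifiers for nonterminals all start with a lowercase letter. In addition, `()` is used to explicitly group the symbol or symbol sequence to which one of the metasymbols above applies.

```
// Top level

topLevel ::= classDef+
classDef ::= 'class' id ('extends' id)? '{' field* '}'
field ::= varDef | methodDef
varDef ::= var ';'
methodDef ::= 'static'? type id '(' varList ')' stmtBlock
var ::= type id
varList ::= var (',' var)* | ε

// Types

type ::= 'int' | 'bool' | 'string' | 'void' | 'class' id
       | type '[' ']'

// Statements

stmt ::= stmtBlock
       | simple ';'
       | 'if' '(' expr ')' stmt ('else' stmt)?
       | 'while' '(' expr ')' stmt
       | 'for' '(' simple ';' expr ';' simple ')' stmt
       | 'break' ';'
       | 'return' expr? ';'
       | 'Print' '(' exprList ')' ';'

stmtBlock ::= '{' stmt* '}'

simple ::= var ('=' expr)? | lValue '=' expr | expr | ε
lValue ::= (expr '.')? id | expr '[' expr ']'

// Expressions

expr ::= lit
       | 'this'
       | lValue
       | (expr '.')? id '(' exprList ')'
       | '(' expr ')'
       | unaryOp expr
       | expr binaryOp expr
       | '(' 'class' id ')' expr
       | 'ReadInteger' '(' ')'
       | 'ReadLine' '(' ')'
       | 'new' id '(' ')'
       | 'new' type '[' expr ']'
       | 'instanceof' '(' expr ',' id ')'

lit ::= INT_LIT | BOOL_LIT | NULL_LIT | STRING_LIT
unaryOp ::= '-' | '!'
binaryOp ::= '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<=' | '<' | '>=' | '>' | '&&' | '||'
exprList ::= expr (',' expr)* | ε

id ::= IDENTIFIER
```

Operator precedence in expressions, from highest to lowest:

| Operator | Associativity | Meaning |
| --- | --- | --- |
| `(` | Left | Parentheses |
| `.` `[` `(` | Left | Member selection, array indexing, function call |
| `!` `-` `(class Id)` | Right | Logical NOT, negation, type cast |
| `*` `/` `%` | Left | Multiplication, division, modulo |
| `+` `-` | Left | Addition, subtraction |
| `<=` `>=` `<` `>` | Non-associative | Relational comparison |
| `==` `!=` | Left | Equality comparison |
| `&&` | Left | Logical AND |
| `\|\|` | Left | Logical OR |

## Program Structure

A Decaf program is a sequence of **class definitions**, each of which contains the complete description of that class. As in Java, Decaf imposes no order on class definitions: a class defined earlier may refer to a class defined later.

A Decaf program must contain a main class named `Main`, which must contain a static method named `main` that takes no parameters and has return type `void`. Execution starts at this method. Note that a `main` method inherited from a superclass does not count.

### Scopes

Decaf supports several levels of scope:

- The outermost level is the **global scope**, which contains only class definitions.
- Each class definition has its own **class scope**, containing all members of the class: member variables, member methods, and static methods.
- Each function has a **parameter scope** for its parameter list and a **local scope** for its body. The parameter scope contains the parameter declarations, and the local scope contains the local variable definitions. Local scopes may be nested.

The access rules are:

- An inner scope can access all symbols of its enclosing scopes.
- A symbol declared in a local scope must not have the same name as any symbol of an enclosing scope that was declared before it. A symbol redeclared in a class scope must follow the overriding rules (see below).
- Classes, functions, and member variables may be used before they are declared, as long as the symbol is accessible and visible at the point of reference (see below).
- Variables in a local scope must be declared before use.
- Symbol names within the same scope must be unique.
- A symbol declared in a scope that has already been closed cannot be accessed.

### Types

The predefined base types are `int`, `bool`, and `string`. `void` is used only as the return type of functions that return no value.
Decaf has built-in array types. Array elements can be of any non-`void` type, so arrays of arrays are supported.
Every user-defined class in Decaf introduces a class type.

### Variables

A variable may have any type except `void`. Depending on the scope it belongs to, a Decaf variable is one of three kinds:

- Member variables: belong to a class scope; defined inside a class as one of its members.
- Parameters: belong to a parameter scope; defined in a function's parameter list and accessible throughout the function body.
- Local variables: belong to a local scope; may be defined in any statement sequence of a method body, and are accessible from the point of declaration to the end of the enclosing scope.

### Arrays

Decaf arrays are homogeneous (all elements have the same type), linearly indexed containers. Arrays are accessed by reference. Array declarations do not include a size; all arrays are allocated dynamically on the heap, with the required size, using the built-in operator `new`.

- `new type[N]` allocates a new array with the given element type and number of elements. `N` must be a non-negative integer; attempting to allocate an array of negative length causes a runtime error.
- The number of elements is recorded when the array is allocated and cannot change afterwards.
- The number of elements can be obtained with the special syntax `arr.length()`.
- Indexing (as in `arr[i]`) can be applied only to a variable `arr` of array type.
- Indexing a position outside the array bounds causes a runtime error.
- Arrays can be passed as arguments and returned from functions. The array reference itself is passed by value, but since the array is accessed through that reference, changes to its elements are visible to the caller.
- Array assignment is a shallow copy (only the reference to the array storage is copied).
- Array comparison (`==` and `!=`) only checks whether the two references are identical.

### Strings

String support in Decaf is minimal. A Decaf program can contain string constants, read strings from the user with the built-in function `ReadLine`, compare strings, and print strings. Decaf does not let programs create or modify strings, or convert between strings and other data types. Strings are accessed by reference (a pointer to the start of the string).

- `ReadLine()` reads a line of user input up to, but not including, the newline.
- String assignment is a shallow copy (assigning one string to another only copies the reference to the string contents).
- Strings can be passed as arguments and returned from functions.
- String comparison (`==` and `!=`) compares the character sequences of the two strings, case-sensitively.

### Function Definitions

A function definition establishes the function's name and its associated type signature, which consists of whether the function is static, its return type, the number of formal parameters, and the type of each parameter. A function definition supplies both the type signature and the statements that make up the function body. Functions in Decaf are **not** first-class values.

- Functions must be defined in a class scope, as member methods or static methods; functions cannot be nested.
- A function may have zero or more formal parameters.
- A formal parameter can be of any base type other than `void`, an array type, or a class type.
- The identifiers in a parameter list must be unique (no two parameters may share a name).
- Formal parameters are declared in a separate scope that encloses the local scope of the function body.
- The return type of a function can be any base type, array type, or class type. The `void` type indicates that the function returns no value.
- A function may be defined only once.
- Function overloading (using the same name with different type signatures) is not supported. A subclass may, however, override a member method of its superclass.
- If a function has a non-`void` return type, every return statement in it must return a value compatible with that type.
- If a function's return type is `void`, it may use only return statements without a value.

### Function Calls

A function call passes argument values from the caller to the callee, executes the callee's body, and returns to the caller (possibly with a return value). When a function is called, the actual arguments are evaluated and bound to the corresponding formal parameters. All parameters and return values in Decaf are passed by value.

- The called function must be defined, although its definition need not precede the call.
- The number of actual arguments in a call must match the number of formal parameters.
- The type of each actual argument must match the type of the corresponding formal parameter.
- Decaf uses **strict evaluation**: the actual arguments are evaluated from left to right before the function is called.
- Control returns to the caller when a return statement is executed or the end of the function body is reached.
- The type of a call expression is the declared return type of the function.

### Classes

Defining a class in a Decaf program creates a new type name and a class scope. A class definition is a list of fields, each of which is either a variable or a function. The variables are sometimes called instance variables, member data, or attributes; the functions are called methods or member functions.

Decaf enforces object encapsulation through a simple mechanism: all variables are private to their class (accessible only within the defining class and its subclasses, which C++ calls protected access), and all methods are public (accessible anywhere). Consequently, the only way to access an object's state from outside its class hierarchy is through its methods.

- All class definitions are global; that is, a class cannot be defined inside a function.
- All classes must have unique names.
- A field name can be used only once within a class scope (no two methods, no two variables, and no variable and method may share a name).
- Fields may be used before they are declared.
- An instance variable can be of any base type other than `void`, an array type, or a class type.
- Inside a non-static method, the `this.` prefix is optional when accessing fields of the same class.

### Objects

A variable whose type is a class type is called an object, or an instance of that class. Objects are accessed by reference. All objects are allocated dynamically on the heap with the built-in `new` operator.

- The class name used when declaring an object variable must be defined.
- The class name given to `new` must be defined.
- The `.` operator accesses a field (variable or method) of an object.
- For a method call of the form `expr.method()`, the type of `expr` must be some class `T`, and `method` must name a member method of `T`. For a static method, `expr` may be a class name; for a non-static method, `expr` must be an object.
- For a variable access of the form `expr.var`, the type of `expr` must be some class `T`, `var` must name a member variable of `T`, and the access may appear only within the scope of `T` or one of its subclasses.
- As a supplement to the previous rule: within a class scope, you may access the private variables of that class through any instance of the class or its subclasses, but not the variables of instances of unrelated classes.
- Object assignment is a shallow copy (only the reference is copied).
- Objects can be passed as arguments and returned from functions. The object reference itself is passed by value, but since the object is accessed through that reference, changes to its member variables are visible to the caller.

### Inheritance

Decaf supports only single inheritance, allowing a subclass to extend its base class by adding members or overriding existing methods. That A extends B means that A contains all fields (variables and functions) defined in B, plus A's own fields.
A subclass may override (replace by redefining) an inherited method, but the new version must agree with the original in its return type and parameter types. Decaf supports a limited form of subtyping: if A extends B, the type of A is automatically a subtype of the type of B, written `A <: B`.
All non-static Decaf methods are dynamically bound (`virtual` in C++ terms). The compiler cannot determine at compile time which method address to call (consider calling, through a superclass-typed reference, a method that a subclass overrides), so the address is bound at runtime by looking it up in a table of member methods associated with each object (the virtual function table, or vtable). We will discuss this in detail later.

- If a superclass is specified, it must be a defined class type.
- A subclass inherits all fields (variables and methods) of its superclass.
- A subclass cannot override inherited variables or static methods.
- A subclass can override inherited non-static methods (by redefining them), but the new version's return type and parameter types must match those of the original.
- A class cannot define two methods with the same name but different type signatures.
- A subclass type is compatible with its superclass type, so a subclass object can be substituted for an expression of the superclass type. For example, if a variable is declared as `class Animal` and `Cow` is a subclass of `Animal`, you can assign it an expression of type `class Cow`. Similarly, if the function `Binky` has a formal parameter of type `class Animal`, you may pass a variable of type `class Cow` as the argument, and a function that returns an `Animal` may return a `Cow`. The reverse does not hold: a superclass object cannot be substituted for an expression of the subclass type.
- The preceding rules also apply across multiple levels of inheritance.
- When checking accesses to an object's fields, which members are available is determined by the object's compile-time class (that is, once a `Cow` instance has been upcast into a variable of type `class Animal`, you cannot access `Cow`-specific fields through that variable).
- Arrays are not covariant: `T1 <: T2` does not imply `T1[] <: T2[]`. Note that this differs from Java.

### Reflection

Decaf supports `instanceof`, which tests whether the value of an expression is an instance of a given class type.

- The expression passed to `instanceof` must evaluate to an object, not a value of a base type or an array.
- An object is an instance not only of its own class but also of every superclass of its class (for example, if A is a subclass of B, an instance `a` of A is also an instance of B).

### Type Equivalence and Compatibility

Decaf is essentially a strongly typed language: every variable has a specific type, and it can hold only values of that type. If type A is equivalent to type B, an expression of either type can be freely substituted for an expression of the other. Two base types are equivalent if and only if they are the same type. Two array types are equivalent if and only if their element types are equivalent (an element type may itself be an array, so this is a recursive definition of structural equivalence). Two class types are equivalent if and only if they are the same type (that is, name equivalence, not structural equivalence).
Type compatibility is a more restricted, one-directional relation. If type A is compatible with type B, an expression of type A can be substituted for an expression of type B, but not necessarily vice versa. Equivalent types are compatible in both directions. A subclass is compatible with its superclass, but not the other way around. The null type is compatible with every class type. For the remaining types, compatibility means the same as equivalence (note that null is not compatible with `string`). Operations such as assignment and parameter passing accept not only equivalent types but also compatible ones.

### Type Casts

Decaf supports casts between class types, and only between class types; base types and arrays cannot be cast.

- The operand of a cast must be an object, not a value of a base type or an array.
- The object being cast must be an instance of the target class type; otherwise a runtime error occurs.
- If the compile-time class type of the object being cast is already compatible with the target class type, the cast is simply an implicit conversion.

Note that the class type of an object known at compile time may differ from its actual runtime class (which is also why virtual functions exist). For example, if A extends B, an instance of A can be used wherever an instance of B is expected; a variable of static type B may therefore refer to an A object at runtime, and you can cast it back to A.

### Assignment

For base types (other than strings), Decaf uses value-copy semantics: the statement `LValue = Expr` copies the value obtained by evaluating `Expr` into the memory location denoted by `LValue`. For arrays, objects, and strings, Decaf uses reference-copy semantics: `LValue = Expr` makes `LValue` hold a reference to the result of `Expr` (that is, the assignment copies a pointer to the object rather than the object itself). In other words, assignment of arrays, objects, and strings is a shallow copy, not a deep copy.

- `LValue` must be an assignable variable location.
- The type of the right-hand side of an assignment must be compatible with the type of the left-hand side.
- `null` can be assigned only to variables of class type.
- Assigning to a formal parameter inside a function body is legal; such an assignment only has effect within the function body.
- A local variable may be given an initial value in its declaration.

### Control Structures

Decaf's control structures are based on those of C and Java and are very similar to them.

- An `else` clause always binds to the nearest unmatched `if`.
- The test expression of `if`, `while`, and `for` statements must have type `bool`.
- A `break` statement may appear only in the body of a `while` or `for` loop.
- The type of the value in a `return` statement must be compatible with the return type of the enclosing function.

### Expressions

For simplicity, Decaf does not allow mixing or converting types in expressions (for example, using an integer as a Boolean).

- A constant evaluates to itself (`true`, `false`, `null`, integer and string constants).
- Both operands of the binary arithmetic operators (`+`, `-`, `*`, `/`) must be of type `int`; the result has the same type.
- Both operands of `%` must be of type `int`; the result is `int`.
- The operand of unary minus must be of type `int`; the result has the same type.
- Both operands of the relational operators (`<`, `>`, `<=`, `>=`) must be of type `int`; the result is `bool`.
- The two operands of the binary equality operators (`!=`, `==`) must be of equivalent types (for example, two `int`s; but see the exception for objects below); the result is `bool`.
- The operands of the binary equality operators may also be two objects, or an object and `null`; the types of two objects must be compatible in at least one direction. The result is `bool`.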
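- The operands of all logical operators must be of type `bool`; the result is `bool`.
- `&&` and `||` do not short-circuit: both operands are evaluated before the result is computed.
- The operands of every expression are evaluated from left to right.

## Standard Library Functions

Decaf has a very small standard library for simple I/O and memory allocation, consisting of `Print`, `ReadInteger`, `ReadLine`, and `new`.

- The arguments passed to a `Print` statement must be of type `string`, `int`, or `bool`.
- When creating an object, the class name given to `new` must be defined.
- When allocating an array with `new type[N]`, the element type must not be `void`, and the size `N` must be an integer.
- When allocating an array, `new` returns an array whose element type is `T`, where `T` is the element type given to `new`.
- `ReadLine()` reads a sequence of characters from the user up to, but not including, the newline.
- `ReadInteger()` reads a line of text from the user and converts it to an integer (returning 0 if the input is not valid).

### Runtime Checks

Decaf performs only three kinds of runtime checks (leaving plenty of room for extensions):

- An array index must be within the array's bounds, that is, in the range `0..arr.length()-1`.
- The array size passed to `new` when allocating an array must be non-negative.
- A type cast checks that the object being cast is an instance of the target class type.

When a runtime error occurs, an appropriate error message is printed to the terminal and the program terminates.

## What Decaf Cannot Do

Decaf is designed to be a simple language. Although it has all the basic features needed to write object-oriented programs, there are still many things that C++ and Java compilers do that it does not. Here are a few examples:

- No check for accessing methods or member variables through a null object.
- No detection of unreachable code.
- No function for freeing memory and no garbage collection, so dynamically allocated memory is never reclaimed.
- No constructors or destructors for objects.
- And many more...

## Acknowledgments

This specification originated from a translation by 蒋波 and was later revised and extended by several teaching assistants; we are sincerely grateful to them.
TAs who have worked on the Decaf labs: 梁英毅, 张铎, 曹震, 李叠, 蒋挺宇, 许建林, 谢宇轩, 唐硕, 毛雁华, 蒋波, 张迎辉, 刘天淼, 高崇南, 王耀, 刘昊, 尚书, 甄艳洁, 沈游人, 朱俸民, 冀伟清, 戴臻旸, 贾越凯, 寇明阳, 李晨昊, and others. Some TAs may have been left off this list by oversight; we thank them as well. Alumnus 杨俊峰 helped introduce the Decaf lab. The current Decaf lab framework is based on the original work of Professor Julie Zelenski's teaching group and draws on the work of Professor Alex Aiken. We thank them all deeply.

--------------------------------------------------------------------------------
/lang-spec/decaf-without-calss-spec-mit.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/learncompiler/compiler-lectures/9037cf67ac51e1f6c28a08233be07290109e66a2/lang-spec/decaf-without-calss-spec-mit.pdf

--------------------------------------------------------------------------------
/lang-spec/monkey-spec.md:
--------------------------------------------------------------------------------
# Monkey language in Java

[![Build Status](https://travis-ci.org/lionell/monkey-in-java.svg?branch=master)](https://travis-ci.org/lionell/monkey-in-java)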
Monkey is a very simple dynamic language. It supports only three kinds of statements, but you can do a lot with it!

Just take a look at the example below.

## Example program

We are going to compute the n-th **Fibonacci number** using a recursive approach. It is not efficient at all, but it shows that functions can call themselves.

```
let fib = fn(n) {
    if (n == 0) { 1 }
    else {
        if (n == 1) { 2 }
        else { fib(n - 1) + fib(n - 2) }
    }
};

fib(6);
```

You can find this and more examples in the `examples` directory. To run this code, type `monkey examples/fib.mon`.

## Functions = first-class citizens

As the title suggests, functions are first-class citizens in the Monkey language. This is a very powerful feature that can be used to implement many different design patterns, closures, and more.

```
let add = fn(a, b) { a + b; };
let sub = fn() { return fn(a, b) { a - b }; }();
let applyFunc = fn(a, b, func) { func(a, b); };

applyFunc(2, 3, add)
    + applyFunc(3, 10, sub)
    + applyFunc(5, 10, fn(a, b) { a * b });
```

To run this code, type `monkey examples/funcs.mon`.
Here we create a function `applyFunc` that takes a function as its third parameter, as well as a function that returns a function. You can also see a function literal that is called at the point of its definition.

Having functions as first-class citizens makes creating **closures** very easy. For example:

```
let superSecretCode = 1234;
let encrypt = fn(x) { x + superSecretCode; };

let superSecretCode = 4321;

encrypt(17); // superSecretCode is still 1234 inside encrypt
```

## Language specification

Below is the formal language specification in [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
```
::=
::= { }
::=
    |
    |
::= "let" "=" ";"
::= "return" ";"
::= [ ";" ]


::=
::= "=="
    | "!="
    |
::= ">"
    | "<"
    |
::= "+"
    | "-"
    |
::= "*"
    | "/"
    |
::= "-"
    | "!"
    |
::= "(" ")"
    |
    |
    |
    |


::= "(" ")"
::=
    |
::= "fn (" ")"
::= "{" "}"
::= { "," }
::= { "," }

::= { }
::= "0..9"
::= "a..zA..Z"
::= "true"
    | "false"
```

## How to use

The project is built with the [Bazel](https://bazel.build/) build system. All examples below use it.

To start the REPL, type `bazel run //java/monkey`.

To run the tests, type `bazel test //javatests/monkey/...`.

## Dependencies

These are the libraries used to build the Monkey language. If you are using prebuilt JAR archives, you don't need them to run Monkey programs via the interpreter or the REPL.
* [Guava](https://github.com/google/guava) (Google common library)
* [JUnit4](http://junit.org/junit4/) (Popular testing library)
* [JUnitParams](https://github.com/Pragmatists/JUnitParams) (Parameterized tests)
* [Truth](https://github.com/google/truth) (Better assertions)

## License

MIT

--------------------------------------------------------------------------------
/lang-spec/tiger-spec.md:
--------------------------------------------------------------------------------
# Tiger Compiler Reference Manual

# The Tiger Project

This document describes the Tiger project for EPITA students as of January 23, 2018. It is available in various forms:

- [Tiger manual in a single HTML file](https://www.lrde.epita.fr/~tiger//tiger.html).
- [Tiger manual in several HTML files](https://www.lrde.epita.fr/~tiger//tiger.split).
- [Tiger manual in PDF](https://www.lrde.epita.fr/~tiger//tiger.pdf).
- [Tiger manual in text](https://www.lrde.epita.fr/~tiger//tiger.txt).
- [Tiger manual in Info](https://www.lrde.epita.fr/~tiger//tiger.info).

More information is available on the [EPITA Tiger Compiler Project Home Page](http://tiger.lrde.epita.fr/).

Tiger is derived from a language introduced by [Andrew Appel](http://www.cs.princeton.edu/~appel/) in his book [Modern Compiler Implementation](http://www.cs.princeton.edu/~appel/modern/). This document is by no means sufficient to produce an actual Tiger compiler, nor to understand compilation. You are **strongly** encouraged to buy and read Appel’s book: it is an *excellent* book.
There are several differences from the original book, the most important being that EPITA students have to implement this compiler **in C++ and using modern object-oriented programming techniques**. You ought to buy the original book; nevertheless, pay extreme attention to implementing the version of the language specified below, not that of the book.

## Table of Contents

- 1 Tiger Language Reference Manual
  - [1.1 Lexical Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Lexical-Specifications)
  - [1.2 Syntactic Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Syntactic-Specifications)
  - 1.3 Semantics
    - 1.3.1 Declarations
      - [1.3.1.1 Type Declarations](https://www.lrde.epita.fr/~tiger//tiger.html#Type-Declarations)
      - [1.3.1.2 Variable Declarations](https://www.lrde.epita.fr/~tiger//tiger.html#Variable-Declarations)
      - [1.3.1.3 Function Declarations](https://www.lrde.epita.fr/~tiger//tiger.html#Function-Declarations)
      - [1.3.1.4 Method Declarations](https://www.lrde.epita.fr/~tiger//tiger.html#Method-Declarations)
    - [1.3.2 Expressions](https://www.lrde.epita.fr/~tiger//tiger.html#Expressions)
- 2 Language Extensions
  - [2.1 Additional Lexical Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Additional-Lexical-Specifications)
  - [2.2 Additional Syntactic Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Additional-Syntactic-Specifications)
  - [2.3 Additional Semantics](https://www.lrde.epita.fr/~tiger//tiger.html#Additional-Semantics)
- 3 Predefined Entities
  - [3.1 Predefined Types](https://www.lrde.epita.fr/~tiger//tiger.html#Predefined-Types)
  - [3.2 Predefined Functions](https://www.lrde.epita.fr/~tiger//tiger.html#Predefined-Functions)
- 4 Implementation
  - [4.1 Invoking `tc`](https://www.lrde.epita.fr/~tiger//tiger.html#Invoking-tc)
  - [4.2 Errors](https://www.lrde.epita.fr/~tiger//tiger.html#Errors)
  - [4.3 Extensions](https://www.lrde.epita.fr/~tiger//tiger.html#Extensions)
- [5 The Reference Implementation](https://www.lrde.epita.fr/~tiger//tiger.html#The-Reference-Implementation)

------

## 1 Tiger Language Reference Manual

This document defines the Tiger language, derived from a language introduced by Andrew Appel in his “Modern Compiler Implementation” books (see [Modern Compiler Implementation](https://www.lrde.epita.fr/~tiger//assignments.html#Modern-Compiler-Implementation) in The Tiger Compiler Project). We insist that our students buy this book, so we refrained from publishing a complete description of the language. Unfortunately, recent editions of this series of books no longer address Tiger (see [In Java - Second Edition](https://www.lrde.epita.fr/~tiger//assignments.html#In-Java-_002d-Second-Edition) in The Tiger Compiler Project), and therefore they no longer include a definition of the Tiger compiler. As a result, students were more inclined to photocopy the books rather than buy newer editions. To fight this trend, we decided to publish a complete definition of the language. Of course, the definition below is not a verbatim copy of the original language definition: these words are ours.

------

### 1.1 Lexical Specifications

- *Keywords*

  ‘array’, ‘if’, ‘then’, ‘else’, ‘while’, ‘for’, ‘to’, ‘do’, ‘let’, ‘in’, ‘end’, ‘of’, ‘break’, ‘nil’, ‘function’, ‘var’, ‘type’, ‘import’ and ‘primitive’

- *Object-related keywords*

  The keywords ‘class’, ‘extends’, ‘method’ and ‘new’ are reserved for object-related constructions. They are valid keywords when the object extension of the language is enabled, and reserved words if this extension is disabled (i.e., they cannot be used as identifiers in object-less syntax).

- *Symbols*

  ‘,’, ‘:’, ‘;’, ‘(’, ‘)’, ‘[’, ‘]’, ‘{’, ‘}’, ‘.’, ‘+’, ‘-’, ‘*’, ‘/’, ‘=’, ‘<>’, ‘<’, ‘<=’, ‘>’, ‘>=’, ‘&’, ‘|’, and ‘:=’

- *White characters*

  Spaces and tabulations are the only white space characters supported. Both count as a single character when tracking locations.

- *End-of-line*

  Ends of lines are ‘\n\r’, ‘\r\n’, ‘\r’, and ‘\n’, freely intermixed.

- *Strings*

  The strings are ANSI-C strings: enclosed by ‘"’, with support for the following escapes:

  - ‘\a’, ‘\b’, ‘\f’, ‘\n’, ‘\r’, ‘\t’, ‘\v’: control characters.
  - `\num`: the character whose code is num in octal. Valid character codes belong to an extended (8-bit) ASCII set, i.e. values between 0 and 255 in decimal (0 and 377 in octal). num is composed of exactly three octal characters, and any invalid value is a scan error.
  - `\xnum`: the character whose code is num in hexadecimal (upper case, lower case, or mixed). num is composed of exactly 2 hexadecimal characters. Likewise, expected values belong to an extended (8-bit) ASCII set.
  - `\\`: a single backslash.
  - `\"`: a double quote.
  - `\character`: if no rule above applies, this is an error.

  All the other characters are plain characters and are to be included in the string. In particular, multi-line strings are allowed.

- *Comments*

  Like C comments, but can be nested: `Code /* Comment /* Nested comment */ Comment */ Code`

- *Identifiers*

  Identifiers start with a letter, followed by any number of alphanumeric characters plus the underscore.
  Identifiers are case sensitive. Moreover, the special **_main** string is also accepted as a valid identifier.

  ```
  id ::= letter { letter | digit | _ } | _main
  letter ::=
      a | b | c | d | e | f | g | h | i | j | k | l | m
    | n | o | p | q | r | s | t | u | v | w | x | y | z
    | A | B | C | D | E | F | G | H | I | J | K | L | M
    | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
  digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  ```

- *Numbers*

  There are only integers in Tiger.

  ```
  integer ::= digit { digit }
  op ::= + | - | * | / | = | <> | > | < | >= | <= | & | |
  ```

- *Invalid characters*

  Any other character is invalid.

------

### 1.2 Syntactic Specifications

We use Extended BNF, with ‘[’ and ‘]’ for zero or one occurrence, and ‘{’ and ‘}’ for any number of repetitions, including zero.

```
program ::=
    exp
  | decs

exp ::=
  # Literals.
    nil
  | integer
  | string

  # Array and record creations.
  | type-id [ exp ] of exp
  | type-id {[ id = exp { , id = exp } ] }

  # Object creation.
  | new type-id

  # Variables, field, elements of an array.
  | lvalue

  # Function call.
  | id ( [ exp { , exp }] )

  # Method call.
  | lvalue . id ( [ exp { , exp }] )

  # Operations.
  | - exp
  | exp op exp
  | ( exps )

  # Assignment.
  | lvalue := exp

  # Control structures.
  | if exp then exp [else exp]
  | while exp do exp
  | for id := exp to exp do exp
  | break
  | let decs in exps end

lvalue ::= id
  | lvalue . id
  | lvalue [ exp ]
exps ::= [ exp { ; exp } ]

decs ::= { dec }
dec ::=
  # Type declaration.
    type id = ty
  # Class definition (alternative form).
  | class id [ extends type-id ] { classfields }
  # Variable declaration.
  | vardec
  # Function declaration.
  | function id ( tyfields ) [ : type-id ] = exp
  # Primitive declaration.
  | primitive id ( tyfields ) [ : type-id ]
  # Importing a set of declarations.
  | import string

vardec ::= var id [ : type-id ] := exp

classfields ::= { classfield }
  # Class fields.
classfield ::=
  # Attribute declaration.
    vardec
  # Method declaration.
  | method id ( tyfields ) [ : type-id ] = exp

# Types.
ty ::=
  # Type alias.
    type-id
  # Record type definition.
  | { tyfields }
  # Array type definition.
  | array of type-id
  # Class definition (canonical form).
  | class [ extends type-id ] { classfields }
tyfields ::= [ id : type-id { , id : type-id } ]
type-id ::= id

op ::= + | - | * | / | = | <> | > | < | >= | <= | & | |
```

Precedence of the operators (high to low):

```
* /
+ -
>= <= = <> < >
&
|
```

Comparison operators (`<`, `<=`, `=`, `<>`, `>`, `>=`) are not associative. All the remaining operators are left-associative.
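The precedence table and associativity rules above map directly onto a precedence-climbing parser for binary expressions. The Python sketch below is only an illustration and is not part of the Tiger reference implementation. It handles integer literals, identifiers, parentheses, and the binary operators listed above, and nothing else: there is no unary minus, no calls, and no other `exp` forms. The names `tokenize`, `Parser`, `parse`, `PRECEDENCE`, and `COMPARISONS` are hypothetical. Parsing the right operand at one level above the current operator makes operators left-associative. Rejecting a comparison that is immediately followed by another comparison enforces non-associativity.

```python
import re

# Binding strength of each Tiger binary operator (higher binds tighter),
# following the precedence table above.
PRECEDENCE = {
    "*": 4, "/": 4,
    "+": 3, "-": 3,
    ">=": 2, "<=": 2, "=": 2, "<>": 2, "<": 2, ">": 2,
    "&": 1,
    "|": 0,
}
# Comparison operators are non-associative; all others are left-associative.
COMPARISONS = {">=", "<=", "=", "<>", "<", ">"}

# Integers, identifiers, two-character operators, then single-character tokens.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z][A-Za-z0-9_]*|<>|<=|>=|[-+*/=<>&|()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_primary(self):
        token = self.advance()
        if token == "(":
            node = self.parse_binary(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in PRECEDENCE or token == ")":
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token) if token.isdigit() else token

    def parse_binary(self, min_prec):
        left = self.parse_primary()
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.advance()
            # Right operand at prec + 1: equal-precedence operators group to the left.
            right = self.parse_binary(PRECEDENCE[op] + 1)
            left = (op, left, right)
            # A comparison directly followed by another comparison is rejected.
            if op in COMPARISONS and self.peek() in COMPARISONS:
                raise SyntaxError(f"comparison operators are not associative: {op!r} then {self.peek()!r}")
        return left


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_binary(0)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, `parse("1 + 2 * 3")` yields `("+", 1, ("*", 2, 3))` and `parse("a - b - c")` yields `("-", ("-", "a", "b"), "c")`. By contrast, `parse("a < b < c")` raises `SyntaxError`, and `parse("(a < b) = c")` is accepted because the parentheses make the grouping explicit.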
------

### 1.3 Semantics

------

#### 1.3.1 Declarations

- *import*

  An `import` clause denotes the same expression where it was (recursively) replaced by the set of declarations its corresponding import-file contains. An import-file has the following syntax (see [Syntactic Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Syntactic-Specifications) for a definition of the symbols):

  ```
  import-file ::= decs
  ```

  Because the syntax is different, it is convenient to use another extension. We use `*.tih` for files to import, for instance:

  ```
  /* fortytwo-fn.tih. */
  function fortytwo() : int = 42
  ```

  ```
  /* fortytwo-var.tih. */
  import "fortytwo-fn.tih"
  var fortytwo := fortytwo()
  ```

  ```
  /* fortytwo-main.tig. */
  let
    import "fortytwo-var.tih"
  in
    print_int(fortytwo); print("\n")
  end
  ```

  is rigorously equivalent to:

  ```
  let
    function fortytwo() : int = 42
    var fortytwo := fortytwo()
  in
    print_int(fortytwo); print("\n")
  end
  ```

  There can never be a duplicate-name conflict between declarations from different files. For instance:

  ```
  /* 1.tih */
  function one() : int = 1
  ```

  ```
  let
    import "1.tih"
    import "1.tih"
  in
    one() = one()
  end
  ```

  is *valid*, although

  ```
  let
    function one() : int = 1
    function one() : int = 1
  in
    one() = one()
  end
  ```

  is not: the function `one` is defined twice in a row of function declarations.

  Importing a nonexistent file is an error. An imported file may not include itself, directly or indirectly. Both these errors must be diagnosed, with status set to 1 (see [Errors](https://www.lrde.epita.fr/~tiger//tiger.html#Errors)).

  When processing an import directive, the compiler starts looking for files in the current directory, then in all the directories of the include path, in order.

- *name spaces*

  There are three name spaces: types, variables and functions. The original language definition features two: variables and functions share the same name space. The motivation, as noted by Sébastien Carlier, is that in FunTiger, in the second part of the book, functions can be assigned to variables:

  ```
  let
    type a = {a : int}
    var a := 0
    function a(a : a) : a = a{a = a.a}
  in
    a(a{a = a})
  end
  ```

  Supporting three name spaces is easier to implement.
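The import rules above can be prototyped in a few lines: an import is replaced recursively by the imported file's contents, the current directory is searched before the include path, and missing or self-including files are errors. The Python sketch below is a hypothetical illustration, not the LRDE implementation. It models the file system as an in-memory dictionary, and it assumes that each declaration and each `import "file"` directive sits on its own line instead of parsing real Tiger. The names `expand`, `resolve`, and `TigerImportError` are invented for this example. The key design choice is to track only the chain of files currently being expanded, rather than every file seen so far. That is what lets the same file be imported twice side by side, as in the valid `1.tih` example, while still rejecting direct or indirect self-inclusion.

```python
import posixpath
import re

# Assumption: one import directive per line, e.g.  import "fortytwo-fn.tih"
IMPORT_RE = re.compile(r'^\s*import\s+"([^"]+)"\s*$')


class TigerImportError(Exception):
    """Missing or recursive import; the spec requires exit status 1."""
    exit_status = 1


def resolve(name, files, include_path):
    # Look in the current directory first, then each include directory in order.
    for directory in [".", *include_path]:
        candidate = posixpath.normpath(posixpath.join(directory, name))
        if candidate in files:
            return candidate
    raise TigerImportError(f"cannot find import file {name!r}")


def expand(path, files, include_path, chain=()):
    # `chain` is the stack of files currently being expanded: importing the same
    # file twice in a row is fine, but reaching a file already on the stack is a cycle.
    if path in chain:
        raise TigerImportError("recursive import: " + " -> ".join([*chain, path]))
    declarations = []
    for line in files[path].splitlines():
        match = IMPORT_RE.match(line)
        if match:
            target = resolve(match.group(1), files, include_path)
            declarations.extend(expand(target, files, include_path, (*chain, path)))
        elif line.strip():
            declarations.append(line.strip())
    return declarations


files = {
    "fortytwo-fn.tih": "function fortytwo() : int = 42",
    "fortytwo-var.tih": 'import "fortytwo-fn.tih"\nvar fortytwo := fortytwo()',
}
print(expand("fortytwo-var.tih", files, include_path=[]))
# ['function fortytwo() : int = 42', 'var fortytwo := fortytwo()']
```

A real compiler would expand imports on the parsed `decs` of each file and report the error with exit status 1. The dictionary-and-lines model is only there to keep the sketch self-contained.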
------

#### 1.3.1.1 Type Declarations

- *arrays*

  The size of the array does not belong to the type. Array indices start at 0 and end at size - 1.

  ```
  let
    type int_array = array of int
    var table := int_array[100] of 0
  in
    ...
  end
  ```

  Arrays are initialized with the *same* instance of value. This leads to aliasing for entities with pointer semantics (strings, arrays and records).

  ```
  let
    type rec = { val : int }
    type rec_arr = array of rec
    var table := rec_arr[2] of rec { val = 42 }
  in
    table[0].val := 51
    /* Now table[1].val = 51. */
  end
  ```

  Use a loop to instantiate several initialization values.

  ```
  let
    type rec = { val : int }
    type rec_arr = array of rec
    var table := rec_arr[2] of nil
  in
    for i := 0 to 1 do
      table[i] := rec { val = 42 };
    table[0].val := 51
    /* table[1].val = 42. */
  end
  ```

- *records*

  Records are defined by a list of fields between braces. Fields are described as “fieldname : type-id” and are separated by a comma.
  Field names are unique for a given record type.

  ```
  let
    type indexed_string = {index : int, value : string}
  in
    ...
  end
  ```

- *classes*

  (See also [Method Declarations](https://www.lrde.epita.fr/~tiger//tiger.html#Method-Declarations).)

  Classes define a set of attributes and methods. Empty classes are valid. Attribute declaration is like variable declaration; method declaration is similar to function declaration, but uses the keyword `method` instead of `function`.

  There are two ways to declare a class. The first version (known as *canonical*) uses `type`, and is similar to record and array declaration:

  ```
  let
    type Foo = class extends Object
    {
      var bar := 42
      method baz() = print("Foo.\n")
    }
  in
    /* ... */
  end
  ```

  The second version (known as *alternative* or Appel’s) doesn’t make use of `type`, but introduces class declarations directly. This is the syntax described by Andrew Appel in his books:

  ```
  let
    class Foo extends Object
    {
      var bar := 42
      method baz() = print("Foo.\n")
    }
  in
    /* ... */
  end
  ```

  For simplicity, constructs using the alternative syntax are considered *syntactic sugar* for the canonical syntax, and are *desugared* by the parser into the canonical form, using the following transformation:

  ```
  class Name [ extends Super ] { Classfields }
    =>  type Name = class [ extends Super ] { Classfields }
  ```

  where Name, Super and Classfields are respectively the class name, the super class name, and the contents (attributes and methods) of the class.

  In the rest of the section, Appel’s form will often be used, to offer a uniform reading with his books, but remember that the *main syntax is the other one*, and *Appel’s syntax is to be desugared into the canonical one*.
  Declarations of class members follow the same rules as variable and function declarations: *consecutive* method declarations constitute a block (or chunk) of methods, while a block of attributes contains only *a single* attribute declaration (several attribute declarations thus form several blocks). An extra rule holds for class members: there shall be no two attributes with the same name in the same class definition, nor two methods with the same name.

  ```
  let
    class duplicate_attrs
    {
      var a := 1
      method m() = ()
      /* Error, duplicate attribute in the same class. */
      var a := 2
    }
    class duplicate_meths
    {
      method m() = ()
      var a := 1
      /* Error, duplicate method in the same class. */
      method m() = ()
    }
  in
  end
  ```

  Note that this last rule applies only to the strict scope of the class, not to the scopes of inner classes.

  ```
  let
    type C = class
    {
      var a := 1
      method m() =
        let
          type D = class
          {
            /* These members have same names as C's, but this is allowed
               since they are not in the same scope. */
            var a := 1
            method m() = ()
          }
        in
        end
    }
  in
  end
  ```

  Objects of a given class are created using the keyword `new`. There are no constructors in Tiger (nor destructors), so the attributes are always initialized by the value given at their declaration.

  ```
  let
    class Foo
    {
      var bar := 42
      method baz() = print("Foo.\n")
    }
    class Empty
    {
    }
    var foo1 : Foo := new Foo
    /* As for any variable, the type annotation is optional. */
    var foo2 := new Foo
  in
    /* ... */
  end
  ```

  Access to a member (either an attribute or a method) of an object from outside the class uses the *dotted* notation (as in C++, Java, C#, etc.).
  There are no visibility qualifiers or restrictions (i.e., all attributes of an object accessible in the current scope are accessible in read and write modes), and all its methods can be called.

  ```
  let
    class Foo
    {
      var bar := 42
      method baz() = print("Foo.\n")
    }
    var foo := new Foo
  in
    print_int(foo.bar);
    foo.baz()
  end
  ```

  To access a member (either an attribute or a method) from within the class where it is defined, use the `self` identifier (equivalent to C++’s or Java’s *this*), which refers to the current instance of the object.

  ```
  let
    class Point2d
    {
      var row : int := 0
      var col : int := 0

      method print_row() = print_int(self.row)
      method print_col() = print_int(self.col)
      method print() =
        (
          print("(");
          self.print_row();
          print(", ");
          self.print_col();
          print(")")
        )
    }
  in
    /* ... */
  end
  ```

  The use of `self` is mandatory to access a member of the class (or of its super class(es)) from within the class. A variable or a method not preceded by ‘`self.`’ won’t be looked up in the scope of the class.

  ```
  let
    var a := 42
    function m() = print("m()\n")

    class C
    {
      var a := 51
      method m() = print("C.m()\n")

      method print_a()      = (print_int(a); print("\n"))
      method print_self_a() = (print_int(self.a); print("\n"))

      method call_m()      = m()
      method call_self_m() = self.m()
    }
    var c := new C
  in
    c.print_a();      /* Print `42'. */
    c.print_self_a(); /* Print `51'. */
    c.call_m();       /* Print `m()'. */
    c.call_self_m()   /* Print `C.m()'. */
  end
  ```

  `self` cannot be used outside a method definition. In this respect, `self` cannot appear in a function or a class defined within a method (except within a method defined therein, of course).

  ```
  let
    type C = class
    {
      var a := 51
      var b := self /* Invalid. */
      method m () : int =
        let
          function f () : int =
            self.a /* Invalid. */
        in
          f() + self.a /* Valid. */
        end
    }
    var a := new C
  in
    a := self /* Invalid. */
  end
  ```

  `self` is a read-only variable and cannot be assigned.

  The Tiger language supports single inheritance thanks to the keyword `extends`, so that a class can inherit from another class declared previously, or declared in the same block of class declarations. A class with no manifest inheritance (no `extends` statement following the class name) automatically inherits from the built-in class `Object` (this feature is an extension of Appel’s object-oriented proposal).

  Inclusion polymorphism is supported as well: when a class Y inherits from a class X (directly or through several inheritance links), any object of Y can *be seen as* an object of type X. Hence, objects have two types: the static type, known at compile time, and the dynamic (or exact) type, known at run time, which is a subtype of (or identical to) the static type. Therefore, an object of static type Y can be assigned to a variable of type X.

  ```
  let
    /* Manifest inheritance from Object: an A is an Object. */
    class A extends Object {}
    /* Implicit inheritance from Object: a B is an Object. */
    class B {}
    /* C is an A. */
    class C extends A {}

    var a : A := new A
    var b : B := new B
    var c1 : C := new C
    /* When the type is not given explicitly, it is inferred from the
       initialization; here, c2 has static and dynamic type C. */
    var c2 := new C
    /* This variable has static type A, but dynamic type C. */
    var c3 : A := new C
  in
    /* Allowed (upcast). */
    a := c1
    /* Forbidden (downcast). */
    /* c2 := a */
  end
  ```

  As stated before, a class can inherit from a class[1](https://www.lrde.epita.fr/~tiger//tiger.html#FOOT1) declared previously (and visible in the scope), or from a class declared in the same block of *type* declarations (recall that a class declaration is in fact a type declaration). Recursive inheritance is not allowed.

  ```
  let
    /* Allowed: A declared before B. */
    class A {}
    class B extends A {}

    /* Allowed: C declared before D. */
    class C {}
    var foo := -42
    class D extends C {}

    /* Allowed: forward inheritance, with E and F in the same block. */
    class F extends E {}
    class E {}

    /* Forbidden: forward inheritance, with G and H in different blocks. */
    class H extends G {}
    var bar := 2501
    class G {}

    /* Forbidden: recursive inheritance. */
    class I extends J {}
    class J extends I {}

    /* Forbidden: recursive inheritance and forward inheritance
       with K and L in different blocks. */
    class K extends L {}
    var baz := 2097
    class L extends K {}

    /* Forbidden: M inherits from a non-class type. */
    class M extends int {}
  in
    /* ... */
  end
  ```

  All members from the super classes (transitive closure of the “is a” relationship) are accessible using the dotted notation, and the identifier `self` when they are used from within the class.

  Attribute redefinition is not allowed: a class cannot define an attribute with the same name as an inherited attribute, even if it has the same type. Regarding method overriding, see [Method Declarations](https://www.lrde.epita.fr/~tiger//tiger.html#Method-Declarations).

  Let us consider a block of type definitions. For each class of this block, any of its members (either attributes or methods) can reference any type introduced in scope of the block, *including the class type enclosing the considered members*.

  ```
  let
    /* A block of types. */
    class A
    {
      /* Valid forward reference to B, defined in the same block
         as the class enclosing this member. */
      var b := new B
    }
    type t = int
    class B
    {
      /* Invalid forward reference to C, defined in another block
         (binding error). */
      var c := new C
    }

    /* A block of variables. */
    var v : t := 42

    /* Another block of types. */
    class C
    {
    }
  in
  end
  ```

  However, a class member cannot reference another member defined in a class defined later in the program, in the current class or in a future class (except if the member referred to is in the same block as the referring member, hence in the same class, since a block of members obviously cannot span two or more classes).
And recall that class members can only reference previously defined class members, or members of the same block of members (e.g., a chunk of methods). `let /* A block of types. */ class X { var i := 1 /* Valid forward reference to self.o(), defined in the same block of methods. */ method m() : int = self.o() /* Invalid forward reference to self.p(), defined in another (future) block of methods (type error). */ method n() = self.p() /* Valid (backward) reference to self.i, defined earlier. */ method o() : int = self.i var j := 2 method p() = () var y := new Y /* Invalid forward reference to y.r(), defined in another (future) class (type error). */ method q() = self.y.r() } class Y { method r() = () } in end ` **To put it in a nutshell**: *within a chunk of types*, forward references to classes are allowed, while forward references to members are limited to the block of members where the referring entity is defined. 322 | 323 | - *recursive types* 324 | 325 | Types can be recursive, `let type stringlist = {head : string, tail : stringlist} in ... end ` or mutually recursive (if they are declared in the same chunk) in Tiger, `let type indexed_string = {index : int, value : string} type indexed_string_list = {head : indexed_string, tail : indexed_string_list} in ... end ` but there shall be no cycle. This `let type a = b type b = a in ... end ` is invalid. 326 | 327 | - *type equivalence* 328 | 329 | Two types are equivalent iff they result from the same type construction (array or record construction, or primitive type). As in C, and unlike Pascal, structural equivalence is rejected. Type aliases do not build new types, hence they are equivalent: `let type a = int type b = int var a := 1 var b := 2 in a = b /* OK */ end ` By contrast, `let type a = {foo : int} type b = {foo : int} var va := a{foo = 1} var vb := b{foo = 2} in va = vb end ` is invalid, and must be rejected with exit status set to 5. 
 330 | 331 | ------ 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | #### 1.3.1.2 Variable Declarations 340 | 341 | - *variables* 342 | 343 | There are two forms of variable declarations in Tiger: the short one and the long one. In the short form, only the name of the variable and its initial value are specified; the variable type is “inferred”. `let var foo := 1 /* foo is typed as an integer */ in ... end ` In the long form, the type of the variable is specified. Since one cannot infer a record type for `nil`, the long form is required when declaring a variable initialized to `nil`. `let type foo = {foo : int} var bar : foo := nil /* Correct. */ var baz := nil /* Incorrect. */ in ... end ` 344 | 345 | ------ 346 | 347 | 348 | 349 | 350 | 351 | 352 | 353 | #### 1.3.1.3 Function Declarations 354 | 355 | - *functions* 356 | 357 | To declare a function, provide its return value type: `let function not (i : int) : int = if i = 0 then 1 else 0 in ...
end ` A procedure has no return value type. `let function print_conditional(s : string, i : int) = if i then print(s) else print("error") in print_conditional("foo", 1) end ` Functions can be recursive, but mutually recursive functions must be in the same sequence of function declarations (no other declaration may be placed between them). See the semantics of function calls for the argument passing policy (see [Expressions](https://www.lrde.epita.fr/~tiger//tiger.html#Expressions)). 358 | 359 | - *primitive* 360 | 361 | A primitive is a built-in function, i.e., a function whose body is provided by the runtime system. See [Predefined Functions](https://www.lrde.epita.fr/~tiger//tiger.html#Predefined-Functions) for the list of standard primitives. Aside from the lack of body, and hence the absence of translation, primitive declarations behave as function declarations. They share the same name space, and obey the same duplicate-name rule. For instance, `let primitive one() : int function one() : int = 1 in ... end ` is invalid, and must be rejected with exit status set to 4. 362 | 363 | ------ 364 | 365 | 366 | 367 | 368 | 369 | 370 | 371 | #### 1.3.1.4 Method Declarations 372 | 373 | - *Overriding methods* 374 | 375 | When a method in a class overrides a method of a super class, the overridden method (in the super class) is no longer accessible. Dynamic dispatch is performed: the exact type of the object (known at run time) is used to select the method. However, the interface of the accessible attributes and callable methods remains restricted to the static interface (i.e., that of the static type of the object). `let class Shape { /* Position.
*/ var row := 0 var col := 0 method print_row() = (print("row = "); print_int(self.row)) method print_col() = (print("col = "); print_int(self.col)) method print() = ( print("Shape = { "); self.print_row(); print(", "); self.print_col(); print(" }") ) } class Circle extends Shape { var radius := 1 method print_radius() = (print("radius = "); print_int(self.radius)) /* Overridden method. */ method print() = ( print("Circle = { "); self.print_row(); print(", "); self.print_col(); print(", "); self.print_radius(); print(" }") ) } /* c has static type Shape, and dynamic (exact) type Circle. */ var c : Shape := new Circle in /* Dynamic dispatch to Circle's print method. */ c.print(); /* Allowed. */ c.print_row() /* Forbidden: 'print_radius' is not a member of Shape (nor of its super class(es)). */ /* c.print_radius() */ end ` 376 | 377 | - *Method invariance* 378 | 379 | Methods are invariant in Tiger: each redefinition of a method in a subclass shall have the exact same signature as the original (overridden) method. This invariance applies to the number of arguments, the types of the arguments, and the type of the return value[2](https://www.lrde.epita.fr/~tiger//tiger.html#FOOT2). `let class Food {} class Grass extends Food {} class Animal { method eat(f : Food) = () } class Cow extends Animal { /* Invalid: methods shall be invariant. */ method eat(g : Grass) = () } in end ` 380 | 381 | ------ 382 | 383 | 384 | 385 | 386 | 387 | 388 | 389 | #### 1.3.2 Expressions 390 | 391 | - *L-values* 392 | 393 | The ‘l-values’ (whose value can be read or changed) are: elements of arrays, fields of records, instances of classes, arguments and variables.
 394 | 395 | - *Valueless expressions* 396 | 397 | Some expressions have no value: procedure calls, assignments, `if`s with no `else` clause, loops and `break`. Empty sequences (‘()’) and `let`s with an empty body are also valueless. 398 | 399 | - *Nil* 400 | 401 | The reserved word `nil` refers to a value from a `record` or a `class` type. Do not use `nil` where its type cannot be determined. `let type any_record = {any : int} var nil_var : any_record := nil function nil_test(parameter : any_record) : int = ... var invalid := nil /* no type, invalid */ in if nil <> nil_var then ... if nil_test(nil_var) then ... if nil = nil then ... /* no type, invalid */ end ` 402 | 403 | - *Integers* 404 | 405 | An integer literal is a series of decimal digits (therefore it is non-negative). Since the compiler targets 32-bit architectures and must handle signed integers, an integer literal must fit in a signed 32-bit integer, i.e., it must not exceed 2147483647. Any other integer literal is a scanner error. 406 | 407 | - *Booleans* 408 | 409 | There is no Boolean type in Tiger: Booleans are encoded as integers, with the same semantics as in C, i.e., 0 is the only value standing for “false”, and anything else stands for “true”. 410 | 411 | - *Strings* 412 | 413 | A string constant is a possibly empty series of printable characters, spaces or escape sequences (see [Lexical Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Lexical-Specifications)) enclosed between double quotes. `let var s := "\t\124\111\107\105\122\n" in print(s) end ` 414 | 415 | - *Record instantiation* 416 | 417 | A record instantiation must define the value of all the fields, in the same order as in the definition of the record type. 418 | 419 | - *Class instantiation* 420 | 421 | An object is created with `new`. There are no constructors in Tiger, so `new` takes only one operand, the name of the type to instantiate. 422 | 423 | - *Function call* 424 | 425 | Function arguments are evaluated from left to right.
Array and record arguments are passed by reference; strings and integers are passed by value. The following example: `let type my_record = {value : int} function reference(parameter : my_record) = parameter.value := 42 function value(parameter : string) = parameter := "Tiger is the best language\n" var rec1 := my_record{value = 1} var str := "C++ rulez" in reference(rec1); print_int(rec1.value); print("\n"); value(str); print(str); print("\n") end ` results in the two output lines `42` and `C++ rulez`. 426 | 427 | - *Boolean operators* 428 | 429 | Tiger Boolean operators normalize their result to 0/1. For instance, because `&` and `|` can be implemented as syntactic sugar, one could easily make ‘123 | 456’ return either ‘1’ or ‘123’: make them return ‘1’. Andrew Appel does not enforce this for ‘&’ and ‘|’; we do, so that the following program has a well-defined behavior: `print_int("0" < "9" | 42) ` 430 | 431 | - *Arithmetic* 432 | 433 | Arithmetic expressions only apply to integers and return integers. The available operators are `+`, `-`, `*` and `/`. 434 | 435 | - *Comparison* 436 | 437 | Comparison operators (‘=’, ‘<>’, and ‘<=’, ‘<’, ‘>=’, ‘>’) return a Boolean value. *Integer and string comparison*: all the comparison operators apply to pairs of strings and pairs of integers, with obvious semantics. *String comparison*: comparison of strings is based on the lexicographic order. *Array and record comparison*: pairs of arrays and pairs of records *of the same type* can be compared for equality (‘=’) and inequality (‘<>’). Identity equality applies, i.e., an array or a record is only equal to itself (shallow equality), regardless of whether the contents are equal (deep equality). The value `nil` can be compared against a value whose type is a record or class type; however, ‘nil = nil’ is invalid, since the type of `nil` cannot be determined there. Arrays, records and objects cannot be ordered: ‘<’, ‘>’, ‘<=’, ‘>=’ are valid only for pairs of strings or integers. *Void comparison*: in conformance with A. Appel’s specifications, any two void entities are equal.
 438 | 439 | - *Assignment* 440 | 441 | Assignments yield no value. The following code is syntactically correct, but type incorrect: `let var foo := 1 var bar := 1 in foo := (bar := 2) + 1 end ` Note that the following code is valid: `let var void1 := () var void2 := () var void3 := () in void1 := void2 := void3 := () end ` 442 | 443 | - *Array and record assignment* 444 | 445 | Array and record assignments are shallow copies, not deep copies. Therefore aliasing effects arise: if an array or record variable a is assigned another variable b of the same type, then changes to b affect a, and vice versa. `let type bar = {foo : int} var rec1 := bar{foo = 1} var rec2 := bar{foo = 2} in print_int(rec1.foo); print(" is the value of rec1\n"); print_int(rec2.foo); print(" is the value of rec2\n"); rec1 := rec2; rec2.foo := 42; print_int(rec1.foo); print(" is the new value of rec1\n") end ` 446 | 447 | - *Polymorphic (object) assignment* 448 | 449 | Upcasts are valid for objects because of inclusion polymorphism. `let class A {} class B extends A {} var a := new A var b := new B in a := b end ` Upcasts can be performed when defining a new object variable, by forcing the type of the declared variable to a super class of the actual object. `let class C {} class D extends C {} var c : C := new D in end ` Tiger doesn’t provide a downcast feature performing run time type identification (RTTI), like C++’s `dynamic_cast`. `let class E {} class F extends E {} var e : E := new F var f := new F in /* Invalid: downcast. */ f := e end ` 450 | 451 | - *Polymorphic (object) branching* 452 | 453 | Upcasts are performed when branching between two class instantiations. Since every class inherits from `Object`, a common root can always be found. `let class A {} class B extends A {} in if 1 then new A else new B end ` 454 | 455 | - *Sequences* 456 | 457 | A sequence is a possibly empty series of expressions separated by semicolons and enclosed in parentheses.
By convention, there are no sequences of a single expression (see the following item). The sequence is evaluated from left to right. The value of the whole sequence is that of its last expression. `let var a := 1 in a := ( print("first exp to display\n"); print("second exp to display\n"); a := a + 1; a ) + 42; print("the last value of a is : "); print_int(a); print("\n") end ` 458 | 459 | - *Parentheses* 460 | 461 | Parentheses enclosing a single expression enforce syntactic grouping. 462 | 463 | - *Lifetime* 464 | 465 | Records and arrays have infinite lifetime: their values last forever, even after the scope in which they were created is left. `let type bar = {foo : int} var rec1 := bar{foo = 1} in rec1 := let var rec2 := bar{foo = 42} in rec2 end; print_int(rec1.foo); print("\n") end ` 466 | 467 | - *if-then-else* 468 | 469 | In an if-expression `if exp1 then exp2 else exp3 `, exp1 is typed as an integer, and exp2 and exp3 must have the same type, which is the type of the entire expression. The resulting type cannot be that of `nil`. 470 | 471 | - *if-then* 472 | 473 | In an if-expression `if exp1 then exp2 `, exp1 is typed as an integer, and exp2 must have no value. The whole expression has no value either. 474 | 475 | - *while* 476 | 477 | In a while-expression `while exp1 do exp2 `, exp1 is typed as an integer, and exp2 must have no value. The whole expression has no value either. 478 | 479 | - *for* 480 | 481 | The `for` loop `for id := exp1 to exp2 do exp3 ` introduces a fresh variable, id, which ranges from the value of exp1 to that of exp2, inclusive, by steps of 1. The scope of id is restricted to exp3. In particular, id can appear neither in exp1 nor in exp2. The variable id cannot be assigned to. Both exp1 and exp2 are of type integer, and they can range from the minimal to the maximal integer values. The body exp3 and the whole loop have no value. 482 | 483 | - *break* 484 | 485 | A break terminates the nearest enclosing loop (`while` or `for`).
A break must be enclosed by a loop. A break cannot appear inside a definition (e.g., between `let` and `in`), unless it is enclosed by a loop within that definition. 486 | 487 | - *let* 488 | 489 | In the let-expression `let decs in exps end `, decs is a sequence of declarations and exps is a sequence of expressions separated by semicolons. The whole expression has the value of exps. 490 | 491 | ------ 492 | 493 | 494 | 495 | 496 | 497 | 498 | 499 | ## 2 Language Extensions 500 | 501 | Numerous extensions of the Tiger language are defined below. These extensions are *not* accessible to the user: if a Tiger program uses one of them, the compiler must reject it. They are used internally by the compiler itself, for example to desugar using concrete syntax. A special flag of the parser must be turned on to enable them.
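The gating rule above can be pictured with a small C sketch of how a scanner might classify identifier-like lexemes depending on the parser flag. Everything in it is hypothetical: `classify_lexeme`, the `LEX_*` classes, and the `extensions` parameter standing for the special flag are not taken from `tc`. The keyword list and the `_main` exception come from Section 2.1; treating every other leading-underscore name as an error in standard mode is an assumption drawn from the rule that user programs must not use extensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical classes for identifier-like lexemes. */
typedef enum {
  LEX_IDENTIFIER,  /* standard identifier, including "_main" */
  LEX_EXT_KEYWORD, /* _cast, _decs, _exp, _lvalue, _namety */
  LEX_RESERVED_ID, /* other "_..." names used internally by the compiler */
  LEX_ERROR        /* extension token while extensions are disabled */
} lex_class;

static const char *const ext_keywords[] = {
  "_cast", "_decs", "_exp", "_lvalue", "_namety"
};

/* Classify LEXEME, assumed to be an identifier or a reserved-id already
   matched by the scanner; EXTENSIONS stands for the parser's special flag. */
lex_class classify_lexeme(const char *lexeme, bool extensions)
{
  assert(lexeme != NULL && lexeme[0] != '\0');
  if (lexeme[0] != '_' || strcmp(lexeme, "_main") == 0)
    return LEX_IDENTIFIER;
  if (!extensions)
    return LEX_ERROR; /* user programs must not use extensions */
  for (size_t i = 0; i < sizeof ext_keywords / sizeof ext_keywords[0]; ++i)
    if (strcmp(lexeme, ext_keywords[i]) == 0)
      return LEX_EXT_KEYWORD;
  return LEX_RESERVED_ID;
}
```

With extensions disabled, `classify_lexeme("_cast", false)` yields `LEX_ERROR`, while `classify_lexeme("_main", false)` is still an ordinary identifier.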
 502 | 503 | | • [Additional Lexical Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Additional-Lexical-Specifications): | | New Tokens | 504 | | ------------------------------------------------------------ | ---- | ---------------------------------------- | 505 | | • [Additional Syntactic Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Additional-Syntactic-Specifications): | | EBNF grammar extension | 506 | | • [Additional Semantics](https://www.lrde.epita.fr/~tiger//tiger.html#Additional-Semantics): | | Beyond Life, the Universe and Everything | 507 | 508 | ------ 509 | 510 | 511 | 512 | 513 | 514 | 515 | 516 | ### 2.1 Additional Lexical Specifications 517 | 518 | Additional keywords and identifiers. 519 | 520 | - ‘_cast’ 521 | 522 | Used to cast an expression or an l-value to a given type. 523 | 524 | - ‘_decs’, ‘_exp’, ‘_lvalue’, ‘_namety’ 525 | 526 | These keywords are used to plug an existing AST into an AST being built by the parser. There is one keyword per kind of pluggable AST (list of declarations, expression, l-value, type name). 527 | 528 | - Reserved identifiers 529 | 530 | They start with an underscore, and use the same letters as standard identifiers. These symbols are used internally by the compiler to name or rename entities.
Note that **_main** is still a valid identifier, not a reserved one. `reserved-id ::= _ { letter | digit | _ }` 531 | 532 | ------ 533 | 534 | 535 | 536 | 537 | 538 | 539 | 540 | ### 2.2 Additional Syntactic Specifications 541 | 542 | - *Grammar extensions* 543 | 544 | In addition to the rules of the standard Tiger grammar (see [Syntactic Specifications](https://www.lrde.epita.fr/~tiger//tiger.html#Syntactic-Specifications)), the extensions add the following productions: a list-of-declarations metavariable, `decs ::= _decs ( integer ) decs`; a cast of an expression to a given type, or an expression metavariable, `exp ::= _cast ( exp , ty ) | _exp ( integer )`; a cast of an l-value to a given type, or an l-value metavariable, `lvalue ::= _cast ( lvalue , ty ) | _lvalue ( integer )`; and a type-name metavariable, `type-id ::= _namety ( integer )`. 545 | 546 | - *Metavariables* 547 | 548 | The ‘_decs’, ‘_exp’, ‘_lvalue’, ‘_namety’ keywords are used as metavariables, i.e., they are names attached to an (already built) AST. They *don’t create new AST nodes*, but are used to *retrieve existing nodes*, stored previously. For instance, upon an `_exp(51)` expression, the parser fetches the tree attached to the metavariable 51 (an expression) from the parsing context (see the implementation for details).
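To make the fetch-rather-than-create behavior concrete, here is a minimal C sketch of a parsing context that maps metavariable numbers to previously built trees. The types and functions (`parse_context`, `metavar_put`, `metavar_get`) and the fixed capacity are hypothetical illustrations, not `tc`'s actual implementation; the kind check reflects the assumption that, e.g., `_exp(n)` should only retrieve an expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds, one per pluggable AST type (Section 2.1). */
typedef enum { AST_DECS, AST_EXP, AST_LVALUE, AST_NAMETY } ast_kind;

typedef struct ast_node {
  ast_kind kind;
  /* A real node would also carry children, locations, etc. */
} ast_node;

#define METAVAR_MAX 128 /* arbitrary capacity for this sketch */

/* The part of the parsing context that stores metavariables. */
typedef struct {
  ast_node *slots[METAVAR_MAX]; /* NULL when nothing is attached */
} parse_context;

/* Attach an already built tree to metavariable ID before parsing. */
void metavar_put(parse_context *ctx, int id, ast_node *node)
{
  assert(0 <= id && id < METAVAR_MAX);
  ctx->slots[id] = node;
}

/* What the parser does on e.g. `_exp(51)`: return the stored node
   (no new node is created), or NULL if none of kind EXPECTED exists. */
ast_node *metavar_get(const parse_context *ctx, int id, ast_kind expected)
{
  assert(0 <= id && id < METAVAR_MAX);
  ast_node *node = ctx->slots[id];
  return (node != NULL && node->kind == expected) ? node : NULL;
}
```

A fixed array keeps the sketch short; a map keyed by the metavariable number would remove the capacity limit.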
 549 | 550 | ------ 551 | 552 | 553 | 554 | 555 | 556 | 557 | 558 | ### 2.3 Additional Semantics 559 | 560 | - *Casts* 561 | 562 | A `_cast` construct changes the type of an expression or an l-value to a given type. Beware that the type-checker is forced to accept the new type as is, and must trust the programmer about the new semantics of the expression/l-value. Bad casts can raise errors in the next stages of the back-end, or even lead to invalid output code. Casts work both on expressions and l-values. For instance, these are valid casts: `_cast("a", int)` and `_cast(a_string, int) := 42` (although these examples could produce code with a strange behavior at execution time). Casts are currently only used in concrete syntax transformations inside the bounds checking extension and, like any language extension, are forbidden in standard Tiger programs. 563 | 564 | ------ 565 | 566 | 567 | 568 | 569 | 570 | 571 | 572 | ## 3 Predefined Entities 573 | 574 | These entities are *predefined*, i.e., they are available when you start the Tiger compiler, but a Tiger program may redefine them.
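Section 3.2 below specifies several primitives precisely enough to sketch them. The following C code is a hypothetical illustration of `ord`, `strcmp` and `substring`, not the reference runtime: strings are modeled as NUL-terminated `char *`, the `tc_` prefix is invented, and writing the message to the standard error is an assumption. The preconditions, the results, the ‘substring: arguments out of bounds’ message and exit code 120 follow the manual.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Runtime failure: report MSG and exit with code 120, as the manual requires. */
static void runtime_error(const char *msg)
{
  fprintf(stderr, "%s\n", msg);
  exit(120);
}

/* ord: code of the first character, or -1 for the empty string. */
int tc_ord(const char *s)
{
  assert(s != NULL);
  return s[0] == '\0' ? -1 : (unsigned char) s[0];
}

/* strcmp (EPITA extension): normalize C's strcmp result to -1, 0 or 1. */
int tc_strcmp(const char *a, const char *b)
{
  int r = strcmp(a, b);
  return (r > 0) - (r < 0);
}

/* substring: LENGTH characters of S starting at index FIRST (0-based).
   Requires 0 <= first, 0 <= length and first + length <= size. */
char *tc_substring(const char *s, int first, int length)
{
  int size = (int) strlen(s);
  /* first > size - length is first + length > size, without overflow. */
  if (first < 0 || length < 0 || first > size - length)
    runtime_error("substring: arguments out of bounds");
  char *res = malloc((size_t) length + 1);
  if (res == NULL)
    abort();
  memcpy(res, s + first, (size_t) length);
  res[length] = '\0';
  return res;
}
```

For instance, `tc_substring("Tiger", 1, 3)` returns a fresh string `"ige"`, whereas `tc_substring("Tiger", 4, 2)` violates first + length <= size and exits with status 120.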
 575 | 576 | | • [Predefined Types](https://www.lrde.epita.fr/~tiger//tiger.html#Predefined-Types): | | Built-in types | 577 | | ------------------------------------------------------------ | ---- | -------------- | 578 | | • [Predefined Functions](https://www.lrde.epita.fr/~tiger//tiger.html#Predefined-Functions): | | Primitives | 579 | 580 | ------ 581 | 582 | 583 | 584 | 585 | 586 | 587 | 588 | ### 3.1 Predefined Types 589 | 590 | There are three predefined types: 591 | 592 | - ‘int’ 593 | 594 | which is the type of all the literal integers. 595 | 596 | - ‘string’ 597 | 598 | which is the type of all the literal strings. 599 | 600 | - ‘Object’ 601 | 602 | which is the super class type on top of every class hierarchy (i.e., the top-most super class in the transitive closure of the generalization relationship). 603 | 604 | ------ 605 | 606 | 607 | 608 | 609 | 610 | 611 | 612 | ### 3.2 Predefined Functions 613 | 614 | Some runtime functions may fail if some of their assertions do not hold. In that case, the program must exit with a properly labeled error message, and with exit code 120. The error messages must follow the standard. Any difference, in better or worse, is a failure to comply with this Tiger Reference Manual. 615 | 616 | - string: **chr** *(code : int)* 617 | 618 | Return the one-character string containing the character whose code is code.
If code does not belong to the range [0..255], raise a runtime error: ‘chr: character out of range’. 619 | 620 | - string: **concat** *(first: string, second: string)* 621 | 622 | Concatenate first and second. 623 | 624 | - void: **exit** *(status: int)* 625 | 626 | Exit the program with exit code status. 627 | 628 | - void: **flush** *()* 629 | 630 | Flush the output buffer. 631 | 632 | - string: **getchar** *()* 633 | 634 | Read a character from the input. Return an empty string at end of file. 635 | 636 | - int: **not** *(boolean: int)* 637 | 638 | Return 1 if boolean = 0, else return 0. 639 | 640 | - int: **ord** *(string: string)* 641 | 642 | Return the ASCII code of the first character in string, or -1 if the given string is empty. 643 | 644 | - void: **print** *(string: string)* 645 | 646 | Print string on the standard output. 647 | 648 | - void: **print_err** *(string: string)* 649 | 650 | Note: this is an EPITA extension. Same as `print`, but the output is written to the standard error. 651 | 652 | - void: **print_int** *(int: int)* 653 | 654 | Note: this is an EPITA extension. Output int in its decimal canonical form (equivalent to ‘%d’ for `printf`). 655 | 656 | - int: **size** *(string: string)* 657 | 658 | Return the size of string, in characters. 659 | 660 | - int: **strcmp** *(a: string, b: string)* 661 | 662 | Note: this is an EPITA extension. Compare the strings a and b: return -1 if a < b, 0 if they are equal, and 1 otherwise. 663 | 664 | - int: **streq** *(a: string, b: string)* 665 | 666 | Note: this is an EPITA extension. Return 1 if the strings a and b are equal, 0 otherwise. Often faster than `strcmp` to test string equality.
 667 | 668 | - string: **substring** *(string: string, first: int, length: int)* 669 | 670 | Return the string made of the length characters of string starting at index first (0 being the origin), i.e., up to and including the character at index first + length - 1. Let size be the size of string; the following assertions must hold: 0 <= first, 0 <= length, and first + length <= size. Otherwise a runtime failure is raised: ‘substring: arguments out of bounds’. 671 | 672 | ------ 673 | 674 | 675 | 676 | 677 | 678 | 679 | 680 | ## 4 Implementation 681 | 682 | | • [Invoking tc](https://www.lrde.epita.fr/~tiger//tiger.html#Invoking-tc): | | Command line options | 683 | | ------------------------------------------------------------ | ---- | ---------------------------------- | 684 | | • [Errors](https://www.lrde.epita.fr/~tiger//tiger.html#Errors): | | Handling invalid input | 685 | | • [Extensions](https://www.lrde.epita.fr/~tiger//tiger.html#Extensions): | | Making extensions to your compiler | 686 | 687 | ------ 688 | 689 | 690 | 691 | 692 | 693 | 694 | 695 | ### 4.1 Invoking `tc` 696 | 697 | Synopsis: 698 | 699 | ``` 700 | tc option… file 701 | ``` 702 | 703 | where file can be ‘-’, denoting the standard input. 704 | 705 | Global options are: 706 | 707 | - -? 708 | 709 | - --help 710 | 711 | Display the help message, and exit successfully.
 712 | 713 | - --version 714 | 715 | Display the version, and exit successfully. 716 | 717 | - --task-list 718 | 719 | List the registered tasks. 720 | 721 | - --task-selection 722 | 723 | Report the order in which the tasks will be run. 724 | 725 | 726 | 727 | The options related to the file library (TC-1) are: 728 | 729 | - -p 730 | 731 | - --library-prepend 732 | 733 | Prepend a directory to the include path. 734 | 735 | - -P 736 | 737 | - --library-append 738 | 739 | Append a directory to the include path. 740 | 741 | - --library-display 742 | 743 | Report the include search path. 744 | 745 | 746 | 747 | The options related to scanning and parsing (TC-1) are: 748 | 749 | - --scan-trace 750 | 751 | Enable Flex scanner traces. 752 | 753 | - --parse-trace 754 | 755 | Enable Bison parser traces. 756 | 757 | - --parse 758 | 759 | Parse the file given as argument (objects forbidden). 760 | 761 | - --prelude=prelude 762 | 763 | Load the definitions of the file prelude before the actual argument. The result is equivalent to parsing: `let import "prelude" in /* The argument file. */ end ` To disable any prelude file, use --no-prelude. The default value is `builtin`, denoting the builtin prelude. 764 | 765 | - -X 766 | 767 | - --no-prelude 768 | 769 | Do not include any prelude. 770 | 771 | 772 | 773 | The options related to the AST (TC-2) are: 774 | 775 | - -o 776 | 777 | - --object 778 | 779 | Enable object constructs of the language (class and method declarations, object creation, method calls, etc.). 780 | 781 | - --object-parse 782 | 783 | Same as --object --parse, i.e., parse the file given as argument, allowing objects. 784 | 785 | - -A 786 | 787 | - --ast-display 788 | 789 | Display the AST. 790 | 791 | - -D 792 | 793 | - --ast-delete 794 | 795 | Reclaim the memory allocated for the AST.
 796 | 797 | 798 | 799 | The options related to bindings computation (TC-3) are: 800 | 801 | - --bound 802 | 803 | Make sure bindings (regular or taking overloading or object constructs into account) are computed. 804 | 805 | - -b 806 | 807 | - --bindings-compute 808 | 809 | Bind name uses to their definitions (objects forbidden). 810 | 811 | - -B 812 | 813 | - --bindings-display 814 | 815 | Enable the bindings display in the next --ast-display invocation. This option does not imply --bindings-compute. 816 | 817 | - --object-bindings-compute 818 | 819 | Bind name uses to their definitions, allowing objects. 820 | 821 | 822 | 823 | The options related to the renaming to unique identifiers (TC-R) are: 824 | 825 | - --rename 826 | 827 | Rename identifiers (objects forbidden). 828 | 829 | 830 | 831 | The options related to escapes computation (TC-E) are: 832 | 833 | - -e 834 | 835 | - --escapes-compute 836 | 837 | Compute the escapes. 838 | 839 | - -E 840 | 841 | - --escapes-display 842 | 843 | Enable the escape display. This option does not imply --escapes-compute, so that it is possible to check that the defaults (everybody escapes) are properly implemented. Pass -A afterward to see the result. 844 | 845 | 846 | 847 | The options related to type checking (TC-4) are: 848 | 849 | - -T 850 | 851 | - --typed 852 | 853 | Make sure types (regular or taking overloading or object constructs into account) are computed. 854 | 855 | - --types-compute 856 | 857 | Compute and check (regular) types (objects forbidden). 858 | 859 | - --object-types-compute 860 | 861 | Compute and check (regular) types, allowing objects. 862 | 863 | 864 | 865 | The options related to desugaring (TC-D) are: 866 | 867 | - --desugar-for 868 | 869 | Enable the translation of `for` loops into `while` loops. 870 | 871 | - --desugar-string-cmp 872 | 873 | Enable the desugaring of string comparisons.
 874 | 875 | - --desugared 876 | 877 | Make sure syntactic sugar (regular or taking overloading into account) has been removed from the AST. 878 | 879 | - --desugar 880 | 881 | Remove syntactic sugar from the AST. Desired translations must be enabled beforehand (e.g., with --desugar-for or --desugar-string-cmp). 882 | 883 | - --overfun-desugar 884 | 885 | Like --desugar but with support for overloaded functions (see TC-A). 886 | 887 | 888 | 889 | The options related to the inlining optimization (TC-I) are: 890 | 891 | - --inline 892 | 893 | Inline bodies of (non-overloaded) functions at call sites. 894 | 895 | - --overfun-inline 896 | 897 | Inline bodies of functions (overloaded or not) at call sites. 898 | 899 | - --prune 900 | 901 | Remove unused (non-overloaded) functions. 902 | 903 | - --overfun-prune 904 | 905 | Remove unused functions (overloaded or not). 906 | 907 | 908 | 909 | The options related to the bounds checking instrumentation (TC-B) are: 910 | 911 | - --bounds-checks-add 912 | 913 | Add dynamic bounds checks. 914 | 915 | - --overfun-bounds-checks-add 916 | 917 | Add dynamic bounds checks, with support for overloading. 918 | 919 | 920 | 921 | The options related to overloading support (TC-A) are: 922 | 923 | - --overfun-bindings-compute 924 | 925 | Bind variables, types, and breaks as usual, but bind function calls to the set of function definitions bearing the same name. 926 | 927 | - -O 928 | 929 | - --overfun-types-compute 930 | 931 | Type-check and resolve (bind) overloaded function calls. Implies --overfun-bindings-compute. 932 | 933 | 934 | 935 | The options related to the desugaring of object constructs (TC-O) are: 936 | 937 | - --object-desugar 938 | 939 | Translate object constructs from the program into their non-object counterparts, i.e., transform a Tiger program into a Panther one.
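Several options state that they imply others, and `--task-selection` reports the resulting run order. One plausible way to resolve such implications is a depth-first traversal that schedules each implied task before the task requiring it. The C sketch below does this on a hypothetical, hard-coded table (option names without their `--` prefix) holding only implications stated in this manual: `--overfun-types-compute` implies `--overfun-bindings-compute` above, and the TC-6 chain listed further down (`--lir-display`, `--lir-compute`, `--traces-compute`, `--canon-compute`). It assumes at most one implied task per entry and no cycles, and it is not `tc`'s task registry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ORDER 16

/* Hypothetical task table: each task names at most one task it implies. */
typedef struct {
  const char *name;
  const char *implies; /* NULL if the task implies nothing */
} task;

static const task tasks[] = {
  {"overfun-bindings-compute", NULL},
  {"overfun-types-compute", "overfun-bindings-compute"},
  {"canon-compute", NULL},
  {"traces-compute", "canon-compute"},
  {"lir-compute", "traces-compute"},
  {"lir-display", "lir-compute"},
};

static int find_task(const char *name)
{
  for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; ++i)
    if (strcmp(tasks[i].name, name) == 0)
      return (int) i;
  return -1;
}

/* Append NAME to ORDER (current length LEN), after scheduling whatever it
   implies; already scheduled tasks are skipped. Assumes no cycles.
   Returns the new length of ORDER. */
size_t schedule(const char *name, const char **order, size_t len)
{
  int i = find_task(name);
  if (i < 0)
    return len; /* unknown task: ignored in this sketch */
  for (size_t k = 0; k < len; ++k)
    if (strcmp(order[k], name) == 0)
      return len;
  if (tasks[i].implies != NULL)
    len = schedule(tasks[i].implies, order, len);
  assert(len < MAX_ORDER);
  order[len++] = tasks[i].name;
  return len;
}
```

For example, scheduling `lir-display` alone yields `canon-compute`, `traces-compute`, `lir-compute`, `lir-display`, in that order.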
940 | 941 | 942 | 943 | The options related to the high level intermediate representation (TC-5) are: 944 | 945 | - --hir-compute 946 | 947 | Translate to HIR (objects forbidden). Implies --typed. 948 | 949 | - -H 950 | 951 | - --hir-display 952 | 953 | Display the high level intermediate representation. Implies --hir-compute. 954 | 955 | 956 | 957 | The options related to the LLVM IR translation (TC-L) are: 958 | 959 | - --llvm-compute 960 | 961 | Translate to LLVM IR. 962 | 963 | - --llvm-runtime-display 964 | 965 | Enable runtime displaying along with the LLVM IR. 966 | 967 | - --llvm-display 968 | 969 | Display the LLVM IR. 970 | 971 | 972 | 973 | The options related to the low level intermediate representation (TC-6) are: 974 | 975 | - --canon-trace 976 | 977 | Trace the canonicalization of HIR to LIR. 978 | 979 | - --canon-compute 980 | 981 | Canonicalize the LIR fragments. 982 | 983 | - -C 984 | 985 | - --canon-display 986 | 987 | Display the canonicalized intermediate representation *before* basic blocks and traces computation. Implies --lir-compute. It is convenient to determine whether a failure is due to canonicalization, or traces. 988 | 989 | - --traces-trace 990 | 991 | Trace the basic blocks and traces canonicalization of HIR to LIR. 992 | 993 | - --traces-compute 994 | 995 | Compute the basic blocks from canonicalized HIR fragments. Implies --canon-compute. 996 | 997 | - --lir-compute 998 | 999 | Translate to LIR. Implies --traces-compute. Actually, it is nothing but a nice looking alias for the latter. 1000 | 1001 | - -L 1002 | 1003 | - --lir-display 1004 | 1005 | Display the low level intermediate representation. Implies --lir-compute. 1006 | 1007 | 1008 | 1009 | The options related to the instruction selection (TC-7) are: 1010 | 1011 | - --inst-compute 1012 | 1013 | Convert from LIR to pseudo assembly with temporaries. Implies --lir-compute. 
1014 | 1015 | - -I 1016 | 1017 | - --inst-display 1018 | 1019 | Display the pseudo assembly (without the runtime prologue). Implies --inst-compute. 1020 | 1021 | - -R 1022 | 1023 | - --runtime-display 1024 | 1025 | Display the assembly runtime prologue for the current target. 1026 | 1027 | 1028 | 1029 | The options related to the liveness information (TC-8) are: 1030 | 1031 | - -F 1032 | 1033 | - --flowgraphs-dump 1034 | 1035 | Save each function flow graph in a Graphviz file. Implies --inst-compute. 1036 | 1037 | - -V 1038 | 1039 | - --liveness-dump 1040 | 1041 | Save each function flow graph enriched with liveness information in a Graphviz file. Implies --inst-compute. 1042 | 1043 | - -N 1044 | 1045 | - --interference-dump 1046 | 1047 | Save each function interference graph in a Graphviz file. Implies --inst-compute. 1048 | 1049 | 1050 | 1051 | The options related to the target are: 1052 | 1053 | - --callee-save=num 1054 | 1055 | - --caller-save=num 1056 | 1057 | Set the maximum number of callee/caller save registers to num, a non-negative number. Note that (currently) this does not reset the current target; hence, to actually change the behavior, one needs ‘--callee-save=0 --target-mips’. 1058 | 1059 | - --target-mips 1060 | 1061 | Set the target to Mips. 1062 | 1063 | - --target-ia32 1064 | 1065 | This optional flag sets the target to IA-32. 1066 | 1067 | - --target-default 1068 | 1069 | If no target is selected, select Mips. This option is triggered by all the options that need a target. 1070 | 1071 | - --target-display 1072 | 1073 | Report information about the current target. 1074 | 1075 | 1076 | 1077 | The options related to the register allocation are: 1078 | 1079 | - --asm-coalesce-disable 1080 | 1081 | Disable coalescence. 1082 | 1083 | - --asm-trace 1084 | 1085 | Trace register allocation. 1086 | 1087 | - -s 1088 | 1089 | - --asm-compute 1090 | 1091 | Allocate the registers.
1092 | 1093 | - -S 1094 | 1095 | - --asm-display 1096 | 1097 | Display the final assembler, runtime included. 1098 | 1099 | ------ 1100 | 1101 | 1102 | 1103 | 1104 | 1105 | 1106 | 1107 | ### 4.2 Errors 1108 | 1109 | Errors must be reported on the standard error output. The exit status and the standard error output must be consistent: the exit status is 0 if and only if there is no output at all on the standard error output. There are a few exceptions, notably when tracing (of the scanning, parsing, etc.) is enabled. 1110 | 1111 | Compile errors must be reported on the standard error flow with a precise error location. The format of the error output must be exactly 1112 | 1113 | ``` 1114 | location: error message 1115 | ``` 1116 | 1117 | where the location includes the file name, initial position, and final position. There is no fixed set of error messages. 1118 | 1119 | Examples include: 1120 | 1121 | ``` 1122 | $ echo "1 + + 2" | ./tc - 1123 | error→standard input:1.4: syntax error, unexpected "+" 1124 | error→Parsing Failed 1125 | ``` 1126 | 1127 | and 1128 | 1129 | ``` 1130 | $ echo "1 + () + 2" | ./tc -T - 1131 | error→standard input:1.0-5: type mismatch 1132 | error→ right operand type: void 1133 | error→ expected type: int 1134 | ``` 1135 | 1136 | 1137 | 1138 | **Warning:** The symbol error→ is not part of the actual output. It is only used in this document to highlight that the message is produced on the standard error flow. Do not include it as part of the compiler’s messages. The same applies to ⇒. 1139 | 1140 | 1141 | 1142 | The compiler exit value should faithfully reflect the compilation status.
The possible values are: 1143 | 1144 | - 0 1145 | 1146 | Everything is all right. 1147 | 1148 | - 1 1149 | 1150 | An error that does not fall into the other categories occurred. For instance, `malloc` or `fopen` failed, a file is missing, etc. An unsupported option must cause `tc` to exit 64 (`EX_USAGE`), even if it is related to a stage option; otherwise the corresponding optional features will be tested, and will most probably earn 0. For instance, a TC-5 delivery that does not support bounds checking must not accept --bounds-checking. 1151 | 1152 | - 2 1153 | 1154 | Error detected during the scanning, e.g., an invalid character. 1155 | 1156 | - 3 1157 | 1158 | Parse error. 1159 | 1160 | - 4 1161 | 1162 | Identifier binding errors, such as a duplicate name definition or the use of an undefined name. 1163 | 1164 | - 5 1165 | 1166 | Type checking errors (such as type incompatibility). 1167 | 1168 | - 64 (`EX_USAGE`) 1169 | 1170 | The command was used incorrectly, e.g., with the wrong number of arguments, a bad flag, a bad syntax in a parameter, etc. This is the value used by `argp`. 1171 | 1172 | When several errors have occurred, the least value should be returned, not that of the earliest error. For instance: 1173 | 1174 | ``` 1175 | (let error in end; %) 1176 | ``` 1177 | 1178 | should exit 2 (scan error), not 3 (parse error), although the parse error is detected first. 1179 | 1180 | 1181 | 1182 | In addition to compiler errors, the compiled programs may have to raise a runtime error, for instance when runtime functions receive improper arguments. In that case, use the exit code 120 and issue a clear diagnostic. Because the basic MIPS model we target does not provide a standard error output, the message must be printed on the standard output.
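The two rules above (every diagnostic goes to the standard error output as `location: error message`, and the exit status is the least of the statuses raised rather than the first one) can be sketched as a small reporter class. This is a minimal illustration under stated assumptions, not the reference compiler's code: the `ErrorReporter` class and the `demo_exit_status` helper are hypothetical, and the locations and messages are illustrative only.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical reporter: prints "location: error message" on the given
// stream and remembers the least non-zero status reported so far.
class ErrorReporter {
 public:
  explicit ErrorReporter(std::ostream& err) : err_(err) {}

  // `status` is one of the non-zero exit values listed above (1-5, 64).
  void report(int status, const std::string& location,
              const std::string& message) {
    assert(status > 0);
    err_ << location << ": " << message << '\n';
    if (status_ == 0 || status < status_) status_ = status;
  }

  // 0 if and only if nothing was reported, which keeps the exit status
  // consistent with the (empty or not) standard error output.
  int status() const { return status_; }

 private:
  std::ostream& err_;
  int status_ = 0;
};

// Replays `(let error in end; %)`: the parse error (3) is reported first,
// the invalid character (2) later, yet the resulting status must be 2.
int demo_exit_status(std::ostream& err) {
  ErrorReporter reporter(err);
  reporter.report(3, "standard input:1.5-9", "syntax error");
  reporter.report(2, "standard input:1.19", "invalid character");
  return reporter.status();
}
```

A driver would construct the reporter on `std::cerr`, run the requested tasks, and return `status()` from `main`; keeping the minimum instead of the first value is what makes the example exit 2 rather than 3.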
1183 | 1184 | ------ 1185 | 1186 | 1187 | 1188 | 1189 | 1190 | 1191 | 1192 | ### 4.3 Extensions 1193 | 1194 | A strictly compliant compiler must behave exactly as specified in this document and in Andrew Appel’s book, and as demonstrated by the samples exhibited in this document and in the [Assignments](https://www.lrde.epita.fr/~tiger//assignments.html#Top) document. 1195 | 1196 | Nevertheless, you are entirely free to extend your compiler as you wish, as long as each extension is enabled by a non-standard option. Extensions include: 1197 | 1198 | - ANSI Colors 1199 | 1200 | Do not enable them by default, and in particular not without checking whether the output `isatty`, as the correction program will not appreciate it. 1201 | 1202 | - Language Extensions 1203 | 1204 | If, for instance, you intend to support loop-expression, the construct must be rejected (as a syntax error) if the corresponding option was not specified. 1205 | 1206 | In **any case**, if you don’t implement an extension that was suggested (such as --hir-use-ix), then **you must not accept the option**. If the compiler accepts an option, then the effect of this option will be checked. For instance, if your compiler accepts --hir-use-ix but does not implement it, then be sure to get 0 on these tests.
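This rule and the 64 (`EX_USAGE`) exit status described in 4.2 both amount to accepting only what is actually implemented. A real driver would normally get this behavior from its option parser (the text notes that `argp` uses 64); the hand-rolled check below is only a sketch of the idea, and its option list and the `check_options` helper are hypothetical.

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// EX_USAGE from <sysexits.h>; spelled out so the sketch stays portable.
constexpr int kExUsage = 64;

// Hypothetical: the options this particular build implements. A suggested
// extension such as --hir-use-ix is deliberately absent until it works.
const std::set<std::string>& supported_options() {
  static const std::set<std::string> options = {
      "--parse", "-A", "--ast-display", "-b", "--bindings-compute",
      "-T",      "--typed", "--types-compute"};
  return options;
}

// Returns 0 if every option is supported, kExUsage (64) otherwise.
// Operands (file names, or "-" for the standard input) are skipped.
// Simplification: options taking a value (e.g. --callee-save=NUM) are
// not handled by this sketch.
int check_options(const std::vector<std::string>& args) {
  for (const std::string& arg : args) {
    bool is_option = arg.size() > 1 && arg[0] == '-';
    if (is_option && supported_options().count(arg) == 0) {
      std::fprintf(stderr, "tc: unrecognized option '%s'\n", arg.c_str());
      return kExUsage;
    }
  }
  return 0;
}
```

Because the unknown option is reported on the standard error output together with a non-zero status, this also respects the consistency rule of 4.2.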
1207 | 1208 | ------ 1209 | 1210 | 1211 | 1212 | 1213 | 1214 | 1215 | 1216 | ## 5 The Reference Implementation 1217 | 1218 | The so-called “reference compiler” is the compiler the LRDE develops (i) to prototype what students will have to implement, and (ii) to check the output of student compilers. Some may find it useful to see the names we gave to our options. The following is informative only; the exact contract for a conforming implementation of a Tiger compiler is defined above, in [Implementation](https://www.lrde.epita.fr/~tiger//tiger.html#Implementation). 1219 | 1220 | ``` 1221 | $ tc --help 1222 | Tiger Compiler, Copyright (C) 2004-2018 LRDE.: 1223 | 1224 | 0. Tasks: 1225 | --task-list list registered tasks 1226 | --task-graph show task graph 1227 | --task-selection list tasks to be run 1228 | --time-report report execution times 1229 | 1230 | 1. Parsing: 1231 | --scan-trace trace the scanning 1232 | --parse-trace trace the parse 1233 | --prelude STRING name of the prelude. Defaults to 1234 | "builtin" denoting the builtin prelude 1235 | -X [ --no-prelude ] don't include prelude 1236 | --parse parse a file 1237 | --library-display display library search path 1238 | -P [ --library-append ] DIR append directory DIR to the search path 1239 | -p [ --library-prepend ] DIR prepend directory DIR to the search path 1240 | 1241 | 2. Abstract Syntax Tree: 1242 | -A [ --ast-display ] display the AST 1243 | --ast-dump dump the AST 1244 | --tikz-style enable TikZ-style output in AST dumping 1245 | 1246 | 2.5 Cloning: 1247 | --clone clone the Ast 1248 | 1249 | 3.
Bind: 1250 | --bound default the computation of bindings to 1251 | Tiger (without objects nor overloading) 1252 | -b [ --bindings-compute ] bind the identifiers 1253 | -B [ --bindings-display ] enable bindings display in the AST 1254 | --rename rename identifiers to unique names 1255 | 1256 | 3. Callgraph: 1257 | --escapes-sl-compute compute the escaping static links and the 1258 | functions requiring a static link 1259 | --escapes-sl-display enable static links' escapes in the AST 1260 | --callgraph-compute build the call graph 1261 | --callgraph-dump dump the call graph 1262 | --parentgraph-compute build the parent graph 1263 | --parentgraph-dump dump the parent graph 1264 | 1265 | 3. Escapes: 1266 | -e [ --escapes-compute ] compute the escaping variables and the 1267 | functions requiring a static link 1268 | -E [ --escapes-display ] enable escape display in the AST 1269 | --escapes-check check that escape tags are correct 1270 | --escapes-necessary-check check that tagged variables are escaping 1271 | --escapes-sufficient-check check that escaping variables are tagged 1272 | --escapes-tags-display enable escape tags display in the AST 1273 | 1274 | 4. Type checking: 1275 | -T [ --typed ] default the type-checking to Tiger 1276 | (without objects nor overloading) 1277 | --types-compute check for type violations 1278 | 1279 | 4.5 Type checking with overloading: 1280 | --overfun-bindings-compute bind the identifiers, allowing function 1281 | overloading 1282 | -O [ --overfun-types-compute ] check for type violations, allowing 1283 | function overloading 1284 | 1285 | 5. Translation to High Level Intermediate Representation: 1286 | --hir-compute translate to HIR 1287 | -H [ --hir-display ] display the HIR 1288 | --hir-naive don't use "Ix" during the translation 1289 | 1290 | 5.5. 
Translation to LLVM Intermediate Representation: 1291 | --llvm-compute translate to LLVM IR 1292 | --llvm-runtime-display enable runtime displayingalong with the 1293 | LLVM IR 1294 | --llvm-display display the LLVM IR 1295 | 1296 | 6. Translation to Low Level Intermediate Representation: 1297 | --canon-compute canonicalize 1298 | --canon-trace trace the canonicalization of the LIR 1299 | -C [ --canon-display ] display the canonicalized IR 1300 | --traces-compute make traces 1301 | --traces-trace trace the traces computation 1302 | --lir-compute translate to LIR (alias for 1303 | --trace-compute) 1304 | -L [ --lir-display ] display the low level intermediate 1305 | representation 1306 | 1307 | 7. Target selection: 1308 | -i [ --inst-compute ] select the instructions 1309 | -R [ --runtime-display ] display the runtime 1310 | --inst-debug enable instructions verbose display 1311 | --rule-trace enable rule reducing display 1312 | --garbage-collection enable garbage collection 1313 | -I [ --inst-display ] display the instructions 1314 | -Y [ --nolimips-display ] display Nolimips compatible instructions 1315 | (i.e., allocate the frames and then 1316 | display the instructions 1317 | --targeted default the target to MIPS 1318 | --target-mips select MIPS as target 1319 | --target-ia32 select IA-32 as target 1320 | --target-arm select ARM as target 1321 | --target-display display the current target 1322 | --callee-save NUM max number of callee save registers 1323 | --caller-save NUM max number of caller save registers 1324 | --argument NUM max number of argument registers 1325 | 1326 | 8. Liveness: 1327 | -F [ --flowgraph-dump ] dump the flowgraphs 1328 | -V [ --liveness-dump ] dump the liveness graphs 1329 | -N [ --interference-dump ] dump the interference graphs 1330 | 1331 | 9. 
Register Allocation: 1332 | --asm-coalesce-disable disable coalescence 1333 | --asm-trace trace register allocation 1334 | -s [ --asm-compute ] allocate the registers 1335 | -S [ --asm-display ] display the final assembler 1336 | 1337 | Desugaring and bounds-checking: 1338 | --desugar-for desugar `for' loops 1339 | --desugar-string-cmp desugar string comparisons 1340 | --desugared Default the removal of syntactic sugar 1341 | from the AST to Tiger (without 1342 | overloading) 1343 | --desugar desugar the AST 1344 | --overfun-desugar desugar the AST, allowing function 1345 | overloading 1346 | --raw-desugar desugar the AST without recomputing 1347 | bindings nor types 1348 | --bounds-checks-add add dynamic bounds checks 1349 | --overfun-bounds-checks-add add dynamic bounds checks with support 1350 | for overloading 1351 | --raw-bounds-checks-add add bounds-checking to the AST without 1352 | recomputing bindings nor types 1353 | 1354 | Inlining: 1355 | --inline inline functions 1356 | --overfun-inline inline functions with support for 1357 | overloading 1358 | --prune prune unused functions 1359 | --overfun-prune prune unused functions with support for 1360 | overloading 1361 | 1362 | Object: 1363 | -o [ --object ] enable object extensions 1364 | --object-parse parse a file, allowing objects 1365 | --object-bindings-compute bind the identifiers, allowing objects 1366 | --object-types-compute check for type violations, allowing 1367 | objects 1368 | --object-rename rename identifiers to unique names, 1369 | allowing objects 1370 | --object-desugar remove object constructs from the program 1371 | --raw-object-desugar remove object constructs from the program 1372 | without recomputing bindings nor types 1373 | --overfun-object-bindings-compute bind the identifiers, allowing function 1374 | overloading with object 1375 | --overfun-object-types-compute check for type violations, allowing 1376 | function overloading with object 1377 | --overfun-object-rename rename 
identifiers to unique names, 1378 | allowing function overloading with 1379 | objects 1380 | --overfun-object-desugar remove object constructs from the 1381 | programallowing function overloading with 1382 | objects 1383 | 1384 | Temporaries: 1385 | --tempmap-display display the temporary table 1386 | 1387 | -? [ --help ] Give this help list 1388 | --usage Give a short usage message 1389 | --version Print program version 1390 | ``` 1391 | 1392 | 1393 | 1394 | **Example 5.1:** tc --help 1395 | 1396 | ![tiger.img/task-graph](https://www.lrde.epita.fr/~tiger//tiger.img/task-graph.png) 1397 | 1398 | - File: **Tasks dependency diagram** 1399 | 1400 | ------ 1401 | 1402 | #### Footnotes 1403 | 1404 | ### [(1)](https://www.lrde.epita.fr/~tiger//tiger.html#DOCF1) 1405 | 1406 | A super class can only be a *class* type, and not another kind of type. 1407 | 1408 | ### [(2)](https://www.lrde.epita.fr/~tiger//tiger.html#DOCF2) 1409 | 1410 | Which is not the case in C++, where methods have *covariant* return values. 1411 | 1412 | ------ --------------------------------------------------------------------------------