Book contents
- Frontmatter
- Contents
- Preface
- Part I Fundamentals of Compilation
- 1 Introduction
- 2 Lexical Analysis
- 3 Parsing
- 4 Abstract Syntax
- 5 Semantic Analysis
- 6 Activation Records
- 7 Translation to Intermediate Code
- 8 Basic Blocks and Traces
- 9 Instruction Selection
- 10 Liveness Analysis
- 11 Register Allocation
- 12 Putting It All Together
- Part II Advanced Topics
- Appendix: Tiger Language Reference Manual
- Bibliography
- Index
2 - Lexical Analysis
Published online by Cambridge University Press: 05 June 2012
- Frontmatter
- Contents
- Preface
- Part I Fundamentals of Compilation
- 1 Introduction
- 2 Lexical Analysis
- 3 Parsing
- 4 Abstract Syntax
- 5 Semantic Analysis
- 6 Activation Records
- 7 Translation to Intermediate Code
- 8 Basic Blocks and Traces
- 9 Instruction Selection
- 10 Liveness Analysis
- 11 Register Allocation
- 12 Putting It All Together
- Part II Advanced Topics
- Appendix: Tiger Language Reference Manual
- Bibliography
- Index
Summary
lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction
Webster's DictionaryTo translate a program from one language into another, a compiler must first pull it apart and understand its structure and meaning, then put it together in a different way. The front end of the compiler performs analysis; the back end does synthesis.
The analysis is usually broken up into
Lexical analysis: breaking the input into individual words or “tokens”;
Syntax analysis: parsing the phrase structure of the program; and
Semantic analysis: calculating the program's meaning.
The lexical analyzer takes a stream of characters and produces a stream of names, keywords, and punctuation marks; it discards white space and comments between the tokens. It would unduly complicate the parser to have to account for possible white space and comments at every possible point; this is the main reason for separating lexical analysis from parsing.
Lexical analysis is not very complicated, but we will attack it with high powered formalisms and tools, because similar formalisms will be useful in the study of parsing and similar tools have many applications in areas other than compilation.
LEXICAL TOKENS
A lexical token is a sequence of characters that can be treated as a unit in the grammar of a programming language. A programming language classifies lexical tokens into a finite set of token types.
- Type
- Chapter
- Information
- Modern Compiler Implementation in ML , pp. 14 - 37Publisher: Cambridge University PressPrint publication year: 1997