Book contents
- Frontmatter
- Contents
- Preface
- Part I Fundamentals of Compilation
- 1 Introduction
- 2 Lexical Analysis
- 3 Parsing
- 4 Abstract Syntax
- 5 Semantic Analysis
- 6 Activation Records
- 7 Translation to Intermediate Code
- 8 Basic Blocks and Traces
- 9 Instruction Selection
- 10 Liveness Analysis
- 11 Register Allocation
- 12 Putting It All Together
- Part II Advanced Topics
- Appendix: Tiger Language Reference Manual
- Bibliography
- Index
2 - Lexical Analysis
Published online by Cambridge University Press: 05 June 2012
Summary
lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction
Webster's Dictionary
To translate a program from one language into another, a compiler must first pull it apart and understand its structure and meaning, then put it together in a different way. The front end of the compiler performs analysis; the back end does synthesis.
The analysis is usually broken up into
- Lexical analysis: breaking the input into individual words or “tokens”;
- Syntax analysis: parsing the phrase structure of the program; and
- Semantic analysis: calculating the program's meaning.
The lexical analyzer takes a stream of characters and produces a stream of names, keywords, and punctuation marks; it discards white space and comments between the tokens. It would unduly complicate the parser to have to account for possible white space and comments at every possible point; this is the main reason for separating lexical analysis from parsing.
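The character-stream-to-token-stream step described above can be sketched by hand in C. This is only a minimal illustration, not the book's implementation: the names `TokKind`, `Token`, `skip_blank`, and `next_token` are hypothetical, comments are assumed to be non-nested `/* ... */`, keywords are not distinguished from names, and numbers and strings are left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not the book's definitions. */
typedef enum { TOK_NAME, TOK_PUNCT, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first character of the token in the input */
    size_t len;         /* number of characters in the token */
} Token;

/* Skip white space and (assumed, non-nested) C-style comments. */
static const char *skip_blank(const char *p) {
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p != '\0') p += 2;  /* step over the closing marker */
        } else {
            return p;
        }
    }
}

/* Return the next token at or after *pp and advance *pp past it. */
Token next_token(const char **pp) {
    const char *p = skip_blank(*pp);
    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *pp = p; return t; }
    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_NAME;           /* a name: letter, then letters/digits */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else {
        t.kind = TOK_PUNCT;          /* any other character, taken singly */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

Calling `next_token` repeatedly on `"x /* note */ = y ;"` would yield a name, a punctuation mark, a name, a punctuation mark, and then an end-of-input token. Because the white space and the comment never reach the caller, a parser built on top of this function would not have to mention them anywhere in its grammar.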
Lexical analysis is not very complicated, but we will attack it with high-powered formalisms and tools, because similar formalisms will be useful in the study of parsing and similar tools have many applications in areas other than compilation.
LEXICAL TOKENS
A lexical token is a sequence of characters that can be treated as a unit in the grammar of a programming language. A programming language classifies lexical tokens into a finite set of token types.
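One way to picture a finite set of token types is as a C enumeration, with a function that maps each character sequence to one member. The sketch below is illustrative only: the type names (`TT_IF`, `TT_ID`, and so on), the function `classify`, and the particular rules for identifiers and numbers are assumptions made for this example, not definitions taken from the text.

```c
#include <ctype.h>
#include <string.h>

/* An illustrative, deliberately small set of token types. */
typedef enum {
    TT_IF, TT_ID, TT_NUM, TT_COMMA, TT_LPAREN, TT_RPAREN, TT_UNKNOWN
} TokenType;

/* Classify a complete lexeme (a NUL-terminated character sequence). */
TokenType classify(const char *lexeme) {
    size_t i;
    if (lexeme[0] == '\0') return TT_UNKNOWN;

    /* Reserved words and punctuation are matched exactly. */
    if (strcmp(lexeme, "if") == 0) return TT_IF;
    if (strcmp(lexeme, ",") == 0)  return TT_COMMA;
    if (strcmp(lexeme, "(") == 0)  return TT_LPAREN;
    if (strcmp(lexeme, ")") == 0)  return TT_RPAREN;

    /* Assumed rule: an identifier is a letter followed by letters/digits. */
    if (isalpha((unsigned char)lexeme[0])) {
        for (i = 1; lexeme[i] != '\0'; i++)
            if (!isalnum((unsigned char)lexeme[i])) return TT_UNKNOWN;
        return TT_ID;
    }

    /* Assumed rule: a number is a nonempty run of decimal digits. */
    if (isdigit((unsigned char)lexeme[0])) {
        for (i = 1; lexeme[i] != '\0'; i++)
            if (!isdigit((unsigned char)lexeme[i])) return TT_UNKNOWN;
        return TT_NUM;
    }
    return TT_UNKNOWN;
}
```

Under these assumed rules, `classify("if")` gives `TT_IF`, `classify("n14")` gives `TT_ID`, and `classify("73")` gives `TT_NUM`. The keyword check comes before the identifier check so that a reserved word is never reported as an ordinary name.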
Modern Compiler Implementation in C, pp. 16-38
Publisher: Cambridge University Press
Print publication year: 1997