In this chapter we investigate methods for parsing and recognition in context-free grammars (CFGs). Both problems have significant practical applications. Parsing, for example, is an essential feature of a compiler, which translates from one computer language (the “source”) to another (the “target”). Typically, the source is a high-level language, while the target is machine language.
The first compilers were built in the early 1950s. Computing pioneer Grace Murray Hopper built one at Remington Rand during 1951–1952. At that time, constructing a compiler was a black art that was very time consuming. When John Backus led the project that produced a FORTRAN compiler in 1955–1957, it took 18 person-years to complete.
Today, parser generators such as Yacc (which stands for “yet another compiler-compiler”) and Bison allow a single person to construct a compiler in a few hours or days. These tools are based on LALR(1) parsing, a variant of one of the parsing methods we will discuss here. Parsing is also a feature of natural language recognition systems.
In Section 5.1 we will see how to accomplish parsing in an arbitrary CFG in polynomial time. More precisely, if the grammar G is in Chomsky normal form, we can parse an arbitrary string w ∈ L(G) of length n in O(n^3) time. While a running time of O(n^3) is often considered tractable in computer science, as programs get bigger and bigger, it becomes more and more essential that parsing be performed in linear time.
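To make the cubic bound concrete, here is a minimal sketch of one well-known way to achieve it: the Cocke-Younger-Kasami (CYK) dynamic-programming recognizer. It is offered as an illustration under stated assumptions, not necessarily as the method developed in Section 5.1. The function name `cyk_recognize`, the dictionary encoding of the rules, and the sample grammar for {a^n b^n : n ≥ 1} are all hypothetical choices made for this example. The sketch only decides membership; a parser would additionally record which split produced each table entry.

```python
def cyk_recognize(word, unit_rules, binary_rules, start="S"):
    """Decide whether `word` is generated by a CFG in Chomsky normal form.

    unit_rules:   maps a terminal a to the set of variables A with A -> a
    binary_rules: maps a pair (B, C) to the set of variables A with A -> B C
    (This encoding is an assumption of the sketch.)
    """
    n = len(word)
    if n == 0:
        # Assumption: the grammar has no rule deriving the empty string.
        return False

    # table[i][j] holds the variables that derive word[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Substrings of length 1 come from rules of the form A -> a.
    for i, symbol in enumerate(word):
        table[i][i] = set(unit_rules.get(symbol, ()))

    # Longer substrings: try every split point k and every rule A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= binary_rules.get((left, right), set())

    return start in table[0][n - 1]


# Hypothetical sample grammar in Chomsky normal form for {a^n b^n : n >= 1}:
#   S -> A B | A T,   T -> S B,   A -> a,   B -> b
UNIT_RULES = {"a": {"A"}, "b": {"B"}}
BINARY_RULES = {
    ("A", "B"): {"S"},
    ("A", "T"): {"S"},
    ("S", "B"): {"T"},
}
```

With this grammar, `cyk_recognize("aabb", UNIT_RULES, BINARY_RULES)` accepts and `cyk_recognize("aab", UNIT_RULES, BINARY_RULES)` rejects. The three nested loops over the substring length, the start position, and the split point account for the O(n^3) factor; the two inner loops depend only on the size of the grammar, which is fixed.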