Grammatical inference has been studied since the inception of the theory of formal grammars in the 1960s, in particular to provide a formal framework for language acquisition. Since the pioneering paper of Gold (1967), which introduced the concept of identification in the limit, numerous studies have been carried out in several scientific communities, including those working on machine learning, pattern recognition, natural language processing, formal language theory, and electronic circuit design. These studies aimed to establish a theoretical framework for grammatical inference and to design practical learning methods. More recently, these techniques have been applied in several domains, such as genomics, natural language processing, and the testing of computer programs.
This chapter provides the necessary fundamental concepts to understand the flavor of this field and reports on a study of generic learning algorithms with regard to the covering test and generalization.
Experimental evidence again points to a phase transition phenomenon, albeit one different from those already encountered in this book.
The task of inferring grammars
While so far we have discussed learning scenarios in which the task is to extract regularities from sets or collections of (labeled) descriptions, such scenarios are far from covering all learning situations. Indeed, much data comes in the form of sequences, and what one often wants to learn is the trend, tendency, or even the rule governing these sequences. In such cases, the important structure is the sequential or temporal organization of the data.
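To make the task concrete, the sketch below builds a prefix-tree acceptor (PTA) from a sample of positive example strings. The PTA is a standard starting point in grammatical inference: it is the most specific automaton accepting exactly the training sample, and state-merging algorithms generalize from it. The class and variable names here are illustrative, not taken from any particular library.

```python
# Illustrative sketch: a prefix-tree acceptor (PTA) built from positive
# example strings. The PTA accepts exactly the training sample; grammatical
# inference algorithms generalize by merging its states.

class PTA:
    def __init__(self, samples):
        self.transitions = {}   # (state, symbol) -> next state
        self.accepting = set()  # states reached at the end of a sample string
        self.n_states = 1       # state 0 is the initial state
        for word in samples:
            state = 0
            for symbol in word:
                key = (state, symbol)
                if key not in self.transitions:
                    self.transitions[key] = self.n_states
                    self.n_states += 1
                state = self.transitions[key]
            self.accepting.add(state)

    def accepts(self, word):
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in self.transitions:
                return False
            state = self.transitions[key]
        return state in self.accepting

# The PTA memorizes the sample without generalizing: it accepts "abab"
# (seen) but rejects "abababab" (unseen), even though both fit the
# pattern (ab)^n. Inference proper begins when states are merged.
pta = PTA(["ab", "abab", "ababab"])
print(pta.accepts("abab"))      # True
print(pta.accepts("abababab"))  # False
```

Note that the PTA by itself does not generalize; it merely organizes the sample so that the sequential structure of the strings is explicit, which is exactly the structure that grammatical inference must exploit.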