
2 - Monoidal Finite-State Automata
from Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 23-42
- Chapter
Summary

As a starting point we study finite-state automata, which represent the simplest devices for recognizing languages. The theory of finite-state automata has been described in numerous textbooks both from a computational and an algebraic point of view. Here we immediately look at the more general concept of a monoidal finite-state automaton, and the focus of this chapter is general constructions and results for finite-state automata over arbitrary monoids and monoidal languages. Refined pictures for the special (and more standard) cases where we only consider free monoids or Cartesian products of monoids will be given later.

1 - Formal Preliminaries
from Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 3-22
- Chapter
Summary

The aim of this chapter is twofold. First, we recall a collection of basic mathematical notions that are needed for the discussions of the following chapters. Second, we have a first, still purely mathematical, look at the central topics of the book: languages, relations and functions between strings, as well as important operations on languages, relations and functions. We also introduce monoids, a class of algebraic structures that gives an abstract view on strings, languages, and relations.

Part II - From Theory to Practice
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 159-160
- Chapter

3 - Classical Finite-State Automata and Regular Languages
from Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 43-71
- Chapter
Summary

Classical finite-state automata represent the most important class of monoidal finite-state automata. Since the underlying monoid is free, this class of automaton has several interesting specific features. We show that each classical finite-state automaton can be converted to an equivalent classical finite-state automaton where the transition relation is a function. This form of ‘deterministic’ automaton offers a very efficient recognition mechanism since each input word is consumed on at most one path. The fact that each classical finite-state automaton can be converted to a deterministic automaton can be used to show that the class of languages that can be recognized by a classical finite-state automaton is closed under intersections, complements, and set differences. The characterization of regular languages and deterministic finite-state automata in terms of the ‘Myhill–Nerode equivalence relation’ to be introduced in the chapter offers an algebraic view on these notions and leads to the concept of minimal deterministic automata.
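
The conversion to a deterministic automaton mentioned in this summary is commonly done with the subset (powerset) construction. The following Python sketch is an illustration only and is not taken from the book: the function names `determinize` and `accepts`, the dictionary encoding of transitions, and the example automaton are all assumptions, and ε-transitions are assumed to be absent.

```python
from collections import deque


def determinize(alphabet, transitions, initial, finals):
    """Subset construction for an NFA without epsilon-transitions.

    transitions maps (state, symbol) to a set of successor states.
    Each state of the result is a frozenset of NFA states.
    """
    start = frozenset(initial)
    dfa_transitions = {}
    dfa_finals = set()
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current & finals:
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in transitions.get((s, symbol), ())
            )
            if not target:
                continue  # no successor: the resulting DFA is partial
            dfa_transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_transitions, start, dfa_finals


def accepts(dfa_transitions, start, dfa_finals, word):
    """Follow the unique path for word, if one exists."""
    state = start
    for symbol in word:
        state = dfa_transitions.get((state, symbol))
        if state is None:
            return False
    return state in dfa_finals


# Hypothetical example: an NFA for the words over {a, b} that end in "ab".
NFA_TRANSITIONS = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
DFA = determinize("ab", NFA_TRANSITIONS, {0}, {2})
```

With this example, `accepts(*DFA, "aab")` holds and `accepts(*DFA, "aba")` does not; each input word is consumed on at most one path, as the summary describes.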

10 - The Minimal Deterministic Finite-State Automaton for a Finite Language
from Part II - From Theory to Practice
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 253-278
- Chapter
Summary

A fundamental task in natural language processing is the efficient representation of lexica. From a computational viewpoint, lexica need to be represented in a way directly supporting fast access to entries, and minimizing space requirements. A standard method is to represent lexica as minimal deterministic (classical) finite-state automata. To reach such a representation it is of course possible to first build the trie of the lexicon and then to minimize this automaton afterwards. However, in general the intermediate trie is much larger than the resulting minimal automaton. Hence a much better strategy is to use a specialized algorithm to directly compute the minimal deterministic automaton in an incremental way. In this chapter we describe such a procedure.
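
To make the size difference between the trie and the minimal automaton concrete, the following Python sketch implements the two-step strategy the summary calls inferior: build the trie, then merge equivalent states bottom-up. It is not the incremental algorithm of the chapter. The names `build_trie` and `minimize_acyclic`, the list-based state encoding, and the word list are assumptions made for illustration.

```python
def build_trie(words):
    """Return (finals, edges): finals[q] is a bool, edges[q] maps symbol -> state."""
    finals = [False]
    edges = [{}]
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in edges[state]:
                finals.append(False)
                edges.append({})
                edges[state][symbol] = len(edges) - 1
            state = edges[state][symbol]
        finals[state] = True
    return finals, edges


def minimize_acyclic(finals, edges):
    """Merge equivalent states of an acyclic DFA.

    Two states are equivalent if they agree on finality and have the same
    labelled transitions into equivalent states. The register maps each
    such signature to one state id of the minimal automaton.
    """
    register = {}

    def canonical(state):
        signature = (
            finals[state],
            tuple(sorted((sym, canonical(t)) for sym, t in edges[state].items())),
        )
        return register.setdefault(signature, len(register))

    root = canonical(0)
    return root, register


# Hypothetical lexicon with shared suffixes.
finals, edges = build_trie(["tap", "taps", "top", "tops"])
root, register = minimize_acyclic(finals, edges)
print(len(edges), len(register))  # prints "8 5"
```

Even on this tiny lexicon the trie has 8 states while the minimal automaton has 5; the gap grows with the lexicon, which is the motivation for constructing the minimal automaton directly.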

Contents
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp v-viii
- Chapter

Preface
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp ix-x
- Chapter

9 - The Aho–Corasick Algorithm
from Part II - From Theory to Practice
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 236-252
- Chapter
Summary

This chapter describes a special construction based on finite-state automata with important applications: the Aho–Corasick algorithm is used to efficiently find all occurrences of a finite set of strings (also called pattern set, or dictionary) in a given input string, called the ‘text’. Search is ‘online’, which means that the input text is neither fixed nor preprocessed in any way. This problem is a special instance of pattern matching in strings, and other automata constructions are used to solve other pattern matching tasks. From an automaton point of view, the Aho–Corasick algorithm comes in two variants. We first present the more efficient version where a classical deterministic finite-state automaton is built for text search. The disadvantage of this first construction is that the resulting automaton can become very large, in particular for large pattern alphabets. Afterwards we present the second version, where an automaton with additional transitions of a particular kind is built, yielding a much smaller device for text search.
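
The second variant described in this summary, a trie extended with additional transitions, is usually presented with failure links. The Python sketch below follows that textbook formulation and is not the book's C(M) implementation; the names `build_automaton` and `find_all`, the list-based encoding, and the assumption that all patterns are non-empty are illustrative choices.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]    # goto[state][symbol] -> state (trie transitions)
    fail = [0]     # failure link of each state
    output = [[]]  # patterns reported when a state is entered
    for pattern in patterns:
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        output[state].append(pattern)
    # Breadth-first traversal: states at depth 1 keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for symbol, target in goto[state].items():
            queue.append(target)
            fallback = fail[state]
            while fallback and symbol not in goto[fallback]:
                fallback = fail[fallback]
            fail[target] = goto[fallback].get(symbol, 0)
            # Inherit the patterns that end at the failure target.
            output[target] = output[target] + output[fail[target]]
    return goto, fail, output


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, scanning left to right."""
    goto, fail, output = build_automaton(patterns)
    state = 0
    matches = []
    for end, symbol in enumerate(text):
        while state and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        for pattern in output[state]:
            matches.append((end - len(pattern) + 1, pattern))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is read once from left to right without preprocessing, which matches the 'online' search setting of the summary; only the pattern set is compiled in advance.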

Index
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 302-304
- Chapter

11 - Constructing Finite-State Devices for Text Rewriting
from Part II - From Theory to Practice
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 279-297
- Chapter
Summary

A common task arising in many contexts is rewriting parts of a given input string to another form. Subparts of the input that match specific conditions are replaced by other output parts. In this way, the complete input string is translated to a new output form. Due to the importance of text rewriting, many programming languages offer matching/rewriting operations for subexpressions of strings, also called replace rules. When using strictly regular relations and functions for representing replace rules, a cascade of replace rules can be composed into a single transducer. If the transducer is functional, an equivalent bimachine or (in some cases) a subsequential transducer can be built, thus achieving theoretically and practically optimal text processing speed. In this chapter we introduce basic constructions for building text rewriting transducers and bimachines from replace rules and provide implementations. A first simple version in general leads to an ambiguous form of text rewriting with several outputs. A second more sophisticated construction solves conflicts using the leftmost-longest match strategy and leads to functional devices.
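
The leftmost-longest match strategy can be illustrated by simulating it directly on a string. The Python sketch below is only a behavioural illustration under strong assumptions: rules are literal, non-empty strings rather than regular relations, no transducer or bimachine is constructed, and the name `rewrite_leftmost_longest` and the example rules are hypothetical.

```python
def rewrite_leftmost_longest(text, rules):
    """Rewrite text with literal replace rules.

    At each position the longest matching left-hand side wins; scanning
    resumes after the replaced part, so matches never overlap and the
    result is a single output string.
    """
    by_length = sorted(rules, key=len, reverse=True)
    output = []
    position = 0
    while position < len(text):
        for pattern in by_length:
            if pattern and text.startswith(pattern, position):
                output.append(rules[pattern])
                position += len(pattern)
                break
        else:
            # No rule applies here: copy one symbol unchanged.
            output.append(text[position])
            position += 1
    return "".join(output)


RULES = {"ab": "X", "abc": "Y", "bc": "Z"}
print(rewrite_leftmost_longest("abcbc", RULES))  # prints "YZ"
```

Without the leftmost-longest policy the input `abcbc` would admit several rewritings (for example `XcZ` or `aZZ`), which is the ambiguity of the first, simpler construction mentioned in the summary.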

4 - Monoidal Multi-Tape Automata and Finite-State Transducers
from Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 72-93
- Chapter
Summary

Multi-tape automata, which are used for recognizing relations of a particular type, are an important generalization of classical finite-state automata. The so-called regular relations (also referred to as ‘rational relations’) offer a natural way to formalize all kinds of translations and transformations, which makes multi-tape automata interesting for many practical applications and explains the general interest in this kind of device. Monoidal finite-state transducers form a natural subclass; they can be defined as two-tape automata where the first tape reads strings. In this chapter we present the most important properties of monoidal multi-tape automata in general and monoidal finite-state transducers in particular. We show that the class of relations recognized by n-tape automata is closed under a number of useful relational operations such as composition, Cartesian product, projection, and inverse. We further present a procedure for deciding the functionality of classical finite-state transducers.

Frontmatter
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp i-iv
- Chapter

Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 1-2
- Chapter

References
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 298-301
- Chapter

7 - The C(M) language
from Part II - From Theory to Practice
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 161-176
- Chapter
Summary

In this chapter we introduce the C(M) language, a new programming language. C(M) statements and expressions closely resemble the notation commonly used for the presentation of formal constructions in a Tarskian-style set-theoretical language. The usual set-theoretic objects such as sets, functions, relations, and tuples are naturally integrated in the language. In contrast to imperative languages such as C or Java, C(M) is a functional declarative programming language. C(M) has many similarities with Haskell but, like SETL, makes use of standard mathematical notation. The C(M) compiler translates a well-formed C(M) program into efficient C code, which can be executed after compilation. Since it is easy to read C(M) programs, a pseudo-code description becomes obsolete.

6 - Bimachines
from Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 138-158
- Chapter
Summary

In this chapter we introduce the bimachine, a deterministic finite-state device that exactly represents the class of all regular string functions. We prove this correspondence, using as a key ingredient a procedure for converting transducers to bimachines. Methods for pseudo-minimization and direct composition of bimachines are added.

8 - C(M) Implementation of Finite-State Devices
from Part II - From Theory to Practice
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 177-235
- Chapter
Summary

In this chapter we present C(M) implementations of the main automata constructions. Our aim is to provide full descriptions of the implementations that are clear and easy to follow. In some cases the simplicity of the implementation is achieved at the expense of some inefficiency.

5 - Deterministic Transducers
from Part I - Formal Background
Stoyan Mihov, Klaus U. Schulz, Ludwig-Maximilians-Universität München
Book: Finite-State Techniques
Published online: 29 July 2019
Print publication: 01 August 2019, pp 94-137
- Chapter
Summary

In this chapter we explore deterministic finite-state transducers. Obviously, it only makes sense to ask for determinism if we restrict attention to transducers with a functional input-output behaviour. In this chapter we focus on transducers that are deterministic on the input tape (called sequential or subsequential transducers). We shall see that only a proper subset of all regular string functions can be represented by this kind of device and we describe a decision procedure for testing whether a functional transducer can be determinized. Further we present a subsequential transducer minimization procedure based on the Myhill–Nerode relation for string functions.
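
A transducer that is deterministic on the input tape can be run with a single left-to-right pass. The Python sketch below assumes the common textbook encoding of a subsequential transducer (one transition per state and input symbol, an output string per transition, and an extra output string per final state); the function name `apply_subsequential`, the dictionary layout, and the example machine are hypothetical and not taken from the book.

```python
def apply_subsequential(transitions, final_output, start, word):
    """Translate word, or return None if the word is not in the domain.

    transitions maps (state, symbol) to (next_state, emitted_string);
    final_output maps each final state to the string appended at the end.
    """
    state = start
    pieces = []
    for symbol in word:
        if (state, symbol) not in transitions:
            return None
        state, emitted = transitions[(state, symbol)]
        pieces.append(emitted)
    if state not in final_output:
        return None
    pieces.append(final_output[state])
    return "".join(pieces)


# Hypothetical machine over {a, b} that rewrites every "ab" to "x".
# State 1 means "an 'a' has been read but not yet written".
TRANSITIONS = {
    (0, "a"): (1, ""),
    (0, "b"): (0, "b"),
    (1, "a"): (1, "a"),
    (1, "b"): (0, "x"),
}
FINAL_OUTPUT = {0: "", 1: "a"}

print(apply_subsequential(TRANSITIONS, FINAL_OUTPUT, 0, "aba"))  # prints "xa"
```

The final output for state 1 flushes the delayed `a`; this kind of delayed output is what the state-dependent final strings of a subsequential transducer make possible.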

Finite-State Techniques

Automata, Transducers and Bimachines
Stoyan Mihov, Klaus U. Schulz
Published online: 29 July 2019
Print publication: 01 August 2019
- Book
Finite-state methods are the most efficient mechanisms for analysing textual and symbolic data, providing elegant solutions for an immense number of practical problems in computational linguistics and computer science. This book for graduate students and researchers gives a complete coverage of the field, starting from a conceptual introduction and building to advanced topics and applications. The central finite-state technologies are introduced with mathematical rigour, ranging from simple finite-state automata to transducers and bimachines as 'input-output' devices. Special attention is given to the rich possibilities of simplifying, transforming and combining finite-state devices. All algorithms presented are accompanied by full correctness proofs and executable source code in a new programming language, C(M), which focuses on transparency of steps and simplicity of code. Thus, by enabling readers to obtain a deep formal understanding of the subject and to put finite-state methods to real use, this book closes the gap between theory and practice.
