Formal grammars are sets of rules that describe which strings of words are considered grammatical. They can be used to characterize the syntax of both natural languages and programming languages.
Fix a finite set $V$ called the vocabulary, also called the terminal symbols. A language is a subset of the free monoid $V^*$, i.e. a set of strings of words. Let $V^+$ denote the free semigroup, i.e. non-empty strings. $+$ and $\times$ denote the disjoint union and the Cartesian product respectively.
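For instance, with the toy vocabulary $V = \{ \text{Alice}, \text{loves}, \text{Bob} \}$ (our own example), the free monoid $V^*$ contains the empty string, the one-word strings, and longer strings such as $\text{Alice loves Bob}$; the two-element set $\{ \text{Alice loves Bob}, \text{Bob loves Alice} \}$ is then a language over $V$.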
An unrestricted grammar is given by a tuple $G = (V, N, s, R)$ where $N$ is a finite set of non-terminal symbols with a start symbol $s \in N$, and $R \subseteq (V + N)^+ \times (V + N)^*$ is a finite set of rewrite rules. The language generated by the grammar is usually defined as follows:

$$L(G) \;=\; \{\, u \in V^* \;\mid\; s \Rightarrow^* u \,\}$$

where $\Rightarrow^*$ is the reflexive transitive closure of the one-step rewriting relation: $x u y \Rightarrow x v y$ whenever $(u, v) \in R$.
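As an illustration, here is a minimal sketch in Python (our own toy example, not part of the original definition) of an unrestricted grammar as string rewriting, with membership in $L(G)$ tested by a bounded breadth-first search over derivations:

```python
# A sentential form is a tuple of symbols; a rule (u, v) replaces one
# occurrence of the factor u by v. The toy grammar generates a^n b^n.
R = [(("S",), ("a", "S", "b")),   # S -> a S b
     (("S",), ())]                # S -> empty string

def rewrites(form, rules):
    """All sentential forms reachable from `form` in one rewrite step."""
    for u, v in rules:
        for i in range(len(form) - len(u) + 1):
            if form[i:i + len(u)] == u:
                yield form[:i] + v + form[i + len(u):]

def generates(word, start=("S",), rules=R, max_steps=10):
    """Breadth-first search for a derivation start =>* word."""
    forms = {start}
    for _ in range(max_steps):
        if word in forms:
            return True
        forms = {g for f in forms for g in rewrites(f, rules)}
    return False

assert generates(("a", "a", "b", "b"))   # a^2 b^2 is derivable
assert not generates(("b", "a"))         # b a is not
```

The bound `max_steps` is needed because derivability in an unrestricted grammar is undecidable in general, as recalled at the end of this page.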
The construction of $\Rightarrow^*$ can be seen as the free preordered monoid generated by the rewrite rules $R$. Grammatical derivations can be seen as string diagrams in the free monoidal category with the symbols $V + N$ as generating objects and the rewrite rules $R$ as generating arrows.
Context-free grammars are defined in the same way as unrestricted grammars, but they only allow rules whose domain is a single non-terminal symbol. In that case, the grammatical derivations are trees with the start symbol as root and the words as leaves.
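For instance (a sketch under the same caveats as above, with a hypothetical toy grammar), a context-free derivation can be represented directly as a tree whose root is the start symbol and whose leaves spell out the sentence:

```python
# A toy context-free grammar and a derivation tree for "Alice loves
# Bob": a node is a (non-terminal, children) pair, a leaf is a word.
rules = {"S": [("NP", "VP")], "VP": [("V", "NP")],
         "NP": [("Alice",), ("Bob",)], "V": [("loves",)]}

tree = ("S", [("NP", ["Alice"]),
              ("VP", [("V", ["loves"]), ("NP", ["Bob"])])])

def yield_of(tree):
    """Concatenate the leaves of a derivation tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [w for child in children for w in yield_of(child)]

def well_formed(tree, rules):
    """Check that every node expands by one of the grammar's rules."""
    if isinstance(tree, str):
        return True
    root, children = tree
    heads = tuple(c if isinstance(c, str) else c[0] for c in children)
    return heads in rules[root] and all(well_formed(c, rules) for c in children)

assert yield_of(tree) == ["Alice", "loves", "Bob"]
assert well_formed(tree, rules)
```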
In categorial grammars, monoidal categories are replaced by biclosed monoidal categories. The derivations are trees with two kinds of rules: 1) a finite language-dependent dictionary of rules $w \to t$ with $w \in V$ and $t \in R(B)$, for $R(B)$ the free residuated monoid on a set of basic types $B$ (see Lambek (1958)), and 2) an infinite number of language-agnostic rules $(t / u) \cdot u \to t$ and $u \cdot (u \backslash t) \to t$, for the application of each higher order type.
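To make the two kinds of rules concrete, here is a minimal sketch (our own illustration; the lexicon and its type assignments are hypothetical) of Lambek types together with the forward and backward application rules:

```python
# Over(t, u) encodes t / u (seeks an argument u on the right);
# Under(u, t) encodes u \ t (seeks an argument u on the left).
from dataclasses import dataclass

@dataclass(frozen=True)
class Over:          # t / u
    result: object
    argument: object

@dataclass(frozen=True)
class Under:         # u \ t
    argument: object
    result: object

n, s = "n", "s"                               # basic types: noun, sentence
dictionary = {"Alice": n, "Bob": n,
              "loves": Over(Under(n, s), n)}  # (n \ s) / n

def reduce_once(types):
    """Apply one application rule: (t/u, u) => t or (u, u\\t) => t."""
    for i in range(len(types) - 1):
        l, r = types[i], types[i + 1]
        if isinstance(l, Over) and l.argument == r:     # forward
            return types[:i] + [l.result] + types[i + 2:]
        if isinstance(r, Under) and r.argument == l:    # backward
            return types[:i] + [r.result] + types[i + 2:]
    return None

types = [dictionary[w] for w in "Alice loves Bob".split()]
while (reduced := reduce_once(types)) is not None:
    types = reduced
assert types == [s]   # the sentence reduces to the sentence type
```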
Similarly, pregroup grammars are constructed from a dictionary with types coming from a free pregroup $P(B)$. The only rewrite rules are the counit maps $t^l \cdot t \to 1$ and $t \cdot t^r \to 1$, canceling a non-terminal symbol $t$ with its left and right adjoints $t^l$ and $t^r$. The derivations are planar string diagrams in a rigid monoidal category; the counit maps are drawn as cup-shaped wires connecting the words in a sentence.
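The pregroup cancellations admit an equally small sketch (again our own illustration, with a standard but hypothetical lexicon giving the transitive verb the type $n^r \cdot s \cdot n^l$): a type is a list of basic symbols with an integer adjoint order, and the two counits cancel adjacent pairs of consecutive order.

```python
# A type is a list of pairs (basic type, adjoint order): order 0 is the
# type itself, -1 its left adjoint, +1 its right adjoint.
dictionary = {"Alice": [("n", 0)], "Bob": [("n", 0)],
              "loves": [("n", 1), ("s", 0), ("n", -1)]}  # n^r s n^l

def contract_once(ts):
    """Apply one counit: cancel an adjacent pair (t, k), (t, k + 1),
    which covers both t^l . t -> 1 and t . t^r -> 1."""
    for i in range(len(ts) - 1):
        (a, j), (b, k) = ts[i], ts[i + 1]
        if a == b and k == j + 1:
            return ts[:i] + ts[i + 2:]
    return None

types = [t for w in "Alice loves Bob".split() for t in dictionary[w]]
while (reduced := contract_once(types)) is not None:
    types = reduced
assert types == [("s", 0)]   # the sentence reduces to the type s
```

Each successful contraction corresponds to one cup-shaped wire in the planar string diagram of the derivation.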
The grammatical derivations of dependency grammars can also be encoded as diagrams in a rigid monoidal category.
The origins of grammar go back to 4th century BC India: the Sanskrit philologist Pāṇini showed a mastery of what are essentially context-sensitive grammars in his analysis of the syntax of Sanskrit.
Unrestricted grammars were introduced by Noam Chomsky as the level zero of his hierarchy; they are equivalent to Turing machines in expressive power.
The similar concept of semi-Thue system, also known as a string rewriting system, was introduced thirty years before Chomsky's work on grammar; see Thue (1914). It makes no distinction between terminal and non-terminal symbols, and allows rules with empty left-hand sides.
A third, equivalent formulation, the word problem for semigroups, was proved to be undecidable independently by Emil Post and Andrey Markov.