nLab regular language

Redirected from "regular languages".

Contents

Idea

Regular languages are a simple kind of formal languages, the least expressive in the Chomsky hierarchy.

Definition

Fix a finite set VV (serving as the alphabet) and denote its free monoid by V V^\star. A language LV L \subseteq V^\star is regular whenever there is a formal grammar G=(V,X,R,s)G = (V, X, R, s) that generates it (i.e. L(G)=LL(G) = L) such that all rules in RR are of the form xwxx \to w x' for a terminal wVw \in V and non-terminal symbols x,xXx, x' \in X.

Properties

Regular languages can be characterised by regular expressions. Regular expressions are inductively generated by the following rules:

  1. For each terminal wVw \in V, ww is a regular expression;
  2. There are basic expressions \emptyset, ϵ\epsilon;
  3. For each pair of regular expressions R 1,R 2R_1,R_2, there are expressions R 1|R 2R_1|R_2 and R 1R 2R_1 \cdot R_2;
  4. For each regular expression RR, the Kleene star R R^\star is a regular expression.

These can be given a semantics in the poset of languages P(V )P(V^\star) by universal constructions in 2\mathbf{2}-enriched category theory, where 2=(01)\mathbf{2} = (0 \leq 1) is regarded as cartesian closed, and V V^\star as a discrete poset is regarded as a 2\mathbf{2}-enriched category.

  1. L()=L(\emptyset) = \emptyset and L(R 1|R 2)=L(R 1)L(R 2)L(R_1 | R_2) = L(R_1) \cup L(R_2). These are nullary and binary joins.
  2. L(w)={w}L(w) = \{ w \}. If we view languages LV L \subseteq V^\star as 2\mathbf{2}-enriched presheaves V 2V^\star \to \mathbf{2}, then these L(w)L(w) are among the representable presheaves.
  3. L(ϵ)L(\epsilon) consists of only the empty string, L(R 1R 2)={αV |βL(R 1),γL(R 2)(α=βγ)}L(R_1\cdot R_2) = \{\alpha \in V^\star \vert \exists \beta \in L(R_1), \gamma \in L(R_2) \; (\alpha = \beta \gamma) \}. The right side is an application of a 2\mathbf{2}-enriched version of the Day convolution monoidal structure, with monoidal product
    :P(V )×P(V )P(V )\otimes: P(V^\star) \times P(V^\star) \to P(V^\star)

    induced from the concatenation product V ×V V V^\star \times V^\star \to V^\star that makes V V^\star a discrete monoidal poset.

  4. L(R )={αV |nβ 0,,β n1L(R)(α=β 0β n1)}L(R^\star) = \{ \alpha \in V^\star \vert \exists n \in \mathbb{N}\; \exists\beta_0,\ldots,\beta_{n-1} \in L(R)\; (\alpha = \beta_0\cdots\beta_{n-1}) \} is the smallest submonoid of V V^\star that contains L(R)L(R). In category theoretic terms this is the least solution to the recurrence equation L(R )=L(ϵ|(RR ))L(R^\star) = L(\epsilon| (R\cdot R^\star)), and can be constructed by Kleene's fixed point theorem as an ω\omega-colimit.

Every regular language is denoted by some regular expression in this way.

Equivalently, a language is regular if it can be recognised by a finite-state automaton.

References

Last revised on July 1, 2025 at 15:04:44. See the history of this page for a list of all contributions to it.