Initiality Project - Raw Syntax

Initiality Project - Raw Syntax

Initiality Project - Raw Syntax

This page is part of the Initiality Project.


Here we define the “raw syntax” for our type theory. These are sets whose elements are called “raw terms” and “raw types”. They are obtained by putting together operations like “Σ\Sigma-types” and “λ\lambda-abstraction” in a way that “makes sense locally” but not necessarily globally.

In traditional approaches to logic, one often defines the raw terms (or “well-formed formulas”) as a subset of the set of all “strings”, i.e. finite lists of elements of some “alphabet” set such as {Σ,Π,λ,(,),:,}\{\Sigma,\Pi,\lambda,(,),:,\dots \}. Note that parentheses, colons, commas, and so on are all symbols in the alphabet. In addition, this point of view requires us to alter some of the clauses to include extra parentheses, e.g. we have to write (A×B)(A\times B) so that ((A×B)×C)((A\times B)\times C) has an unambiguous meaning.

However, a more modern perspective is to regard this as defining an inductive type (in the metatheory), with each clause as a constructor. In this case, the raw terms themselves are not lists of symbols with some property, but rather trees whose nodes are labeled with appropriate symbols. This way we obtain automatically an induction principle for defining functions by recursion and proving theorems by induction over the set of terms, which in the strings-of-symbols approach one has to prove by hand. The strings of symbols we write on the page like (A×B)×C(A\times B)\times C are then regarded as just a notation for these “abstract syntax trees”.

It’s true that to be completely precise about the entire passage from “mathematics written on the page” to its meaning, one would want to also define the strings of symbols and prove that they faithfully represent the syntax trees. This is analogous to how a compiler for a programming language must start out by “parsing” and “lexing” the code written by the programmer (which is a string of symbols) into an internal “abstract syntax tree” representation. However, such things are well-understood and extremely uninteresting, so much so that programmers often use “parser generator” programs to write parsers for them; the interesting and nontrivial mathematics starts once we have an abstract syntax tree.

Variables and meta-variables

Experts will know that there are many ways to deal with variable binding in type theory, including “named variables”, “de Bruijn indices”, “de Bruijn levels”, “locally nameless variables”, and so on. We will use “named variables”. The reason is that our goal is expositional and sociological, and named variables keep the syntax as close as possible to what “users” of type theory are familiar with (e.g. with named variables we write λx.λy.App(x,y)\lambda x. \lambda y. App(x,y), whereas with de Bruijn indices one has to write instead λ.λ.App(2,1)\lambda. \lambda. App(2,1)). Since we are humans writing for humans to read, we don’t want to deal explicitly with de Bruijn indices, and neither do we want to pretend that we are using de Bruijn indices when we aren’t really using them.

Thus, we suppose given an infinite set VarVar, whose elements we call variables. Constructivists should assume that VarVar has decidable equality. In addition we assume given an operation fr:Var×P fin(Var)Varfr : Var \times P_{fin}(Var) \to Var which replaces a variable with another one that is “fresh for”, i.e. distinct from all of, a given finite set of variables. That is, we assume that fr(x,V)Vfr(x,V) \notin V. We also assume for simplicity that if already xVx\notin V then fr(x,V)=xfr(x,V) = x. We will also write fr(x,V)fr(x,V) as x [V]x^{[V]}, and if X={x 1,,x n}X = \{x_1,\dots,x_n\} is a linearly ordered set of variables we will write

X [V]={x 1 [V],x 2 [V{x 1 [V]}],,x n [V{x 1,,x n1} [V]]}X^{[V]} = \{ x_1^{[V]}, x_2^{[V\cup \{x_1^{[V]}\}]}, \dots, x_n^{[V \cup \{x_1,\dots,x_{n-1}\}^{[V]}]}\}

for the result of freshening each element of XX in succession, ensuring that they remain distinct not only from VV but also from each other.

We will also frequently use meta-variables. These are simply ordinary mathematical variables in the “metatheory”, i.e. the ordinary informal mathematics we are using in which to define and prove things about type theory. Single upper-case roman letters will usually be meta-variables ranging over raw terms or raw types. Single lower-case roman letters will usually be meta-variables ranging over variables (i.e. elements of VarVar).

Sorts, operators, and constants

Overall, we are not attempting in this project to treat “all type theories” at any level of generality, instead simply dealing with individual type formers one by one (though attempting to be as “modular” as possible). However, at the level of raw syntax we can easily be fully general, thereby saving ourselves some additional work in each case. Thus, we will work with a variant of what in Practical Foundations for Programming Languages are called abstract binding trees, parametrized by a set of sorts and a set of operators.

For us the set of sorts is Sort={tm,ty}Sort = \{ tm, ty \}. That is, there are two classes of raw syntactic objects: raw terms and raw types. The only function of the notion of “sort” is to avoid writing exactly the same things twice in the two cases.

The set of operators OpOp is a parameter of the theory; the rules for each type former begin by specifying some relevant operators. For instance, the operators associated to Π\Pi-types are {Π,λ,App}\{ \Pi, \lambda, App\}.

Each operator is furthermore associated with a signature or generalized arity, which consists of:

  1. A sort (i.e. either tmtm or tyty).
  2. A natural number nn, the argument arity.
  3. A function from [n]={1,,n}[n]=\{1,\dots,n\} to SortSort, the argument sorts.
  4. A natural number mm, the binding arity.
  5. A decidable relation \lhd between [n][n] and [m][m] called scoping. If jkj\lhd k we say that the j thj^{th} argument is in the scope of the k thk^{th} binder.

Often (e.g. in PFPL) the scoping relation is a function from [m][m] to [n][n], i.e. each binder has exactly one subterm in its scope. However, it is convenient to be more general; e.g. in a fully-annotated λ\lambda-abstraction λ(x:A.B).M\lambda(x:A.B).M it is natural to consider BB and MM to be both in the scope of the variable xx, rather than requiring this to be desugared to a form such as λ(A,x.B,x.M)\lambda(A,x.B,x'.M) in which potentially-different variables are bound in BB and MM. Thus, we will represent λ\lambda as an operator with 3 arguments, two of sort tyty and one of sort tmtm, with binding arity 1, and scoping relations 212\lhd 1 and 313\lhd 1.

Fully general abstract binding trees also allow each binder to be associated to a sort. However, in our type theories all variables will range only over terms (not types).

Finally, in addition to the operators arising from type formers, we assume a collection of constants, each of which is an operator with binding arity 00 and all arguments having sort tmtm.

We will usually use single upper-case typewriter font letters like C,D\mathtt{C},\mathtt{D} for meta-variables ranging over operators, including constants.

Raw terms

Our raw syntax, though untyped, will be strongly scoped. By this we mean that each raw term or type is indexed by a finite set of variables that may occur in it freely. To be precise, we define inductively a family of sets RawRaw indexed by P fin(Var)×SortP_{fin}(Var) \times Sort; we call their elements raw terms/types with variables from VV. Since we are in such generality, there are only two constructors of this inductive family:

  • For any VP fin(Var)V\in P_{fin}(Var) and any xVx\in V, we have xRaw(V,tm)x\in Raw(V,tm).

  • Suppose C\mathtt{C} is an operator of sort ss, argument arity nn and binding arity mm. Suppose also V,XP fin(Var)V, X\in P_{fin}(Var), with VX=V\cap X = \emptyset, |X|=m{|X|} = m, and X={x 1,,x m}X = \{x_1,\dots, x_m\} totally ordered. Given j[n]j\in [n], write X j={x kjk}X_j = \{x_k \mid j \lhd k \} for the variables bound in argument jj, and suppose furthermore that for each j[n]j\in [n] we have M jRaw(VX j,t j)M_j\in Raw(V\cup X_j, t_j), where t:[n]Sortt:[n] \to Sort gives the argument sorts of C\mathtt{C}. Then we have an element C(X;M)Raw(V,s)\mathtt{C}(X;\vec{M}) \in Raw(V,s).

This can be read as saying that each set Raw(V,s)Raw(V,s) consists of well-founded rooted trees containing nodes of two kinds:

  • Nodes labeled by a variable (element of VarVar) that is either in VV or is bound in some parent of that node and scopes over that node. Such a node has no children and is of sort tmtm.

  • Nodes labeled by an operator and a list of variables of its binding arity that are not in VV or bound in some parent of that node. Such a node has a number of children equal to the argument arity of its operator label, of appropriate sorts, and each of its variables scopes over the subtrees of those children specified by the scoping relation of its operator label, while its sort is that of its operator label.

The above clauses also introduce notation for these trees: we will write simply “xx” for the tree consisting of a single variable node with associated variable xx, and C(X;M)\mathtt{C}(X;\vec{M}) for the tree whose root is an operator node with associated operator C\mathtt{C}, bound variables XX, and subtrees M\vec{M}. (These are the “linear” syntaxes that have to be “parsed” into an abstract syntax tree if implementing type theory on a computer.) For instance, a fully annotated abstraction λ(x:A.B).M\lambda(x:A.B).M is desugared as λ(x;A,B,M)\lambda(x;A,B,M), a node labeled by λ\lambda and xx with three subtrees, two of which are in the scope of xx.

The requirement that VX=V\cap X= \emptyset amounts to a “local Barendregt convention?” that the names of bound variables can never clash with (or “shadow”) any free variables, where the set of “free variables” is declared for each raw term or type (the set VV) rather than extracted by syntactic analysis. Maintaining this convention requires “freshening” when we weaken by adding new unused free variables; see below.


For each type former, we describe its binding arity, argument arity and sorts, and scoping relation in a hopefully more readable form, by giving arbitrary names to its bound variables and arguments. We also give the “sugared” or “traditional” syntax that we will usually use for it.

For instance, the λ\lambda row below means that λ\lambda is an operator of sort tmtm, it has two arguments of sort tyty and one of sort tmtm, and one binder that scopes over the second type argument and the term argument.


operatorsortvarstype argsterm argsscopingsugared syntax
Π\PitytyxxAA, BBBxB\lhd xΠ(x:A).B\Pi(x:A).B
λ\lambdatmtmxxAA, BBMMBxB\lhd x, MxM\lhd xλ(x:A.B).M\lambda(x:A.B).M
AppApptmtmxxAA, BBMM, NNBxB\lhd xApp (x:A).B(M,N)App^{(x:A).B}(M,N)


to be added later…

Structural induction

The principle of structural induction on raw terms and types says that to prove something about all of them, it suffices to consider raw terms and types constructed according to each of the above clauses (i.e. consider all possible labels for the root of an abstract syntax tree), assuming as an inductive hypothesis that the desired statement holds of all subtrees. This is a basic property of an inductive type; if our metatheory is set-theoretic then it can be proven from the definition of well-founded tree.

Similarly, there is a structural recursion principle for defining pairs of functions with domains RawTmRawTm and RawTyRawTy: to define such a function it suffices to say, for each inductive clause, how to construct the value of the function on a tree with that root, assuming given its values on all the subtrees.


We want to identify raw terms and types that differ only according to a renaming of bound variables. However, in defining this inductively, we end up having to also rename free variables in the terms where the variables are bound. We do this all together by defining a relation M ρMM \sim_\rho M', where MRaw(V,s)M\in Raw(V,s), MRaw(V,s)M'\in Raw(V',s), and ρ\rho is a bijection VVV\cong V', by recursion on MM and MM' as follows:

  • If xVx\in V and yVy\in V', then x ρyx \sim_\rho y if and only if ρ(x)=y\rho(x)=y.

  • For ordered sets X,XX,X' of the same cardinality, let ξ:XX\xi:X\cong X' be the unique order-preserving bijection between them, and with X j={x kjk}X_j = \{x_k \mid j \lhd k \} and X j={x kjk}X'_j = \{x'_k \mid j \lhd k \} as before we have similarly ξ j:X jX j\xi_j : X_j \cong X'_j. Then C(X;M) ρC(X;M)\mathtt{C}(X;\vec{M}) \sim_{\rho} \mathtt{C}(X';\vec{M'}) if and only if M j ρξ jM jM_j \sim_{\rho \cup \xi_j} M'_j for all jj.

  • In all other cases (e.g. one of MM and MM' is a variable and the other is an operator, or they are distinct operators), M ρMM\sim_\rho M' is false.

It is easy to prove by induction that:

  • For any MRaw(V,s)M\in Raw(V,s) we have M idMM\sim_{id} M.
  • If M ρMM\sim_\rho M' then M ρ 1MM' \sim_{\rho^{-1}} M.
  • If M ρMM\sim_\rho M' and M ρMM' \sim_{\rho'} M'', then M ρρMM\sim_{\rho'\circ\rho} M''.

In particular, each id\sim_{id} is an equivalence relation on Raw(V,s)Raw(V,s).

Weakening, exchange, and contraction

For any function σ:VW\sigma :V\to W, we define a function Raw(V,s)Raw(W,s)Raw(V,s) \to Raw(W,s), written MM[σ]M \mapsto M[\sigma], by induction on MM. When σ\sigma is a subset inclusion, this is “weakening”: adding a new unused free variable. But to define weakening inductively, we will need to involve M[σ]M[\sigma] when σ\sigma is an injection that is not a subset inclusion, and it is no extra trouble to define it for any function σ\sigma at all.

  • If M=xVM=x\in V, we define x[σ]=σ(x)x[\sigma] = \sigma(x).

  • If M=C(X;M)M=\mathtt{C}(X;\vec{M}), we would like to define M[σ]M[\sigma] to be “C(X;N)\mathtt{C}(X;\vec{N})” where N j=M j[σ]N_j = M_j[\sigma]. But this is not allowed by our definitions, since XX may not be disjoint from WW (though it is disjoint from VV). Thus, recall that we have the successive freshening X [W]X^{[W]} that is disjoint from WW and an order-preserving bijection ξ:XX [W]\xi :X \cong X^{[W]}. Let ξ j:X jX j [W]\xi_j:X_j \cong X^{[W]}_j be the restriction of ξ\xi to the variables bound in argument jj; then recursively we have M j[σξ j]Raw(WX j [W],s j)M_j[\sigma\cup \xi_j] \in Raw(W\cup X^{[W]}_j,s_j). If we write N j=M j[σξ j]N_j = M_j[\sigma\cup \xi_j], then we can define C(X;M)[σ]=C(X [W];N)\mathtt{C}(X;\vec{M})[\sigma] = \mathtt{C}(X^{[W]};\vec{N}).

It is easy to prove by induction that if ρ,ρ\rho,\rho' are bijections and ρσ=σρ\rho' \circ \sigma = \sigma'\circ \rho, then M ρMM\sim_\rho M' implies M[σ] ρM[σ]M[\sigma] \sim_{\rho'} M'[\sigma']. That is, the structural rules respect α\alpha-equivalence.


Let NRaw(V,tm)N\in Raw(V,tm) and MRaw(V{x},s)M\in Raw(V\cup \{x\},s) where xVx\notin V; we want to define M[N/x]Raw(V,s)M[N/x] \in Raw(V,s) by induction on MM.

  • If M=xM=x, then x[N/x]=Nx[N/x] = N; this is well-typed since in this case s=tms=tm.

  • If M=yM=y for some variable yVy\in V, then y[N/x]=yy[N/x] = y.

  • If M=C(X;M)M = \mathtt{C}(X;\vec{M}), we want to recursively substitute NN for xx in each M jM_j. But recall that M jRaw(V{x}X j,s j)M_j \in Raw(V\cup \{x\} \cup X_j,s_j), so in order to write “M j[N j/x]M_j[N_j/x]” we need N jRaw(VX j,tm)N_j \in Raw(V\cup X_j,tm), whereas we are given only NRaw(V,tm)N\in Raw(V,tm). Thus, we define N j=N[σ j]N_j = N[\sigma_j], where σ j:VVX j\sigma_j:V\to V\cup X_j is the obvious inclusion. Then we can define P j=M j[N[σ j]/x]P_j = M_j[N[\sigma_j]/x] and hence M[N/x]=C(X;P)M[N/x] = \mathtt{C}(X;\vec{P}).

Note that substitution is automatically “capture-avoiding” because of our “strong scoping” and local Barendregt convention. Before we can write M[N/x]M[N/x], strong scoping requires MM and NN to have the same free variables (except for xx, which only MM has), and no bound variables in MM can coincide with these free variables; thus no free variables in NN can get “captured” when it is substituted under binders in MM. If MM and NN are not “given” with the same free variables, we need to explicitly weaken them first in order to substitute, and the definition of weakening was forced by strong scoping to rename all bound variables to keep them distinct from the newly added free ones; in particular, the bound variables in MM must get renamed to differ from any free variables in NN. Finally, the definition of substitution is also forced by strong scoping to rename bound variables in NN (using the weakening [σ j][\sigma_j]) to keep them distinct from any bound variables in MM under which they are substituted, ensuring that M[N/x]M[N/x] still satisfies the Barendregt convention.

Last revised on December 5, 2018 at 16:54:36. See the history of this page for a list of all contributions to it.