Todd Trimble Notes on predicate logic

Contents

Introduction

Formalized mathematics is presented in terms of axiomatic theories and valid deductions therein. The best developed, most precise vehicle for handling axiomatic theories is predicate or first-order logic, as explicated by Frege more than a century ago. It is the baseline language for exact reasoning in mathematics, and universally accepted currency for exchanging proofs between mathematicians.

Logic comes in various degrees of expressive and deductive strength, but in its most common form, classical predicate logic means

This article aims to describe finitary classical first-order logic with equality from a conceptual point of view. Many textbooks, e.g., Hodges’ A Shorter Model Theory, tend to pass through this material in a hurry, perhaps under the (basically correct) assumption that mathematicians already know how to reason correctly. However, the mathematical structure of predicate logic is rich in categorical significance and has many pleasant offshoots, and shouldn’t be relegated to a hurried recitation of the basic vocabulary of terms, symbols, well-formed formulas and so on.

For those who like condensed summaries, much of what we describe below is to be a detailed description of the free Boolean hyperdoctrine generated from a signature. We also outline Gentzen sequent calculus as a formal structuring of deductions.

Are logical foundations circular?

There is a frequently voiced discomfort about logical foundations: that it calls upon the mathematics that it is supposed to be prior to. One is dealing with such mathematical items as the set of words or lists over an alphabet, sets of well-formed expressions, trees which exhibit the structure of a deduction, and so on. Thus, logical foundations appears to assume the prior existence of certain infinite sets, whose manipulation is the business of a set theory, or at the very least a complete formalization of logic would appear to involve some substrate of Peano arithmetic. But wouldn’t a background metatheory of sets or arithmetic then presuppose a logic to handle it correctly? Isn’t logical foundations hopelessly circular?

In spirit, this is like what might be called the “paradox of the dictionary”. A dictionary, real or ideal, defines words in terms of other words. So either a dictionary is hopelessly circular, or some words must be left undefined (permitting some inexactness to creep in). Similarly, it would seem that the structure of logic itself is circular, or needs “undefined” terms, or … perhaps calls on an infinite regress of metatheories?

Our own view is that logical foundations avoids this paradox ultimately by being relentlessly concrete. We may put it this way: logic at the primary level consists of instructions for dealing with formal linguistic items, but the concrete actions for executing those instructions (electrons moving through logic gates, a person unconsciously making an inference in response to a situation) are not themselves linguistic items, not of the language. They are nevertheless as precise and exact as one could wish.

We emphasize this point because in our descriptions below, we obviously must use language to describe logic, and some of this language indeed looks just like the formal mathematics that logic is supposed to be prior to. Nevertheless, the apparent circularity should be considered spurious: it is assumed that the programmer who reads an instruction such as “concatenate a list of lists into a master list” does not need full-blown mathematical language to formalize this, but will be able to translate this directly into actions performed in the real world. However, at the same time, the mathematically literate reader may like having a mathematical meta-layer in which to understand the instructions. The presence of this meta-level should not be a source of confusion, leading one to think we are pulling a circular “fast one”.

Overall, the most useful metaphor might be a computer that can recognize and construct valid deductions in a finitary set theory, such as ETCS. This doesn’t presuppose some infinite sets that somehow reside in the computer. The computer only deals with finite fragments of a theory at a time, using hardware and software of determinate size. The ultimate manifestation of logic is therefore machine-level and extra-linguistic (where the “machine” is a computer or a human who acquires the skill of exact reasoning). Our aim here is not to describe this ultimate concrete manifestation, but rather to present a conceptual overview for mathematically sophisticated readers.

The structure of predicate logic

Predicate logic comes in various layers, each layer given in terms of basic data or “generators”, and basic rules or operations for constructing more complicated expressions. There are four layers of data and constructions we deal with, each layer depending on the previous:

The stuff of logic per se is in these rules of construction. An axiomatic theory per se is given by a collection of basic data that generates it. In other words, a theory consists of:

This follows a familiar generators-and-relations pattern: the signature itself can be viewed as generating a “free theory”, and then one mods out by the filter of deductions that can be derived from the axioms, which is like modding out by relations.

Sometimes theories are further classified. For example, a functional theory is one whose signature includes no basic (non-equality) predicates. An equational theory is a functional theory whose only axioms are of the form B\vdash B where BB is an equality predicate between terms. There are Horn theories, geometric theories, and so on, depending on the structure of formulas given in the axioms.

We now discuss each of these layers in turn. We suppose in what follows that the data of a theory (sorts, function symbols, relation symbols, and axioms) has been given.

Types

A basic type is one of a (for us, finite) collection of items s 1,,s ns_1, \ldots, s_n called sorts. The basic rule of type construction is that given two types T,TT, T', we may construct a new type T×TT \times T', and there is an empty type 11. (Thus, we are dealing only with product types.)

The best way to consider types is that they are the objects of the free category with finite cartesian products generated from a discrete category of objects s 1,,s ns_1, \ldots, s_n. This may be constructed quite syntactically (and fussily), but however it is constructed, it is equivalent to the category

(Fin/S) op(Fin/S)^{op}

where FinFin is the category of finite sets, and SS is the finite collection of sorts. For our purposes, it is convenient to take the objects of FinFin to be finite cardinals {1,,n}\{1, \ldots, n\}, so that an object of Fin/SFin/S is a function [n]={1,,n}S[n] = \{1, \ldots, n\} \to S, or an element of the free monoid S *S^\ast. Thus, a product type TT is identified with a word in SS. The category (Fin/S) op(Fin/S)^{op} may be regarded as a monoid in the bicategory whose 1-cells are spans from S *S^\ast to S *S^\ast, and we sometimes commit an abuse of language, writing

S *domProd(S)codS *S^\ast \stackrel{dom}{\leftarrow} Prod(S) \stackrel{cod}{\to} S^\ast

for the underlying span (the apex is of course actually the set of morphisms of (Fin/S) op(Fin/S)^{op}), and at other times writing Prod(S)Prod(S) for the category (Fin/S) op(Fin/S)^{op}.

A morphism ϕ:TT\phi: T \to T', where T:[m]ST: [m] \to S and T:[n]ST': [n] \to S are words in SS, is described by a function h:[n][m]h: [n] \to [m] making the diagram

[n] h [m] T T S \array{ [n] & \stackrel{h}{\to} & [m] \\ T' \searrow & & \swarrow T \\ & S & }

commute. Alternatively, a morphism is effectively an SS-indexed collection of functions between finite sets.

Two words are isomorphic if they are equal when regarded as commutative words, and in this sense we informally (and harmlessly) write sSs n(s)\prod_{s \in S} s^{n(s)} for any such product type together with its product structure (the projection maps to individual sorts). From this point of view, a morphism between product types is a product of diagonal maps

i[m]g(m) mMg(m) h 1(m)= nNf(n)\prod_{i \in [m]} g(m) \to \prod_{m \in M} g(m)^{h^{-1}(m)} = \prod_{n \in N} f(n)

where for each single-sort factor s=g(m)s = g(m) in the domain, there is a diagonal map s !:ss h 1(m)s^!: s \to s^{h^{-1}(m)}. (If h 1(m)h^{-1}(m) is empty, this diagonal map is actually a projection map s1s \to 1.) In this way, each morphism in Prod(S)Prod(S) is a “generalized diagonal map”. Such generalized diagonal maps may be considered as operations between types which come “for free”, by virtue of the cartesian structure of product types. These operations are basically what is needed to construct and manipulate equality predicates, as we will discuss below.

However such generalized diagonal maps (i.e., SS-indexed collections of finite functions) may be registered in a machine, a good modern notation for visualizing generalized diagonal maps is in terms of string diagrams.

Terms/Operations

We now describe the terms generated by function symbols. Each function symbol has an arity (T,s)(T, s) where TS *T \in S^\ast is a type and sSs \in S is a sort; we write f:Tsf: T \to s to indicate the arity. (The semantics in terms of sets is that an “element” of a product type TT is a tuple of multiple inputs of ff, and an “element” of ss is a single output.) In other words, the collection of function symbols forms an SS-multigraph:

Definition

Let SS be a set of sorts, and S *S^\ast the free monoid on SS. A multigraph over SS is a set FF together with a span S *d 0Fd 1SS^\ast \stackrel{d_0}{\leftarrow} F \stackrel{d_1}{\to} S.

If the domain d 0(f)d_0(f) of a function symbol is the identity eS *e \in S^\ast (the empty word), then we call ff a constant.

Roughly speaking, the terms generated from a multigraph of function symbols are built up by formally plugging in outputs of function symbols as inputs of another, and iterating. Allowances are made so that some of the inputs used might be equal, or might not be used at all. We describe this more precisely in two ways, first from the more or less traditional approach of term syntax, and then from a more conceptual categorical point of view.

Term syntax

The usual syntax (which we will only sketch) is via terms τ\tau, each of a specified sort ss which we indicate by writing τ:s\tau: s. Intuitively, terms of type ss are considered “elements of type ss”. To get started, one supposes given for each sort ss an infinite stock of “variables” of that sort, and which are to be used as input placeholders in terms. Then general terms are built by syntactically replacing variables with other terms, which themselves have their own variables.

In the traditional approach, terms (relative to a multigraph of function symbols over a set SS of sorts) are introduced in a recursive manner:

The last rule includes the case where the domain of ff is empty. Thus, if cc is a constant function symbol with codomain ss, there is a corresponding term c:sc: s, called a constant term of type ss.

So a typical term τ\tau might look like this:

f(g(x,y),h(u,v,u),c):sf(g(x, y), h(u, v, u), c): s

where x,y,u,vx, y, u, v denote variables of appropriate sorts and cc is a constant. In particular, a variable uu might appear more than once in a term τ\tau. In the older logical approaches, one would read off the set of variables of τ\tau as the set of variables that appear “inside”, in this example x,y,u,vx, y, u, v. In more expressive typed theories, which allow for “function-space” types, one moreover distinguishes between free variables and bound variables inside terms, and one would analogously read off the set of free variables inside a term. Even though we won’t consider theories with function-types here, we will refer to “free” variables even in the “products-only” case.

In more recent approaches to term syntax, one considers that terms might have free variables which aren’t explicitly given within the body of the term. (Intuitively, variables which don’t appear can be considered to have been “projected out”, by means of precomposing a function with a projection map like (x 1,,x n1,x n)(x 1,,x n1)(x_1, \ldots, x_{n-1}, x_n) \mapsto (x_1, \ldots, x_{n-1}).)

An approach that works for “products-only” typed theories and for theories with function-types alike proceeds by giving “terms-in-context”. A term-in-context is an expression

x 1:s 1,,x n:s nτ:sx_1: s_1, \ldots, x_n: s_n \vdash \tau: s

where the context, appearing before the turnstyle, lists distinct variable terms and their types that are considered to be “free variables” of τ\tau. The context in effect prescribes a domain for the term: a term-in-context has an arity s 1,,s nss_1,\ldots, s_n \vdash s; in a set-theoretic interpretation, a term-in-context as above gets interpreted as a function of the form

|s 1|××|s n||s|.|s_1| \times \ldots \times |s_n| \to |s|.

Each variable appearing in τ\tau appears within the context, but there may be extra variables which don’t appear in τ\tau (ones that are “projected out”). Note that the context is a listing of the free variables in a specific order, so that

x:s,x:sf(x,x)x: s, x': s' \vdash f(x, x')

is considered different to x:s,x:sf(x,x)x': s', x: s \vdash f(x, x') (but each is considered as valid as the other).

Basic to the enterprise is the notion of substituting terms (in context) for free variables. The idea is that given a term

x 1:s 1,,x n:s nτ:sx_1: s_1, \ldots, x_n: s_n \vdash \tau: s

and given terms whose contexts are abbreviated by capital Greek letters, Γ it i:s i\Gamma_i \vdash t_i: s_i for i=1,,ni = 1, \ldots, n, one may define by recursion a term

Γ 1,,Γ nτ[x 1/t 1,,x n/t n]:s\Gamma_1, \ldots, \Gamma_n \vdash \tau[x_1/t_1, \ldots, x_n/t_n]: s

where the idea is that one substitutes the term t it_i for each instance of x ix_i, and then concatenates the contexts of the t it_i into a single context.

Finally there is a global rule of so-called “α\alpha-conversion”, by which we identify a term Γ,x:s,Δτ:s\Gamma, x:s, \Delta \vdash \tau: s' with another term obtained by substituting a fresh variable in for xx. (In more formal terms, by performing a substitution [x/t][x/t] where the term tt (in context) is x:sx:sx': s \vdash x': s, provided that x:sx': s does not already occur in Γ\Gamma or Δ\Delta.) This is reasonable, since it is intuitively clear that the semantics of both terms are exactly the same.

In summary, “normal terms” are α\alpha-equivalence classes of terms in contexts as sketched above.

This concludes our informal description of the term syntax. It should be clear that although the rules of term formation, substitutions, etc. are not hard to understand, the fully formal set-up is a bit fussy, and somewhat heavy on syntax.

“Operadic” presentation

There are a number of ways of explaining the same material from a more categorical point of view. Whatever the explanation is, the point is to describe the free category with finite products generated from a multigraph FF over SS. We denote this free category with products as Term(S,F)Term(S, F).

As before, the objects of Term(S,F)Term(S, F) are product types: elements of S *S^\ast. Morphisms from a type T:[m]ST: [m] \to S to a type T:[n]ST': [n] \to S are, in the language of the preceding section, nn-tuples of normal terms (t 1,,t n)(t_1, \ldots, t_n) where each t it_i has arity TT(i)T \to T'(i). Composition is effected by term substitution. The cartesian product structure is essentially given by concatenating (juxtaposing) lists of sorts and terms.

It is hopefully more or less clear how all of this works in the term syntax approach. Despite this, we believe that the bureaucracy of handling variables in the term syntax is something of a hack; from one point of view (closely related to string diagrams), some of it is actually unnecessary.

For example, the input strings play the role of variables declared in the context, but the difference is that they do not need “variable names” – they only need to be labeled by an appropriate sort, for type-checking purposes. This trick effectively eliminates the need for rules of α\alpha-conversion. As we will see, one can also effect a neat division of labor between the business of variable declarations and the business of “pure substitutions”; moreover, this division clarifies the precise entry point of the particular doctrine we are working in (the doctrine of finite product categories).

To begin, recall the following abstract definition:

Definition

Let MM be a cartesian monad acting on a finitely complete category CC. An MM-span in CC from cc to dd is a span of the form

McfxgdM c \stackrel{f}{\leftarrow} x \stackrel{g}{\to} d

MM-spans are composed by consideration of a pullback

xy My x Mh Mk Mf g Mb m b MMb Mc d\array{ & & & & & & x \circ y & & & & \\ & & & & & \swarrow & & \searrow & & & \\ & & & & M y & & & & x & & \\ & & & ^\mathllap{M h} \swarrow & & \searrow ^\mathrlap{M k} & & ^\mathllap{M f} \swarrow & & \searrow ^\mathrlap{g} & \\ M b & \stackrel{m_b}{\leftarrow} & M M b & & & & M c & & & & d }

where m:MMMm: M M \to M is the multiplication on the monad MM. Under this composition, the MM-spans are 1-cells of a bicategory MM-SpanSpan.

In the case where MM is the free monoid monad acting on SetSet, an MM-span from a set SS to SS is the same as a multigraph over SS. A monad on SS in the bicategory MM-SpanSpan is a multicategory over SS. We are especially interested in the free multicategory generated from a multigraph over SS.

The free multicategory construction has other names and descriptions. We could also call it the free nonpermutative SS-sorted operad generated by a set of SS-typed function symbols. The apex of its underlying span, together with its map to SS, can also be referred to as the initial algebra for the polynomial endofunctor PP on Set/SSet/S which takes an SS-indexed set X sX_s to

P(X) s= TS * f:Ts sSX s T sP(X)_s = \sum_{T \in S^\ast} \sum_{f: T \to s} \prod_{s' \in S} X_{s'}^{T_{s'}}

(fFf \in F). However it is named, the underlying multigraph of the free multicategory generated by a multigraph FF can be described as

Tree(F) in out S * S\array{ & & Tree(F) & & \\ & ^\mathllap{in} \swarrow & & \searrow ^\mathrlap{out} & \\ S^\ast & & & & S }

where Tree(F)Tree(F) is the set of FF-labeled planar trees. This means that

The list of labels of “leaves” of an FF-labeled tree tt (edges that are not outgoing edges of any node) gives an element in(t)S *in(t) \in S^\ast, and the label of the “root” edge gives an element out(t)Sout(t) \in S. Notice that FF-labeled trees have obvious string diagram representations.

Next, any multicategory generates a (strict monoidal) category. In the present instance, we denote this category as Pro(S,F)Pro(S, F) (we write “Pro” by analogy with “prop” – whereas props are used for the doctrine of symmetric monoidal categories, pros are used for monoidal categories). The objects of Pro(S,F)Pro(S, F) are elements of S *S^\ast. The morphisms of Pro(S,F)Pro(S, F) could be described as “FF-labeled forests”. Formally, the underlying span of Pro(S,F)Pro(S, F) is

S *m SS **in *Tree(F) *out *S *S^\ast \stackrel{m_S}{\leftarrow} S^{\ast\ast} \stackrel{in^\ast}{\leftarrow} Tree(F)^\ast \stackrel{out^\ast}{\to} S^\ast

The monoidal product on forests is simply juxtaposition. The composition of forests is the obvious one, plugging in roots of one forest for leaves of another.

The construction Pro(S,F)Pro(S, F) formalizes what we meant earlier by “pure substitutions of terms”, and lives in the doctrine of (strict) monoidal categories. To switch over to the doctrine of categories with finite cartesian products, we apply a simple trick. Let Pro(S,F)Prod(S)Pro(S, F) \circ Prod(S) be the composite of the two spans

Prod(S) Pro(S,F) dom cod in out S * S * S *\array{ & & Prod(S) & & & & Pro(S, F) & & \\ & ^\mathllap{dom} \swarrow & & \searrow ^\mathrlap{cod} & & ^\mathllap{in} \swarrow & & \searrow ^\mathrlap{out} & \\ S^\ast & & & & S^\ast & & & & S^\ast }

Then, in the first place, Pro(S,F)Prod(S)Pro(S, F) \circ Prod(S) carries a canonical category structure. The reason is that, viewing Prod(S)Prod(S) and Pro(S,F)Pro(S, F) as monads in the bicategory of spans, there is a canonical distributive law

θ:Prod(S)Pro(S,F)Pro(S,F)Prod(S)\theta: Prod(S) \circ Pro(S, F) \to Pro(S, F) \circ Prod(S)

(here Pro(S,F)Pro(S, F) can be replaced by any pro over SS). The idea is that the distributive law θ\theta maps an element of Prod(S)Pro(S,F)Prod(S) \circ Pro(S, F), which is a pair of arrows

TfTdTT \stackrel{f}{\to} T' \stackrel{d}{\to} T''

with ff an arrow of Pro(S,F)Pro(S, F) and dd an arrow of Prod(S)Prod(S), to a pair of arrows in Pro(S,F)Prod(S)Pro(S, F) \circ Prod(S) whose precise form is dictated by a naturality requirement in dd. For example, if d=δ T:TTTd = \delta_T': T' \to T' \otimes T', then

θ(f,δ T)=(Tδ TTTffTT)\theta(f, \delta_T') = (T \stackrel{\delta_T}{\to} T \otimes T \stackrel{f \otimes f}{\to} T' \otimes T')

Using this distributive law, the composite of the monads Prod(S)Prod(S) and Pro(S,F)Pro(S, F) is another monad in the bicategory of spans, and therefore a category with set of objects S *S^\ast.

The same trick works for other doctrines over the doctrine of monoidal categories. For example, if Perm(S)Perm(S) is the free symmetric (strict) monoidal category generated by SS, regarded as a span from S *S^\ast to S *S^\ast, then there is a distributive law Perm(S)Pro(S,F)Pro(S,F)Perm(S)Perm(S) \circ Pro(S, F) \to Pro(S, F) \circ Perm(S). Similarly with “symmetric monoidal” replaced with “braided monoidal”.

Definition

Term(S,F)Pro(S,F)Prod(S)Term(S, F) \coloneqq Pro(S, F) \circ Prod(S), regarded as a cartesian category with product structure inherited from Prod(S)Prod(S), meaning that

Prod(S)id S *Prod(S)u1Pro(S,F)Prod(S)Prod(S) \cong id_{S^\ast} \circ Prod(S) \stackrel{u \circ 1}{\to} Pro(S, F) \circ Prod(S)

is a product-preserving functor, where uu is the unit of the monad Pro(S,F)Pro(S, F).

More explicitly, the tensor product \bigotimes on Term(S,F)Term(S, F) is a cartesian product provided that there are natural transformations δ:idΔ\delta: id \to \otimes \Delta, ε:ide!\varepsilon: id \to e ! which endow each object with a cocommutative comonoid structure. But the naturality follows from the definition of the distributive law, and the cocommutative comonoid axioms already hold in Prod(S)Prod(S).

Note that this abstract description of Term(S,F)Term(S, F) is identical to that given by the syntax of normal terms. The “bureaucracy of variables” is here organized into two departments, Prod(S)Prod(S) and Pro(S,F)Pro(S, F), each having its own individual categorical structure, which interact via the distributive law. (This is the “division of labor” we were talking about, where each arrow of Term(S,F)Term(S, F) is factorized into a generalized diagonal map followed by an FF-labeled forest.)

Another name for Term(S,F)Term(S, F) is “term algebra”, and yet another name for it is “the free SS-sorted Lawvere theory generated by a set of SS-sorted operation symbols FF”.

Theorem

Term(S,F)Term(S, F) is the free category with products generated by the multigraph FF over SS.

Proof

Tree(S,F)Tree(S, F) is the free multicategory generated by the multigraph FF over SS, and Pro(S,F)Pro(S, F) is the free monoidal category generated by the multicategory Tree(S,F)Tree(S, F). Therefore Pro(S,F)Pro(S, F) is the free monoidal category generated by the multigraph FF. It remains to show that the monoidal inclusion

Pro(S,F)Pro(S,F)id S *1uPro(S,F)Prod(S)Pro(S, F) \cong Pro(S, F) \circ id_{S^\ast} \stackrel{1 \circ u}{\to} Pro(S, F) \circ Prod(S)

is universal among monoidal functors X:Pro(S,F)CX: Pro(S, F) \to C to cartesian monoidal categories CC.

There is of course a product-preserving functor Prod(S)CProd(S) \to C compatible with the restriction S *Pro(S,F)XCS^\ast \hookrightarrow Pro(S, F) \stackrel{X}{\to} C. At the level of spans, this gives a composable pair of span morphisms

S * dom Prod(S) cod S * in Pro(S,F) out S * C 0 d 0 C 1 d 1 C 0 d 0 C 1 d 1 C 0\array{ S^\ast & \stackrel{\dom}{\leftarrow} & Prod(S) & \stackrel{\cod}{\to} & S^\ast & \stackrel{in}{\leftarrow} & Pro(S, F) & \stackrel{out}{\to} & S^\ast \\ \downarrow & & \downarrow & & \downarrow & & \downarrow & & \downarrow \\ C_0 & \underset{d_0}{\leftarrow} & C_1 & \underset{d_1}{\to} & C_0 & \underset{d_0}{\leftarrow} & C_1 & \underset{d_1}{\to} & C_0 }

which we then compose with the span morphism m:C 1C 1C 1m: C_1 \circ C_1 \to C_1 given by composition in CC:

Pro(S,F)Prod(S)C 1C 1mC 1Pro(S, F) \circ Prod(S) \to C_1 \circ C_1 \stackrel{m}{\to} C_1

This gives a morphism between underlying spans, Term(S,F)CTerm(S, F) \to C. This is functorial (i.e., is a morphism of monads in SpanSpan) because both Prod(S)CProd(S) \to C and Pro(S,F)CPro(S, F) \to C are functorial, and also because the compositional equalities enforced by the distributive law θ\theta are preserved: they are taken to equalities expressed by naturality of diagonal maps and projection maps in CC. The functor Term(S,F)CTerm(S, F) \to C is product-preserving, because Prod(S)CProd(S) \to C is product-preserving. The uniqueness of the product-preserving extension Term(S,F)CTerm(S, F) \to C is clear since the subcategories Prod(S)Prod(S) and Pro(S,F)Pro(S, F) together generate Term(S,F)Term(S, F).

Pullbacks in Term(S,F)Term(S, F)

This section contains some technical material which will be important for the upcoming discussion of the Beck-Chevalley condition. The point is that the Beck-Chevalley condition is a very powerful principle, and should be applied only to the “right” sorts of pullbacks. The pullbacks we are interested in here can be roughly described as those which are “based on products”; more precisely, as pullbacks that are absolute with respect to the doctrine of finite-product categories, in the sense of the following definition.

Definition

Let CC be a category with finite products, and let SS be a pullback square in CC. We say that SS is a (semantically) productive pullback if, for every category with finite products DD and every finite-product preserving functor F:CDF: C \to D, the square F(S)F(S) is a pullback square in DD.

It suffices to test this condition for the case D=SetD = Set:

Lemma

For a pullback square SS in CC to be productive, it is necessary and sufficient that every product-preserving functor F:CSetF: C \to Set take SS to a pullback in SetSet.

Proof

Since the Yoneda embedding y:DSet D opy: D \to Set^{D^{op}} preserves and reflects pullbacks, and since limits in Set D opSet^{D^{op}} are computed objectwise, the representable functors hom(d,)=ev dy:DSet\hom(d, -) = ev_d \circ y: D \to Set jointly preserve and reflect pullbacks in DD.

One way of thinking about the last example is that, externally speaking, we can see that s:s: \mathbb{N} \to \mathbb{N} is cancellable (monic) in the term category, but the term category cannot “see” this fact “internally” in the doctrine of finite product categories, otherwise the monicity would be preserved under any interpretation (i.e., in any model XX).

There is a more verbose (but straightforward) syntactic notion of productive pullback, i.e., a pullback which comes “for free” in any category with finite products. They include the following:

A id,f A×B f (1) f×id B δ B×BA×B f×id A×B id×g (2) id×g A×B f×id A×B\array{ A & \stackrel{\langle id, f \rangle}{\to} & A \times B \\ ^\mathllap{f} \downarrow & (1) & \downarrow ^\mathrlap{f \times id} \\ B & \underset{\delta}{\to} & B \times B } \qquad \qquad \qquad \qquad \array{ A \times B & \stackrel{f \times id}{\to} & A' \times B \\ ^\mathllap{id \times g} \downarrow & (2) & \downarrow ^\mathrlap{id \times g} \\ A \times B' & \underset{f \times id}{\to} & A' \times B' }
\,
A id A id (3) δ A δ A×AA×B f×g A×B σ (4) σ B×A g×f B×A\array{ A & \stackrel{id}{\to} & A \\ ^\mathllap{id} \downarrow & (3) & \downarrow ^\mathrlap{\delta} \\ A & \underset{\delta}{\to} & A \times A } \qquad \qquad \qquad \qquad \array{ A \times B & \stackrel{f \times g}{\to} & A' \times B' \\ ^\mathllap{\sigma} \downarrow & (4) & \downarrow ^\mathrlap{\sigma} \\ B \times A & \underset{g \times f}{\to} & B' \times A' }
Definition

(Compare Seely, section 3.) Let CC be a category with finite products. A pullback square in CC is syntactically productive if it belongs to the smallest class Π\Pi of squares containing squares of form (1)-(4) above and closed under the following operations:

  • If

    A f A g g B h BA f A g g B h B\array{ A & \stackrel{f}{\to} & A' \\ ^\mathllap{g} \downarrow & & \downarrow ^\mathrlap{g'} \\ B & \underset{h}{\to} & B' } \qquad \array{ A' & \stackrel{f'}{\to} & A'' \\ ^\mathllap{g'} \downarrow & & \downarrow ^\mathrlap{g''} \\ B' & \underset{h'}{\to} & B'' }

    belong to Π\Pi, then so does their composite;

  • If

    A f A g g B h B\array{ A & \stackrel{f}{\to} & A' \\ ^\mathllap{g} \downarrow & & \downarrow ^\mathrlap{g'} \\ B & \underset{h}{\to} & B' }

    belongs to Π\Pi, then so does the square obtained by applying a functor of the form C×C \times - or ×C- \times C to this square.

It is clear by induction that if F:CDF: C \to D is a finite-product preserving functor and Σ\Sigma is a syntactically productive pullback in CC, then F(Σ)F(\Sigma) is a pullback square in DD. Thus, a syntactically productive square is semantically productive. This may be regarded as a “soundness theorem”.

We aim to prove a “completeness theorem” as well: that every semantically productive pullback square in Term(S,F)Term(S, F) is syntactically productive.

Lemma

Let FMor(Term(S,F))F \hookrightarrow Mor(Term(S, F)) be the canonical inclusion. Then each morphism in the image of this inclusion is monic in Term(S,F)Term(S, F).

Proof

Suppose fh 1=fh 2f h_1 = f h_2 for two morphisms h 1,h 2h_1, h_2 of Term(S,F)Term(S, F). Each morphism h 1h_1 of Term(S,F)Term(S, F) has a unique factorization as g 1d 1g_1 d_1, where g 1g_1 belongs to Pro(S,F)Pro(S, F) and d 1d_1 to Prod(S)Prod(S); similarly we uniquely factorize h 2h_2 as g 2d 2g_2 d_2. From fg 1d 1=fg 2d 2f g_1 d_1 = f g_2 d_2, we derive fg 1=fg 2f g_1 = f g_2 and d 1=d 2d_1 = d_2, again by the unique factorization. From fg 1=fg 2f g_1 = f g_2 we derive g 1=g 2g_1 = g_2, according to the recursive definition of FF-labeled forest. Therefore g 1d 1=g 2d 2g_1 d_1 = g_2 d_2, as desired.

Corollary

Let i:Pro(S,F)Term(S,F)i: Pro(S, F) \to Term(S, F) be the canonical inclusion. Then each morphism in the image of ii is monic in Term(S,F)Term(S, F).

Proof

This follows easily by induction. Each morphism of the form i(ϕ)i(\phi) is a cartesian product bigtimes k=1 ni(ϕ k)\bigtimes_{k=1}^n i(\phi_k) of FF-labeled trees, and each non-trivial tree ϕ k\phi_k is of the form i(f kψ k)i(f_k \circ \psi_k) where f kf_k is a function symbol and ψ k\psi_k is an FF-labeled forest. Then i(f k)i(f_k) is monic by the preceding lemma, and i(ψ k)i(\psi_k) is monic by induction. Thus i(f kψ k)i(f_k \psi_k) is monic, and since a cartesian product of monos is monic, this completes the proof.

Predicates

Axioms and deductions

References

Revised on July 20, 2013 at 17:30:21 by Todd Trimble