David Roberts set theory from ETCS (Trimble)


This is Parts I-III of an exposition by Todd Trimble on ETCS.



ZFC and ETCS: Elementary Theory of the Category of Sets

This is a post on “foundations of mathematics” (eek!). I was motivated to write it while I’ve been struggling to understand better certain applications of ultrafilters – namely the theory of measurable cardinals – from a point of view and language that I feel comfortable with. My original intent was to blog about that, as a kind of off-shoot of the general discussion of ultrafilters I started in connection with the series on Stone duality, and because it seems kind of cool. And I will. But this got finished first, and I thought that it would be of interest to some who have been following my category theory posts.

A lot of confusion seems to reign around “the categorical approach to foundations” and what it might entail; some seem to think it involves a “doing-away with elements” that we all know and love, or doing away with sets and supplanting them with categories, or something like that. That’s an unfortunate misunderstanding. My own attitude is pragmatic: I’m all in favor of mathematicians using ordinary “naive” (pre-axiomatic) set theory to express their thoughts if that’s the familiar and appropriate conveyance – I mean, obviously I do it myself. It’s our common heritage, learned through years of undergraduate and graduate school experience and beyond. I’m not proposing for a moment to “overthrow” it.

What I do propose to discuss is a formalized set theory which embodies this rich tradition, but which takes advantage of categorical insights honed over the decades, and which I would argue is ‘natural’ in its ease to accept formulas in naive set theory and give them a foundation true to mathematical practice; I also argue it addresses certain criticisms which I feel could be put to that hallowed foundational theory, ZFC. I should own up that this theory is not immune to criticism, a main one being that a certain amount of preface and commentary is required to make it accessible (and I don’t think category theorists have done a particularly hot job doing that, frankly).

Let’s start by putting down what we want in very simple, pragmatic terms:

  • A (formalized) ‘set theory’ worthy of the name ought to realize a conception of sets as “completed collections”, and allow for the existence of enough sets and relations to fulfill the needs of working mathematicians.

This is intentionally vague. The “needs of working mathematicians” fluctuate over time and place and person. Some of the core needs would include the existence of the sets of natural numbers and real numbers, for instance. On the other hand, in the field of set theory one will have other needs than in, say, complex analysis. For now I’ll ignore some of the deeper needs of set theorists, and try to focus on the basic stuff you’d need to formalize what goes on in your average graduate school text (to put it vaguely, again).

We will discuss two formalizations of set theory: ZFC, and Lawvere’s Elementary Theory of the Category of Sets, ETCS. The first “needs no introduction”, as they say. The second is an autonomous category-based theory, described in detail below, and proposed by Saunders Mac Lane as an alternative approach to “foundations of mathematics” (see his book with Moerdijk). Either formalization provides fully adequate infrastructure to support the naive set theory of working mathematicians, but there are significant conceptual differences between them, centering precisely on how the notion of membership is handled. I’ll start with the more familiar ZFC.

As everyone knows, ZFC formalizes a conception of “set” as collection extensionally determined by the members it contains, and the ZFC axioms ensure a rich supply of ways in which to construct new sets from old (pairings, unions, power sets, etc.). Considering how old and well-developed this theory is, and the plenitude of available accounts, I won’t say much here on its inner development. Instead, I want to pose a question and answer to highlight a key ZFC conception, and which we use to focus our discussion:

Question: “What are the members of sets?”

Answer: “Other sets.”

This may seem innocent enough, but the consequences are quite far-reaching. It says that “membership” is a relation \in from the collection V of all “sets” to itself. (Speaking at a pre-axiomatic level, a relation from a set X to a set Y is a subset R \subseteq X \times Y. So a structure for ZFC set theory consists of a “universe of discourse” V, together with a collection \in \subseteq V \times V of pairs of elements of V, called the membership relation.)

Why is this a big deal? A reasonable analogue might be dynamical systems. If X and Y are manifolds, say, then you can study the properties of a given smooth map f: X \to Y and maybe say interesting things of course, but in the case X = Y, you get the extra bonus that outputs can be fed back in as inputs, and infinite processes are born: you can study periodic orbits, long-term behaviors, and so on, and this leads to some very intricate mathematics, even when X is a simple manifold like a 2-sphere.

My point is that something analogous is happening in ZFC: we have a (binary) relation \in from V to itself, and we get a rich “dynamics” and feedback by iterative relational composition of \in with itself, or by composing other derived binary relations from V to itself. (Perhaps I should recall here, again at a pre-axiomatic level, that the composite S \circ R of a relation R \subseteq X \times Y and S \subseteq Y \times Z is the subset

\{(x, z): \exists_{y \in Y} (x, y) \in R \wedge (y, z) \in S\} \subseteq X \times Z.)
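
As a concrete illustration (mine, not part of the original text), here is a small Python sketch of relational composition on finite sets; iterating it on a single relation E playing the role of \in is exactly the “dynamics” being described.

```python
# Finite-set sketch: composing relations R ⊆ X×Y and S ⊆ Y×Z, given as sets of pairs.
def compose(R, S):
    """Return S∘R = {(x, z) : there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

# Iterated membership as relational self-composition: with E playing the role
# of ∈ on a tiny "universe", E∘E relates x to z exactly when x ∈ y ∈ z for some y.
E = {("a", "b"), ("b", "c"), ("c", "d")}
print(compose(E, E))   # {('a', 'c'), ('b', 'd')}
```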

A “behavior” then would correspond to an iterated membership chain

(x \in y) \wedge (y \in z) \wedge (z \in w) \wedge \ldots

and there are certain constraints on behavior provided by things like the axiom of foundation (no infinitely long backward evolutions). The deep meaning of the extensionality axiom is that a “set” s is uniquely specified by the abstract structure of the tree of possible backward evolutions or behaviors starting from the “root set” s. This gives some intuitive but honest idea of the world of sets according to the ZFC picture: sets are tree-like constructions. The ZFC axioms are very rich, having to do with incredibly powerful operations on trees, and the combinatorial results are extremely complicated.

There are other formulations of ZFC. One is by posets: given any relation R \subseteq V \times V (never mind one satisfying the ZFC axioms), one can create a reflexive and transitive relation \leq, defined by the first-order formula

a \leq b \text{ if and only if } \forall_{x \in V} (x, a) \in R \Rightarrow (x, b) \in R

The “extensionality axiom” for R can then be formulated as the condition that \leq also be antisymmetric, so that it is a partial ordering on V. If R = \in is the membership relation for a model of ZFC, then this \leq is of course just the usual “subset relation” between elements of V.
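
As a quick finite illustration (my own, not Trimble’s), one can compute the derived relation \leq from any relation R on a finite carrier; when R is extensional the result is also antisymmetric.

```python
# Finite-set sketch: the derived relation  a ≤ b  iff  every x with (x, a) ∈ R also has (x, b) ∈ R.
def members(R, a):
    return {x for (x, y) in R if y == a}

def leq(R, a, b):
    return members(R, a) <= members(R, b)   # subset inclusion of "member" sets

R = {("0", "1"), ("0", "2"), ("1", "2")}    # read (x, y) ∈ R as "x is a member of y"
print(leq(R, "1", "2"))   # True: every member of "1" is a member of "2"
print(leq(R, "2", "1"))   # False
```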

Then, by adding in a suitable “singleton” operator s: V \to V so that

x \in y \text{ if and only if } s(x) = \{x\} \leq y

the rest of the ZFC axioms can be equivalently recast as conditions on the augmented poset structure (V, \leq, s). In fact, Joyal and Moerdijk wrote a slim volume, Algebraic Set Theory, which gives a precise (and for a categorist, attractive) sense in which models of axiomatic frameworks like ZF can be expressed as certain initial algebras [of structure type ] within an ambient category of classes, effectively capturing the “cumulative hierarchy” conception underlying ZFC in categorical fashion.

The structure of a ZFC poset (V, \leq) is rich and interesting, of course, but in some ways a little odd or inconvenient: e.g., it has a bottom element of course (the “empty set”), but no top (which would run straight into Russell’s paradox). Categorically, there are some cute things to point out about this poset, usually left unsaid; for example, taking “unions” is left adjoint to taking “power sets”:

\bigcup F \leq X \text{ if and only if } F \leq P X.
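
Here is a small finite check of that adjunction for concrete sets (an illustration of my own, reading \leq as inclusion): \bigcup F is contained in X exactly when every member of F is a subset of X, i.e. when F \leq P X.

```python
# Finite-set check of the adjunction:  ⋃F ≤ X  iff  F ≤ P(X).
def union(F):
    return set().union(*F) if F else set()

def contained_in_powerset(F, X):
    return all(S <= X for S in F)   # every member of F is a subset of X

F = [{1}, {2, 3}]
assert (union(F) <= {1, 2, 3, 4}) == contained_in_powerset(F, {1, 2, 3, 4})  # both True
assert (union(F) <= {1, 2}) == contained_in_powerset(F, {1, 2})              # both False
```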

In summary: ZFC is an axiomatic theory (in the language of first-order logic with equality), with one basic type V and one basic predicate \in of binary type V \times V, satisfying a number of axioms. The key philosophic point is that there is no typed distinction between “elements” and “sets”: both are of type V, and there is a consequent very complicated dynamical “mixing” which results just on the basis of a short list of axioms: enough in principle to found all of present-day mathematics! I think the fact that one gets such great power, so economically, from apparently such slender initial data, is a source of great pride and pleasure among those who uphold the ZFC conception (or that of close kin like NBG) as a gold standard in foundations.

My own reaction is that ZFC is perhaps way too powerful! For example, the fact that \in is an endo-relation makes possible the kind of feedback which can result in things like Russell’s paradox, if one is not careful. Even if one is free from the paradoxes, though, the point remains that ZFC pumps out not only all of mathematics, but all sorts of dross and weird by-products that are of no conceivable interest or relevance to mathematics. One might think, for example, that to understand a model of ZFC, we have to be able to understand which definable pairs (a, b) satisfy a \in b. So, in principle, we can ask ourselves such otherwise meaningless gibberish as “what in our model and implementation is the set-theoretic intersection of the real number \pi and Cantor space?” and expect to get a well-defined answer. When you get right down to it, the idea that everything in mathematics (like say the number e) is a “set” is just plain bizarre, and actually very far removed from the way mathematicians normally think. And yet this is how we are encouraged to think, if we are asked to take ZFC seriously as a foundations.

One might argue that all expressions and theorems of normal mathematics are interpretable or realizable in the single theory ZFC, and that’s really all we ever asked for – the details of the actual implementation (like, ‘what is an ordered pair?’) being generally of little genuine interest to mathematicians (which is why the mathematician in the street who says ZFC is so great usually can’t say with much specificity what ZFC is). But this would seem to demote ZFC foundations, for most mathematicians, to a security blanket – nice to know it’s there, maybe, but otherwise fairly irrelevant to their concerns. But if there really is such a disconnect between how a mathematician thinks of her materials at a fundamental level and how it specifically gets coded up as trees in ZFC, with such huge wads of uninteresting or irrelevant stuff in its midst, we might re-examine just how appropriate ZFC is as “foundations” of our subject, or at least ask ourselves how much of it we usefully retain and how we might eliminate the dross.

We turn now to consider a categorical approach, ETCS. This will require retooling the way we think of mathematical membership. There are three respects in which “membership” or “elementhood” differs here from the way it is handled in ZFC:

  1. “Elements” and “sets” are entities of different types. (Meaning, elements are not themselves presupposed to be sets.)
  2. When we say “element”, we never mean an object considered in isolation; we always consider it relative to the specific “set” it is considered to be a member of. (That is, strictly speaking, the same object is never thought of as “belonging to” two distinct sets – use of such language must be carefully circumscribed.)
  3. We consider not just “elements” in the usual sense, but what are sometimes called “generalized elements”. Civilians call them “functions”. Thus, an element of type X over a domain of variation U is fancy terminology for a function f: U \to X. We will call them functions or “generalized elements”, depending on the intuition we have in mind. A function x: 1 \to X corresponds to an ordinary element of X.

Each of these corresponds to some aspect of normal practice, but taken together they are sufficiently different in how they treat “membership” that they might need some getting used to. The first corresponds to a decision to treat elements of a “set” like \mathbb{R} as ‘urelements’: they are not considered to have elements themselves and are not considered as having any internal structure; they are just atoms. What counts in a mathematical structure then is not what the constituents are ‘like’ themselves, but only how they are interrelated among themselves qua the structure they are considered being part of.

This brings us right to the second point. It corresponds e.g. to a decision never to consider a number like ‘3’ in isolation or as some Platonic essence, but always with respect to an ambient system to which it is bound, as in ‘3 qua natural number’, ‘3 qua rational number’, etc. It is a firm resolve to always honor context-dependence. Naturally, we can in a sense transport ‘3’ from one context to another via a specified function like \mathbb{N} \to \mathbb{Q}, but strictly speaking we’ve then changed the element. This raises interesting questions, like “what if anything plays the role of extensionality?”, or “what does it mean to take the intersection of sets?”. (Globally speaking, in ETCS we don’t – but we can, with a bit of care, speak of the intersection of two “subsets” of a given set. For any real mathematical purpose, this is good enough.)

My own sense is that it may be this second precept which takes the most getting used to – it certainly gives the lie to sometimes-heard accusations that categorical set theory is just a “slavish translation of ZFC into categorical terms”. Clearly, we are witnessing here radical departure from how membership is treated in ZFC. Such unbending commitment to the principle of context-dependence might even be felt to be overkill, a perhaps pedantic exercise in austerity or purity, or that it robs us of some freedom in how we want to manipulate sets. A few quick answers: no, we don’t lose any essential freedoms. Yes, the formal language may seem slightly awkward or stilted at certain points, but the bridges between the naive and formal are mercifully fairly obvious and easily navigated. Lastly, by treating membership not as a global endo-relation on sets, but as local and relative, we effectively eliminate all the extraneous dreck and driftwood which one rightly ignores when examining the mathematics of ZFC.

The third precept is familiar from the way category theorists and logicians have used generalized elements to extend set-theoretic notation, e.g., to chase diagrams in abelian categories, or to describe sheaf semantics of intuitionistic set theory, or to flesh out the Curry-Howard isomorphism. It is a technical move in some sense, but one which is easy to grow accustomed to, and very convenient. In ETCS, there is a strong “extensionality principle” (technically, the requirement that the terminal object is a generator) which guarantees enough “ordinary” elements x: 1 \to X to make any distinctions that can sensibly be made, but experience with topos theory suggests that for many applications, it is often convenient to drop or significantly modify that principle. If anything in ETCS is a nod to traditional set theory, it is such a strong extensionality principle. [The Yoneda principle, which deals with generalized elements, is also an extensionality principle: it says that a set is determined uniquely (to within uniquely specified isomorphism) by its generalized elements.]

Okay, it is probably time to lay out the axioms of ETCS. The basic data are just those of a category; here, we are going to think of objects as “sets”, and morphisms x: a \to b as functions or equivalently as “elements of a set b over a domain of variation a”. The latter is a mouthful, and it is sometimes convenient to suppress explicit mention of the domain a, so that “x \in b” just means some morphism x with codomain b. More on this below. The axioms of ETCS are the axioms of category theory, plus existence axioms which guarantee enough structure to express and support naive set theory (under the strictures imposed by precepts 1-3 above). For those who speak the lingo, the axioms below are those of a well-pointed topos with natural numbers object and axiom of choice. (This can be augmented later with a replacement axiom, so as to achieve bi-interpretability with full ZFC.)

Remark: As ETCS breaks the “dynamical” aspects of ZFC, and additionally treats issues of membership in a perhaps unaccustomed manner, its axioms do take longer to state. This should come as no surprise. Actually, we’ve discussed some of them already in other posts on category theory; we will repeat ourselves but make some minor adjustments to reflect normal notational practice of naive set theory, and build bridges between the naive and formal.

Axiom of products. For any two sets a, b, there is a set c and functions p_1: c \to a, p_2: c \to b, such that given two elements x \in a, y \in b over the same domain, there exists a unique element \langle x, y \rangle \in c over that domain for which

p_1\langle x, y \rangle = x \qquad p_2\langle x, y \rangle = y

A choice of product c is usually denoted a \times b. To make a bridge with naive set-theory notation, we suggestively write

a \times b :=_i \{\langle x, y \rangle: x \in a, y \in b\}

where the funny equality sign and bracketing notation on the right simply mean that the cartesian product is uniquely defined up to isomorphism by its collection of (generalized) elements, which correspond to pairs of elements, in accordance with the Yoneda principle as explained in the discussion here.

We also assume the existence of an “empty product” or terminal object 1: this is a set with a unique element x \in 1 over any domain.
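
A minimal finite-set sketch of the universal property just stated (the helper names pair, p1, p2 are mine): given generalized elements x, y over a common domain U, there is exactly one map \langle x, y \rangle into the product whose projections recover x and y.

```python
# Finite-set sketch of the product axiom: c = a × b with projections p1, p2.
a, b = {"r", "s"}, {0, 1}
c = {(x, y) for x in a for y in b}           # a choice of product
p1 = lambda z: z[0]
p2 = lambda z: z[1]

# Generalized elements x ∈ a, y ∈ b over a domain U are functions U -> a, U -> b.
U = {"u1", "u2"}
x = {"u1": "r", "u2": "s"}
y = {"u1": 0, "u2": 1}

pair = {u: (x[u], y[u]) for u in U}          # the unique ⟨x, y⟩ : U -> c
assert all(p1(pair[u]) == x[u] and p2(pair[u]) == y[u] for u in U)
```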

Axiom of equalizers. For any two functions f, g: a \rightrightarrows b, there exists a function i: e \to a such that

  1. f i = g i,
  2. Given x \in a over some domain such that f x = g x, there exists a unique x' \in e over the same domain such that i x' = x.

An equalizer i: e \to a is again defined up to isomorphism by its collection of generalized elements, denoted e :=_i \{x \in a: f(x) = g(x)\} \hookrightarrow a, again in accordance with the Yoneda principle. Using the last two axioms, we can form pullbacks: given functions f: a \to c, g: b \to c, we can form the set denoted

\{\langle x, y \rangle \in a \times b: f x = g y\}

using the product and equalizer indicated by this notation.
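
Concretely (a finite-set sketch of my own), the pullback of f: a \to c and g: b \to c is carved out of a \times b by the equalizer condition f(x) = g(y):

```python
# Finite-set sketch: the pullback {⟨x, y⟩ ∈ a × b : f(x) = g(y)} built from
# a product followed by an equalizer.
a, b, c = {1, 2, 3}, {"p", "q"}, {0, 1}
f = lambda x: x % 2                     # f : a -> c
g = lambda y: 0 if y == "p" else 1      # g : b -> c

product = {(x, y) for x in a for y in b}
pullback = {(x, y) for (x, y) in product if f(x) == g(y)}   # the equalizer step
print(sorted(pullback))   # [(1, 'q'), (2, 'p'), (3, 'q')]
```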

Before stating the next axiom, a few important remarks. We recall that a function i: a \to b is injective if for every x, y \in a over the same domain, i x = i y implies x = y. In that case we think of i: a \to b as defining a “subset” a of b, whose (generalized) elements correspond to those elements z \in b which factor (evidently uniquely) through i. It is in that sense that we say z \in b also “belongs to” a subset a (cf. precept 2). A relation from a to b is an injective function or subset r \hookrightarrow a \times b.

Axiom of power sets. For every set b there is a choice of power set P(b) and a relation \in_b \hookrightarrow b \times P(b), so that for every relation r \hookrightarrow a \times b, there exists a unique function \chi_r: a \to P(b) such that r is obtained up to isomorphism as the pullback

\{\langle x, y \rangle \in a \times b: y \in_b \chi_r(x)\}

In other words, \langle x, y \rangle belongs to r if and only if \langle y, \chi_r(x) \rangle belongs to \in_b.
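
In finite-set terms (an illustrative sketch; the name chi_r is mine), the classifying map sends x \in a to the subset \{y \in b : \langle x, y \rangle \in r\}, and the relation is recovered by pulling back the membership relation \in_b:

```python
# Finite-set sketch of the power set axiom: a relation r ⊆ a × b corresponds
# to its classifying map chi_r : a -> P(b).
a, b = {1, 2}, {"u", "v", "w"}
r = {(1, "u"), (1, "w"), (2, "v")}

def chi_r(x):
    return frozenset(y for y in b if (x, y) in r)      # an element of P(b)

# Recover r as the pullback {⟨x, y⟩ : y ∈_b chi_r(x)}.
recovered = {(x, y) for x in a for y in b if y in chi_r(x)}
assert recovered == r
```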

Axiom of strong extensionality. For functions

f, g: a \to b,

we have f = g if and only if f x = g x for all “ordinary” elements x: 1 \to a.

Axiom of natural number object. There is a set \mathbf{N}, together with an element 0: 1 \to \mathbf{N} and a function s: \mathbf{N} \to \mathbf{N}, which is initial among sets equipped with such data. That is, given a set a together with an element x: 1 \to a and a function f: a \to a, there exists a unique function h: \mathbf{N} \to a such that

h(0) = x; \qquad h s = f h

Or, in elementwise notation, h(n+1) = f h(n) for every (generalized) element n \in \mathbf{N}, where n+1 means s(n). Under strong extensionality, we may drop the qualifier “generalized”.
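
The universal property is just definition by simple recursion; here is a hedged Python sketch (the name recurse is mine):

```python
# Sketch of the natural numbers object property: given x ∈ a and f : a -> a,
# the unique h : N -> a satisfies h(0) = x and h(n+1) = f(h(n)).
def recurse(x, f):
    def h(n):
        value = x
        for _ in range(n):
            value = f(value)
        return value
    return h

h = recurse(1, lambda k: 2 * k)    # a = numbers, x = 1, f = doubling
print([h(n) for n in range(5)])    # [1, 2, 4, 8, 16], i.e. h(n) = 2**n
```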

Before stating the last axiom, we formulate a notion of “surjective” function: f: a \to b is surjective if for any two functions g, h: b \to c, we have g = h if and only if g f = h f. This is dual to the notion of being injective, and under the axiom of strong extensionality, is equivalent to the familiar notion: that f is surjective if for every element y: 1 \to b, there exists an element x: 1 \to a such that f x = y.

Axiom of choice. Every surjective function s: a \to b admits a section, i.e., a function i: b \to a such that s i = 1_b, the identity function.
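
For finite sets the axiom is easy to realize explicitly; a small sketch of my own (the helper section is a hypothetical name) picks, for each y in b, some preimage under a surjection s:

```python
# Finite-set sketch of the axiom of choice: a section i of a surjection s
# satisfies s(i(y)) = y for every y.
a, b = {"x1", "x2", "x3"}, {0, 1}
s = {"x1": 0, "x2": 1, "x3": 1}                  # surjective: both 0 and 1 are hit

def section(s, b):
    return {y: next(x for x in s if s[x] == y) for y in b}

i = section(s, b)
assert all(s[i[y]] == y for y in b)
```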

This completes the list of axioms for ETCS. I have been at pains to try to describe them in notation which is natural from the standpoint of naive set theory, with the clear implication that any formula of naive set theory is readily translated into the theory ETCS (provided we pay appropriate attention to our precepts governing membership), and that this theory provides a rigorous foundation for mainstream mathematics.

To make good on this claim, further discussion is called for. First, I have not discussed how classical first-order logic is internalized in this setting (which we would need to do justice to a comprehension or separation scheme), nor have I discussed the existence or construction of colimits. I plan to take this up later, provided I have the energy for it. Again, the plan would be to stick as closely as possible to naive set-theoretic reasoning. (This might actually be useful: the categorical treatments found in many texts tend to be technical, often involving things like monad theory and Beck’s theorem, which makes it hard for those not expert in category theory to get into. I want to show this need not be the case.)

Also, some sort of justification for the claim that ETCS “founds” mainstream mathematics is called for. Minimally, one should sketch how the reals are constructed, for instance, and one should include enough “definability theory” to make it plausible that almost all constructions in ordinary mathematics find a natural home in ETCS [which can be a definition of ordinary mathematics? —ed]. What is excluded? Mainly certain parts of set theory, and parts of category theory (ha!) which involve certain parts of set theory, but this is handled by strengthening the theory with more axioms; I particularly have in mind a discussion of the replacement axiom, and perhaps large cardinal axioms. More to come!

  • For those who care, we intend to create a page on fully formal ETCS.

ETCS: Internalizing the logic

This post is a continuation of the discussion of “the elementary theory of the category of sets” [ETCS] which we had begun last time, here and in the comments which followed. My thanks go to all who commented, for some useful feedback and thought-provoking questions.

Today I’ll describe some of the set-theoretic operations and “internal logic” of ETCS. I have a feeling that some people are going to love this, and some are going to hate it. My main worry is that it will leave some readers bewildered or exasperated, thinking that category theory has an amazing ability to make easy things difficult.

  • An aside: has anyone out there seen the book Mathematics Made Difficult? It’s probably out of print by now, but I recommend checking it out if you ever run into it — it’s a kind of extended in-joke which pokes fun at category theory and abstract methods generally. Some category theorists I know take a dim view of this book; I for my part found certain passages hilarious, in some cases making me laugh out loud for five minutes straight. There are category-theory-based books and articles out there which cry out for parody!

In an attempt to nip my concerns in the bud, let me remind my readers that there are major differences between the way that standard set theories like ZFC treat membership and the way ETCS treats membership, and that differences at such a fundamental level are bound to propagate throughout the theoretical development, and impart a somewhat different character or feel between the theories. The differences may be summarized as follows:

  • Membership in ZFC is a global relation between objects of the same type (sets).

  • Membership in ETCS is a local relation between objects of different types (“generalized” elements or functions, and sets).

Part of what we meant by “local” is that an element per se is always considered relative to a particular set to which it belongs; strictly speaking, as per the discussion last time, the same element is never considered as belonging to two different sets. That is, in ETCS, an (ordinary) element of a set A is defined to be a morphism x : 1 \to A; since the codomain is fixed, the same morphism cannot be an element 1 \to B of a different set B. This implies in particular that in ETCS, there is no meaningful global intersection operation on sets, which in ZFC is defined by:

A \cap B = \{ x : (x \in A) \wedge (x \in B) \}

Instead, in ETCS, what we have is a local intersection operation on subsets A \hookrightarrow X, B \hookrightarrow X of a set X. But even the word “subset” requires care, because of how we are now treating membership. So let’s back up, and lay out some simple but fundamental definitions of terms as we are now using them.

Given two monomorphisms i : A \to X, j : B \to X, we write i \subseteq j (A \subseteq B if the monos are understood, or A \subseteq_X B if we wish to emphasize this is local to X) if there is a morphism k : A \to B such that i = j k. Since j is monic, there can be at most one such morphism k; since i is monic, such k must be monic as well. We say i, j define the same subset if this k is an isomorphism. So: subsets of X are defined to be isomorphism classes of monomorphisms into X. As a simple exercise, one may show that monos i, j into X define the same subset if and only if i \subseteq j and j \subseteq i. The (reflexive, transitive) relation \subseteq_X on monomorphisms thus induces a reflexive, transitive, antisymmetric relation, i.e., a partial order on subsets of X.

Taking some notational liberties, we write A \subseteq X to indicate a subset of X (as isomorphism class of monos). If x : U \to X is a generalized element, let us say x is in a subset A \subseteq X if it factors (evidently uniquely) through any representative mono i : A \to X, i.e., if there exists x' : U \to A such that x = i x'. Now the intersection of two subsets A \subseteq X and B \subseteq X is defined to be the subset A \cap B \subseteq X defined by the pullback of any two representative monos i : A \to X, j : B \to X. Following the “Yoneda principle”, it may equivalently be defined up to isomorphism by specifying its generalized elements:

A \cap B :=_i \{ x \in X : (x \text{ is in } A) \wedge (x \text{ is in } B) \}.

Thus, intersection works essentially the same way as in ZFC, only it’s local to subsets of a given set.

While we’re at it, let’s reformulate the power set axiom in this language: it says simply that for each set B there is a set P(B) and a subset \in_B \subseteq B \times P(B), such that for any relation R \subseteq B \times A, there is a unique “classifying map” \chi_R : A \to P(B) whereby, under 1_B \times \chi_R : B \times A \to B \times P(B), we have

R = (1_B \times \chi_R)^{-1}(\in_B).

The equality is an equality between subsets, and the inverse image on the right is defined by a pullback. In categorical set theory notation,

R = \{ \langle b, a \rangle \in B \times A : b \in_B \chi_R(a) \}.

Hence, there are natural bijections

\frac{R \subseteq B \times A}{A \to P(B)} \qquad \frac{R \subseteq B \times A}{B \to P(A)}

between subsets and classifying maps. The subset corresponding to \phi : A \to P(B) is denoted \left[\phi\right] \subseteq B \times A or \left[\phi\right] \subseteq A \times B, and is called the extension of \phi.

The set P(1) plays a particularly important role; it is called the “subset classifier” because subsets A \subseteq X are in natural bijection with functions \chi : X \to P(1). [Cf. classifying spaces in the theory of fiber bundles.]

In ordinary set theory, the role of P(1) is played by the 2-element set \{ f, t \}. Here subsets A \subseteq X are classified by their characteristic functions \chi_A : X \to \{ f, t \}, defined by \chi_A(x) := t iff x \in A. We thus have A = \chi_A^{-1}(t); the elementhood relation \in_1 \hookrightarrow 1 \times P(1) boils down to t : 1 \to P(1). Something similar happens in ETCS set theory:
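
A finite-set sketch of the classifier story just told, with P(1) modeled by the Booleans and t = True (the variable names are mine):

```python
# Finite-set sketch: subsets of X correspond to characteristic maps X -> {False, True},
# and A is recovered as the preimage of t = True.
X = {1, 2, 3, 4}
A = {2, 4}

chi_A = {x: (x in A) for x in X}                # classifying map X -> P(1)
recovered = {x for x in X if chi_A[x]}          # pullback of t along chi_A
assert recovered == A
```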

Lemma 1. The domain of elementhood \in_1 \to 1 \times P(1) \cong P(1) is terminal.

Proof. A map X \to \in_1, that is, a map \chi : X \to P(1) which is in \in_1 \subseteq P(1), corresponds exactly to a subset \chi^{-1}(\in_1) \subseteq X which contains all of X (i.e., the subobject 1_X : X \subseteq X). Since the only such subset is 1_X, there is exactly one map X \to \in_1. \Box

Hence elementhood \in_1 \subseteq 1 \times P(1) is given by an element t : 1 \to P(1). The power set axiom says that a subset A \subseteq X is retrieved from its classifying map \chi_A : X \to P(1) as the pullback \chi_A^{-1}(t) \subseteq X.

Part of the power of, well, power sets is in a certain dialectic between external operations on subsets and internal operations on P(1); one can do some rather amazing things with this. The intuitive (and pre-axiomatic) point is that if C has finite products, equalizers, and power objects, then P(1) is a representing object for the functor

Sub : C^{op} \to Set

which maps an object X to the collection of subobjects of X, and which maps a morphism (“function”) f : X \to Y to the “inverse image” function f^{-1} : Sub(Y) \to Sub(X), that sends a subset j : B \subseteq Y to the subset f^{-1}(B) \subseteq X given by the pullback of the arrows f : X \to Y, j : B \subseteq Y. By the Yoneda lemma, this representability means that external natural operations on the Sub(X) correspond to internal operations on the object P(1). As we will see, one can play off the external and internal points of view against each other to build up a considerable amount of logical structure, enough for just about any mathematical purpose.

  • Remark. A category satisfying just the first three axioms of ETCS, namely existence of finite products, equalizers, and power objects, is called an (elementary) topos. Most or perhaps all of this post will use just those axioms, so we are really doing some elementary topos theory. As I was just saying, we can build up a tremendous amount of logic internally within a topos, but there’s a catch: this logic will be in general intuitionistic. One gets classical logic (including law of the excluded middle) if one assumes strong extensionality [where we get the definition of a well-pointed topos]. Topos theory has a somewhat fearsome reputation, unfortunately; I’m hoping these notes will help alleviate some of the sting.

To continue this train of thought: by the Yoneda lemma, the representing isomorphism

\theta : \hom(-, P(1)) \stackrel{\sim}{\to} Sub(-)

is determined by a universal element \theta_{P(1)}(1_{P(1)}), i.e., a subset of P(1), namely the mono t : 1 \to P(1). In that sense, t : 1 \to P(1) plays the role of a universal subset. The Yoneda lemma implies that external natural operations on general posets Sub(X) are completely determined by how they work on the universal subset.

Internal Meets

To illustrate these ideas, let us consider intersection. Externally, the intersection operation is a natural transformation

\cap_X : Sub(X) \times Sub(X) \to Sub(X).

This corresponds to a natural transformation

\hom(X, P(1)) \times \hom(X, P(1)) \to \hom(X, P(1))

which (by Yoneda) is given by a function \wedge : P(1) \times P(1) \to P(1). Working through the details, this function is obtained by putting X = P(1) \times P(1) and chasing

\langle \pi_1, \pi_2 \rangle \in \hom(P(1) \times P(1), P(1)) \times \hom(P(1) \times P(1), P(1))

through the composite

\hom(X, P(1)) \times \hom(X, P(1)) \stackrel{\sim}{\to} Sub(X) \times Sub(X) \stackrel{\cap}{\to} Sub(X) \stackrel{\sim}{\to} \hom(X, P(1)).

Let’s analyze this bit by bit. The subset \left[\pi_1\right] = \pi_1^{-1}(t) \subseteq P(1) \times P(1) is given by

t \times id : 1 \times P(1) \to P(1) \times P(1),

and the subset \left[\pi_2\right] = \pi_2^{-1}(t) \subseteq P(1) \times P(1) is given by

id \times t : P(1) \times 1 \to P(1) \times P(1).

Hence \left[\pi_1\right] \cap \left[\pi_2\right] \subseteq P(1) \times P(1) is given by the pullback of the functions t \times id and id \times t, which is just

t \times t : 1 \times 1 \to P(1) \times P(1).

The map \wedge : P(1) \times P(1) \to P(1) is thus defined to be the classifying map of t \times t : 1 \times 1 \subseteq P(1) \times P(1).

To go from the internal meet \wedge : P(1) \times P(1) \to P(1) back to the external intersection operation, let A \subseteq X, B \subseteq X be two subsets, with classifying maps \chi_A, \chi_B : X \to P(1). By the definition of \wedge, we have that for all generalized elements x \in X

\chi_A(x) \wedge \chi_B(x) = t \quad \text{if and only if} \quad \langle \chi_A(x), \chi_B(x) \rangle = \langle t, t \rangle

(where the equality signs are interpreted with the help of equalizers). This holds true iff x is in the subset A \subseteq X and is in the subset B \subseteq X, i.e., if and only if x is in the subset A \cap B \subseteq X. Thus \chi_A \wedge \chi_B is indeed the classifying map of A \cap B \subseteq X. In other words, \chi_{A \cap B} = \chi_A \wedge \chi_B.
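 
Modeling P(1) by the Booleans again (a sketch under that identification, not part of the original text), the equation \chi_{A \cap B} = \chi_A \wedge \chi_B is just the pointwise “and” of characteristic functions:

```python
# Finite-set sketch: the classifying map of A ∩ B is the pointwise meet
# (logical "and") of the classifying maps of A and B.
X = {1, 2, 3, 4, 5}
A, B = {1, 2, 3}, {2, 3, 5}

chi_A = {x: x in A for x in X}
chi_B = {x: x in B for x in X}
chi_meet = {x: chi_A[x] and chi_B[x] for x in X}   # χ_A ∧ χ_B

assert {x for x in X if chi_meet[x]} == A & B      # its extension is A ∩ B
```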

A by-product of the interplay between the internal and external is that the internal intersection operator

\wedge : P(1) \times P(1) \to P(1)

is the meet operator of an internal meet-semilattice structure on P(1): it is commutative, associative, and idempotent (because that is true of external intersection). The identity element for \wedge is the element t : 1 \to P(1). In particular, P(1) carries an internal poset structure: given generalized elements u, v : A \to P(1), we may define

u \leq v \quad \text{if and only if} \quad u = u \wedge v

and this defines a reflexive, transitive, antisymmetric relation \left[\leq\right] \subseteq P(1) \times P(1):

\left[\leq\right] :=_i \{ \langle u, v \rangle \in P(1) \times P(1) : u = u \wedge v\},

equivalently described as the equalizer

\left[\leq\right] \to P(1) \times P(1) \stackrel{\to}{\to} P(1)

of the maps \pi_1 : P(1) \times P(1) \to P(1) and \wedge : P(1) \times P(1) \to P(1). We have that u \leq v if and only if \left[u\right] \subseteq \left[v\right].

Internal Implication

Here we begin to see some of the amazing power of the interplay between internal and external logical operations. We will prove that P(1) carries an internal Heyting algebra structure (ignoring joins for the time being).

Let’s recall the notion of Heyting algebra in ordinary naive set-theoretic terms: it’s a lattice P that has a material implication operator “\Rightarrow” such that, for all x, y, z \in P,

x \wedge y \leq z \quad \text{if and only if} \quad x \leq y \Rightarrow z.

Now: by the universal property of P(1), a putative implication operation \Rightarrow : P(1) \times P(1) \to P(1) is uniquely determined as the classifying map of its inverse image (\Rightarrow)^{-1}(t) \subseteq P(1) \times P(1), whose collection of generalized elements is

\{ \langle u, v \rangle \in P(1) \times P(1) : (u \Rightarrow v) = t\}

Given \langle u, v \rangle : A \to P(1) \times P(1), the equality here is equivalent to

t \leq u \Rightarrow v

(because t : 1 \to P(1) is maximal), which in turn is equivalent to

t \wedge u = u \leq v

This means we should define \Rightarrow : P(1) \times P(1) \to P(1) to be the classifying map of the subset

\left[\leq\right] \subseteq P(1) \times P(1).

Theorem 1. P(1) admits internal implication.

Proof. We must check that for any three generalized elements u, v, w : A \to P(1), we have

w \leq u \Rightarrow v \quad \text{if and only if} \quad w \wedge u \leq v.

Passing to the external picture, let \left[u\right], \left[v\right], \left[w\right] be the corresponding subsets of A. Now: according to how we defined \Rightarrow, a generalized element e \in A is in \left[u \Rightarrow v\right] if and only if u(e) \leq v(e). This applies in particular to any monomorphism e : \left[w\right] \to A that represents the subset \left[w\right] \subseteq A.

Lemma 2. The composite

u(e) = (\left[w\right] \stackrel{e}{\to} A \stackrel{u}{\to} P(1))

is the classifying map of the subset \left[w\right] \cap \left[u\right] \subseteq \left[w\right].

Proof. As subsets of \left[w\right],

(u e)^{-1}(t) = e^{-1} u^{-1}(t) = e^{-1}(\left[u\right]) = \left[w\right] \cap \left[u\right]

where the last equation holds because both sides are the subsets defined as the pullback of two representative monos e : \left[w\right] \to A, i : \left[u\right] \to A. \Box

Continuing the proof of Theorem 1, we see by Lemma 2 that the condition u(e) \leq v(e) corresponds externally to the condition

\left[w\right] \cap \left[u\right] \subseteq \left[w\right] \cap \left[v\right]

and this condition is equivalent to \left[w\right] \cap \left[u\right] \subseteq \left[v\right].

Passing back to the internal picture, this is equivalent to w \wedge u \leq v, and the proof of Theorem 1 is complete. \Box

Cartesian Closed Structure

Next we address a comment made by “James”, that a category satisfying the ETCS axioms is cartesian closed. As with everything else in this article, this uses only the fact that such a category is a topos: it has finite products, equalizers, and “power sets”.

Proposition 1. If A, B are “sets”, then P(A \times B) represents an exponential P(B)^A.

Proof. By the power set axiom, there is a bijection between maps into the power set and relations:

\frac{\phi : X \to P(A \times B)}{R \subseteq X \times (A \times B)}

which is natural in X. By the same token, there is a natural bijection

\frac{R \subseteq (X \times A) \times B}{\phi' : X \times A \to P(B)}.

Putting these together, we have a natural isomorphism \hom(-, P(A \times B)) \cong \hom(- \times A, P(B))

and this representability means precisely that P(A \times B) plays the role of an exponential P(B)^A. \Box

Corollary 1. P(A) \cong P(1)^A. \Box

The universal element of this representation is an evaluation map A \times P(A) \to P(1), which is just the classifying map of the subset \in_A \subseteq A \times P(A).

Thus, P(B)^A \cong P(A \times B) represents the set of all functions \phi : A \to P(B) (that is, relations from A to B). This is all we need to continue the discussion of internal logic in this post, but let’s also sketch how we get full cartesian closure. [Warning: for those who are not comfortable with categorical reasoning, this sketch could be rough going in places.]
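
In finite-set terms, the bijection behind Proposition 1 is easy to exhibit (an illustrative sketch; curry and uncurry are my own names): a relation r \subseteq A \times B is the same data as a function A \to P(B).

```python
# Finite-set sketch of P(A × B) ≅ P(B)^A: a subset of A × B is the same thing
# as a function assigning to each element of A a subset of B.
A, B = {1, 2}, {"u", "v"}
r = {(1, "u"), (2, "u"), (2, "v")}                  # an element of P(A × B)

def curry(r):
    return {a: frozenset(b for b in B if (a, b) in r) for a in A}   # A -> P(B)

def uncurry(phi):
    return {(a, b) for a in A for b in phi[a]}

assert uncurry(curry(r)) == r                       # the two views agree
```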

As per our discussion, we want to internalize the set of such relations which are graphs of functions, i.e., maps \phi where each \phi(a) \subseteq B is a singleton, in other words which factor as

A \to B \stackrel{\sigma}{\to} P(B)

where \sigma : B \to P(B) is the singleton mapping:

b \mapsto \{ b \} = \{ c \in B : b = c \}.

We see from this set-theoretic description that \sigma : B \to P(B) classifies the equality relation

\{ \langle b, c \rangle \in B \times B : b = c \} \subseteq B \times B

which we can think of as either the equalizer of the pair of maps \pi_1, \pi_2 : B \times B \to B or, what is the same, the diagonal map \delta_B = \langle 1_B, 1_B \rangle : B \to B \times B.

Thus, we put \sigma = \chi_{\delta} : B \to P(B), and it is not too hard to show that the singleton mapping \sigma is a monomorphism. As usual, we get this monomorphism as the pullback \chi_{\sigma}^{-1}(t) of t : 1 \to P(1) along its classifying map \chi_{\sigma} : P(B) \to P(1).

Now: a right adjoint such as (-)^A preserves all limits, and in particular pullbacks, so we ought then to have a pullback

\begin{matrix} B^{\mathrlap{A}} & \longrightarrow & 1^{\mathrlap{A}} \\ \mathllap{\scriptsize{\sigma^A}}\downarrow & & \downarrow\mathrlap{\scriptsize{t^A}} \\ P(B)^{\mathrlap{A}} & \underset{\chi_\sigma^A}{\longrightarrow} & P(1)^{\mathrlap{A}} \end{matrix}

Of course, we don’t even have B^A yet, but this should give us an idea: define \sigma^A, and in particular its domain B^A, by taking the pullback of the right-hand map along the bottom map. In case there is doubt, the map on the bottom is defined Yoneda-wise, applying the isomorphism

\hom(P(B)^A \times A, P(1)) \cong \hom(P(B)^A, P(1)^A)

to the element in the hom-set (on the left) given by the composite

P(B)^A \times A \stackrel{ev}{\to} P(B) \stackrel{\chi_\sigma}{\to} P(1).

The map on the right of the pullback is defined similarly. That this recipe really gives a construction of B^A will be left as an exercise for the reader.

Universal Quantification

As further evidence of the power of the internal-external dialectic, we show how to internalize universal quantification.

As we are dealing here now with predicate logic, let’s begin by defining some terms as to be used in ETCS and topos theory:

  • An ordinary predicate of type A is a function \phi : A \to P(1). Alternatively, it is an ordinary element \phi' : 1 \to P(1)^A \cong P(A). It corresponds (naturally and bijectively) to a subset \left[\phi\right] : S \subseteq A.

  • A generalized predicate of type A is a function \phi' : X \to P(A) \cong P(1)^A. It may be identified with (corresponds naturally and bijectively to) a function \phi : X \times A \to P(1), or to a subset \left[\phi\right] : S \subseteq X \times A.

We are trying to define an operator \forall_A which will take a predicate of the form \phi : X \times A \to P(1) [conventionally written \phi(x, a)] to a predicate \forall_A \phi : X \to P(1) [conventionally written \forall_{a \in A} \phi(x, a)]. Externally, this corresponds to a natural operation which takes subsets of X \times A to subsets of X. Internally, it corresponds to an operation of the form:

\forall_A : P(A) \cong P(1)^A \to P(1).

This function is determined by the subset (\forall_A)^{-1}(t) \subseteq P(1)^A, defined elementwise by

\{ \phi \in P(1)^A : \forall_A \phi = t\}.

Now, in ordinary logic, \forall_{a \in A} \phi(a) is true if and only if \phi(a) is true for all a \in A, or, in slightly different words, if \phi : A \to P(1) is constantly true over all of A:

\phi = (A \stackrel{!}{\to} 1 \stackrel{t}{\to} P(1)).

The expression on the right (global truth over A) corresponds to a function t_A : 1 \to P(1)^A, indeed a monomorphism since any function with domain 1 is monic. Thus we are led to define the desired quantification operator \forall_A : P(1)^A \to P(1) as the classifying map of t_A : 1 \subseteq P(1)^A.

Let’s check how this works externally. Let \phi : X \to P(1)^A be a generalized predicate of type A. Then according to how \forall_A has just been defined, \forall_A \phi : X \to P(1) classifies the subset

\{ x \in X : \phi(x, -) = t_A : A \to P(1) \} \subseteq X
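
Externally and in finite-set terms, \forall_A turns a predicate on X \times A into the predicate on X that holds exactly where the A-slice is constantly true; a sketch of my own (the name forall_A is mine):

```python
# Finite-set sketch: ∀_A sends a predicate phi on X × A to the predicate on X
# which holds at x exactly when phi(x, a) holds for every a in A.
X = {0, 1, 2}
A = {"a1", "a2"}
phi = {(x, a): (x > 0) or (a == "a1") for x in X for a in A}

forall_A = {x: all(phi[(x, a)] for a in A) for x in sorted(X)}
print(forall_A)   # {0: False, 1: True, 2: True}
```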

There is an interesting adjoint relationship between universal quantification and substitution (aka “pulling back”). By “substitution”, we mean that given any predicate \psi : X \to P(1) on X, we can always pull back to a predicate on X \times A (substituting in a dummy variable a of type A, forming e.g. \psi(x) \wedge \left[a=a\right]) by composing with the projection \pi : X \times A \to X. In terms of subsets, substitution along A is the natural external operation

(\left[\psi\right] \subseteq X) \mapsto (\left[\psi\right] \times A \subseteq X \times A).

Then, for any predicate \phi : X \times A \to P(1), we have the adjoint relationship

\left[\psi\right] \times A \subseteq \phi \quad \text{if and only if} \quad \left[\psi\right] \subseteq \forall_A \phi

so that substitution along A is left adjoint to universal quantification along A. This is easy to check; I’ll leave that to the reader.

Internal Intersection Operators

Now we put all of the above together, to define an internal intersection operator

\bigcap : PPX \to PX

which intuitively takes an element 1 \to PPX (a family F of subsets of X) to its intersection 1 \to PX, as a subset \bigcap F \subseteq X.

Let’s first write out a logical formula which expresses intersection:

x \in \bigcap F \quad \text{if and only if} \quad \forall_{S \in PX}\ (S \in F) \Rightarrow (x \in S).


We have all the ingredients to deal with the logical formula on the right: we have an implication operator “\Rightarrow” as part of the internal Heyting algebra structure on P(1), and we have the quantification operator “\forall_{PX}”. The atomic expressions (S \in F) and (x \in S) refer to internal elementhood: (x \in S) means \langle x, S \rangle \in X \times PX is in \in_X \subseteq X \times PX, and (S \in F) means \langle S, F \rangle \in PX \times PPX is in \in_{PX} \subseteq PX \times PPX.


There is a slight catch, in that the predicates “(S \in_{PX} F)” and “(x \in_X S)” (as generalized predicates over PX, where S lives) are taken over different domains. The first is of the form \phi_1 : PPX \to P(1)^{PX}, and the second is of the form \phi_2 : X \to P(1)^{PX}. No matter: we just substitute in some dummy variables. That is, we just pull these maps back to a common domain PPX \times X, forming the composites

\psi_1 = (PPX \times X \stackrel{\pi_1}{\to} PPX \stackrel{\phi_1}{\to} P(1)^{PX})

and

\psi_2 = (PPX \times X \stackrel{\pi_2}{\to} X \stackrel{\phi_2}{\to} P(1)^{PX}).

Putting all this together, we form the composite

PPX \times X \stackrel{\langle \psi_1, \psi_2 \rangle}{\to} P(1)^{PX} \times P(1)^{PX} \cong (P(1) \times P(1))^{PX} \stackrel{(\Rightarrow)^{PX}}{\to} P(1)^{PX} \stackrel{\forall_{PX}}{\to} P(1)

This composite directly expresses the definition of the internal predicate (x \in \bigcap F) given above. By cartesian closure, this map PPX \times X \to P(1) induces the desired internal intersection operator \bigcap : PPX \to PX.
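
Unwinding the formula in finite-set terms (an illustrative sketch of my own, quantifying honestly over all of PX as in the formula):

```python
from itertools import chain, combinations

# Finite-set sketch of the internal intersection: x ∈ ⋂F iff for every
# subset S of X, S ∈ F implies x ∈ S.
X = {1, 2, 3, 4}
F = [{1, 2, 3}, {2, 3, 4}, {2, 3}]                 # a family F ∈ PPX

def powerset(X):
    xs = list(X)
    return [set(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def implies(p, q):
    return (not p) or q

intersection = {x for x in X
                if all(implies(S in F, x in S) for S in powerset(X))}
assert intersection == {2, 3}
```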

This construction provides an important bridge to getting the rest of the internal logic of ETCS. Since we can construct the intersection of arbitrary definable families of subsets, the power sets PX are internal inf-lattices. But inf-lattices are sup-lattices as well; on this basis we will be able to construct the colimits (e.g., finite sums, coequalizers) that we need. Similarly, the intersection operators easily allow us to construct image factorizations: any function f : X \to Y can be factored (in an essentially unique way) as an epi or surjection X \to I to the image, followed by a mono or injection I \to Y. The trick is to define the image as the smallest subset of Y through which f factors, by taking the intersection of all such subsets. Image factorization leads in turn to the construction of existential quantification.
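
A finite-set sketch of that image construction (my own illustration): intersecting all subsets of Y through which f factors recovers, of course, the usual set of values of f.

```python
from itertools import chain, combinations

# Finite-set sketch: the image of f : X -> Y as the smallest subset of Y
# through which f factors, i.e. the intersection of all such subsets.
X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "a", 2: "a", 3: "b"}

def powerset(Y):
    ys = list(Y)
    return [set(c) for c in chain.from_iterable(combinations(ys, r) for r in range(len(ys) + 1))]

factors_through = [S for S in powerset(Y) if all(f[x] in S for x in X)]
image = set.intersection(*factors_through)
assert image == {"a", "b"}
```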

As remarked above, the internal logic of a topos is generally intuitionistic (the law of excluded middle is not satisfied). But, if we add in the axiom of strong extensionality of ETCS, then we’re back to ordinary classical logic, where the law of excluded middle is satisfied, and where we just have the two truth values “true” and “false”. This means we will be able to reason in ETCS set theory just as we do in ordinary mathematics, taking just a bit of care with how we treat membership. The foregoing discussion gives indication that logical operations in categorical set theory work in ways familiar from naive set theory, and that basic set-theoretic constructions like intersection are well-grounded in ETCS.


ETCS: Building joins and coproducts

After a long hiatus, I’d like to renew the discussion of axiomatic categorical set theory, more specifically the Elementary Theory of the Category of Sets (ETCS). Last time I blogged about this, I made some initial forays into “internalizing logic” in ETCS, and described in broad brushstrokes how to use that internal logic to derive a certain amount of the structure one associates with a category of sets. Today I’d like to begin applying some of the results obtained there to the problem of constructing colimits in a category satisfying the ETCS axioms (an ETCS category, for short).

(If you’re just joining us now, and you already know some of the jargon, an ETCS category is a well-pointed topos with a natural numbers object that satisfies the axiom of choice. We are trying to build up some of the elementary theory of such categories from scratch, with a view toward foundations of mathematics.)

But let’s see — where were we? Since it’s been a while, I was tempted to review the philosophy behind this undertaking (why one would go to all the trouble of setting up a categories-based alternative to ZFC, when time-tested ZFC is able to express virtually all of present-day mathematics on the basis of a reasonably short list of axioms?). But in the interest of time and space, I’ll confine myself to a few remarks.

As we said, a chief difference between ZFC and ETCS resides in how ETCS treats the issue of membership. In ZFC, membership is a global binary relation: we can take any two “sets” A, B and ask whether A \in B. Whereas in ETCS, membership is a relation between entities of different sorts: we have “sets” on one side and “elements” on another, and the two are not mixed (e.g., elements are not themselves considered sets).

Further, and far more radical: in ETCS the membership relation x \in A is a function, that is, an element x “belongs” to only one set A at a time. We can think of this as “declaring” how we are thinking of an element, that is, declaring which set (or which type) an element is being considered as belonging to. (In the jargon, ETCS is a typed theory.) This reflects a general and useful philosophic principle: that elements in isolation are considered inessential, that what counts are the aggregates or contexts in which elements are organized and interrelated. For instance, the numeral “2” in isolation has no meaning; what counts is the context in which we think of it (qua rational number or qua complex number, etc.). Similarly the set of real numbers has no real sense in isolation; what counts is which category we view it in.

I believe it is reasonable to grant this principle a foundational status, but: rigorous adherence to this principle completely changes the face of what set theory looks like. If elements “belong” only to one set at a time, how then do we even define such basic concepts as subsets and intersections? These are some of the issues we discussed last time.

There are other significant differences between ZFC and ETCS: stylistically, or in terms of presentation, ZFC is more “top-down” and ETCS is more “bottom-up”. For example, in ZFC, one can pretty much define a subset \{ x \in X : P \} by writing down a first-order formula P in the language; the comprehension (or separation) axiom scheme is a mighty sledgehammer that takes care of the rest. In the axioms of ETCS, there is no such sledgehammer: the closest thing one has to a comprehension scheme in the ETCS axioms is the power set axiom (a single axiom, not an axiom scheme). However, in the formal development of ETCS, one derives a comprehension scheme as one manually constructs the internal logic, in stages, using the simple tools of adjunctions and universal properties. We started doing some of that in our last post. So: with ZFC it’s more as if you can just hop in the car and go; with ETCS you build the car engine from smaller parts with your bare hands, but in the process you become an expert mechanic, and are not so rigidly attached to a particular make and model (e.g., much of the theory is built just on the axioms of a topos, which allows a lot more semantic leeway than one has with ZF).

But, in all fairness, that is perhaps the biggest obstacle to learning ETCS: at the outset, the tools available [mainly, the idea of a universal property] are quite simple but parsimonious, and one has to learn how to build some set-theoretic and logical concepts normally taken as “obvious” from the ground up. (Talk about “foundations”!) On the plus side, by building big logical machines from scratch, one gains a great deal of insight into the inner workings of logic, with a corresponding gain in precision and control and modularity when one would like to use these developments to design, say, automated deduction systems (where there tend to be strong advantages to using type-theoretic frameworks).

Enough philosophy for now; readers may refer to my earlier posts for more. Let’s get to work, shall we? Our last post was about the structure of (and relationships between) posets of subobjects Sub(X) relative to objects X, and now we want to exploit the results there to build some absolute constructions, in particular finite coproducts and coequalizers. In this post we will focus on coproducts.

Note to the experts. Most textbook treatments of the formal development of topos theory (as for example Mac Lane–Moerdijk) are efficient but highly technical, involving for instance the slice theorem for toposes and, in the construction of colimits, recourse to Beck’s theorem in monad theory applied to the double power-set monad [following the elegant construction of Paré]. The very abstract nature of this style of argumentation (which in the application of Beck’s theorem expresses ideas of fourth-order set theory and higher) is no doubt partly responsible for the somewhat fearsome reputation of topos theory.

In these notes I take a much less efficient but much more elementary approach, based on an arrangement of ideas which I hope can be seen as “natural” from the point of view of naive set theory. I learned of this approach from Myles Tierney, who was my PhD supervisor, and who with Bill Lawvere co-founded elementary topos theory, but I am not aware of any place where the details of this approach have been written up before now. I should also mention that the approach taken here is not as “purist” as many topos theorists might want; for example, here and there I take advantage of the strong extensionality axiom of ETCS to simplify some arguments.

The Empty Set and Two-Valued Logic

We begin with the easy observation that a terminal category, i.e., a category 1\mathbf{1} with just one object and one morphism (the identity), satisfies all the ETCS axioms. Ditto for any category CC equivalent to 1\mathbf{1} (where every object is terminal). Such boring ETCS categories are called degenerate; obviously our interest is in the structure of nondegenerate ETCS categories.

Let E\mathbf{E} be an ETCS category (see here for the ETCS axioms). Objects of E\mathbf{E} are generally called “sets”, and morphisms are generally called “functions” or “maps”.

Proposition 0. If an ETCS category E\mathbf{E} is a preorder, then E\mathbf{E} is degenerate.

Proof. Recall that a preorder is a category in which there is at most one morphism ABA \to B for any two objects A,BA, B. Every morphism in a preorder is vacuously monic. If there is a nonterminal set AA, then the monic A1A \to 1 to any terminal set defines a subset A1A \subseteq 1 distinct from the subset defined by 111 \to 1, thus giving (in an ETCS category) distinct classifying maps χ A,t:1P(1)\chi_A, t : 1 \to P(1), contradicting the preorder assumption. Therefore all objects AA are terminal. \Box

Assume from now on that E\mathbf{E} is a nondegenerate ETCS category.

Proposition 1. There are at least two truth values, i.e., two elements 1P(1)1 \to P(1), in E\mathbf{E}.

Proof. By proposition 0, there exist sets X,YX, Y and two distinct functions f,g:XYf, g : X \to Y. By the axiom of strong extensionality, there exists x:1Xx : 1 \to X such that fxgxf x \neq g x. The equalizer E1E \to 1 of the pair fx,gx:1Yf x, g x : 1 \to Y is then a proper subset of 11, and therefore there are at least two distinct elements χ E,χ 1:1P(1)\chi_E, \chi_1 : 1 \to P(1). \Box

Proposition 2. There are at most two truth values 1P(1)1 \to P(1); equivalently, there are at most two subsets of 11.

Proof. If U,V1U, V \subseteq 1 are distinct subsets of 11, then either UUVU \neq U \cap V or VUVV \neq U \cap V, say the former. Then 1 U:UU1_U : U \subseteq U and UVUU \cap V \subseteq U are distinct subsets, with distinct classifying maps χ 1 U,χ UV:UP(1)\chi_{1_U}, \chi_{U \cap V} : U \to P(1). By strong extensionality, there exists x:1Ux : 1 \to U distinguishing these classifying maps. Because 11 is terminal, we then infer 1U1 \subseteq U and U1U \subseteq 1, so U=1U = 1 as subsets of 11, and in that case only VV can be a proper subset of 11. \Box

By Propositions 1 and 2, there is a unique proper subset of the terminal object 11. Let 010 \to 1 denote this subset. Its domain may be called an “empty set”; by the preceding proposition, it has no proper subsets. The classifying map 1P11 \to P1 of 010 \subseteq 1 is the truth value we call “false”.

Proposition 3. 0 is an initial object, i.e., for any XX there exists a unique function 0X0 \to X.

Proof. Uniqueness: if f,g:0Xf, g : 0 \to X are maps, then their equalizer x:E0x : E \to 0, which is monic, must be an isomorphism since 0 has no proper subsets. Therefore f=gf = g. Existence: there are monos

01t XPX\displaystyle 0 \to 1 \stackrel{t_X}{\to} PX
XσPX\displaystyle X \stackrel{\sigma}{\to} PX

where t Xt_X is “global truth” (classifying the subset XXX \subseteq X) on XX and σ\sigma is the “singleton mapping x{x}x \mapsto \{ x \}” on XX, defined as the classifying map of the diagonal map δ:XX×X\delta : X \subseteq X \times X (last time we saw σ\sigma is monic). Take their pullback. The component of the pullback parallel to σ\sigma is a mono P0P \to 0 which again is an isomorphism, whence we get a map 0PX0 \cong P \to X using the other component of the pullback. \Box
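For readers who like to check such arguments against naive intuition, here is a minimal Lean sketch of the external analogue of Proposition 3 (my own illustration, using Lean's built-in Empty type rather than the object 0 constructed above): there is a map out of the empty type to anything, and any two such maps agree.

```lean
-- Naive analogue of Proposition 3: the empty type is initial.
def fromEmpty (X : Type) : Empty → X :=
  fun e => nomatch e

-- Uniqueness: any two maps out of Empty are equal.
theorem fromEmpty_unique (X : Type) (f g : Empty → X) : f = g :=
  funext fun e => nomatch e
```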

Remark. For the “purists”, an alternative construction of the initial set 0 that avoids use of the strong extensionality axiom is to define the subset 010 \subseteq 1 to be “the intersection of all subsets of 11”. Formally, one takes the extension [ϕ]1\left[\phi\right] \subseteq 1 of the map

ϕ=(1t P1PP1P1);\displaystyle \phi = (1 \stackrel{t_{P1}}{\to} PP1 \stackrel{\bigcap}{\to} P1);

where the first arrow represents the class of all subsets of P1P1, and the second is the internal intersection operator defined at the end of our last post. Using formal properties of intersection developed later, this intersection 010 \subseteq 1 has no proper subsets, and then the proof of proposition 3 carries over verbatim. \Box

Corollary 1. For any XX, the set 0×X0 \times X is initial.

Proof. By cartesian closure, maps 0×XY0 \times X \to Y are in bijection with maps of the form 0Y X0 \to Y^X, and there is exactly one of these since 0 is initial. \Box
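Externally, Corollary 1 is just currying. A naive Lean sketch of the bijection being used (an illustration with the hypothetical names curryFun and uncurryFun):

```lean
-- Maps out of a product correspond to maps into a function type.
def curryFun {A B C : Type} (f : A × B → C) : A → B → C :=
  fun a b => f (a, b)

def uncurryFun {A B C : Type} (g : A → B → C) : A × B → C :=
  fun p => g p.1 p.2
```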

Corollary 2. If there exists f:X0f : X \to 0, then XX is initial.

Proof. The composite of f,1 X:X0×X\langle f, 1_X \rangle : X \to 0 \times X followed by π 2:0×XX\pi_2 : 0 \times X \to X is 1 X1_X, and π 2\pi_2 followed by f,1 X:X0×X\langle f, 1_X \rangle : X \to 0 \times X is also an identity since 0×X0 \times X is initial by Corollary 1. Hence XX is isomorphic to an initial object 0×X0 \times X. \Box

By Corollary 2, for any object YY the arrow 0Y0 \to Y is vacuously monic, hence defines a subset.

Proposition 4. If X¬0X \not\cong 0, then there exists an element x:1Xx : 1 \to X.

Proof. Under the assumption, XX has at least two distinct subsets: 0X0 \subseteq X and 1 X:XX1_X : X \subseteq X. By strong extensionality, their classifying maps χ 0,χ 1:XP1\chi_0, \chi_1 : X \to P1 are distinguished by some element x:1Xx : 1 \to X. \Box

External Unions and Internal Joins

One of the major goals in this post is to construct finite coproducts in an ETCS category. As in ordinary set theory, we will construct these as disjoint unions. This means we need to discuss unions first; as should be expected by now, in ETCS unions are considered locally, i.e., we take unions of subsets of a given set. So, let A,BXA, B \subseteq X be subsets.

To define the union ABXA \cup B \subseteq X, the idea is to take the intersection of all subsets containing AA and BB. That is, we apply the internal intersection operator (constructed last time ),

:PPXPX,\displaystyle \bigcap : PPX \to PX,
to the element 1PPX1 \to PPX that represents the set of all subsets of XX containing AA and BB; the resulting element 1PX1 \to PX represents ABA \cup B. The element 1PPX1 \to PPX corresponds to the intersection of two subsets {CPX:AC}{CPX:BC}PX\{ C \in PX : A \subseteq C \} \:\cap\: \{ C \in PX : B \subseteq C \} \subseteq PX.

Remark. Remember that in ETCS we are using generalized elements: CPXC \in PX really means a function C:UPXC : U \to PX over some domain UU, which in turn classifies a subset [C]U×X\left[C\right] \subseteq U \times X. On the other hand, the AA here is a subset AXA \subseteq X. How then do we interpret the condition “ACA \subseteq C”? We first pull back χ A:1PX\chi_A : 1 \to PX over to the domain UU; that is, we form the composite U!1χ APX\displaystyle U \stackrel{!}{\to} 1 \stackrel{\chi_A}{\to} PX and consider the condition that this is bounded above by C:UPXC : U \to PX. (We will write χ AC\chi_A \leq C, thinking of the left side as constant over UU.) Externally, in terms of subsets, this corresponds to the condition U×A[C]U \times A \subseteq \left[C\right].

We need to construct the subsets {CPX:AC},{CPX:BC}\{ C \in PX : A \subseteq C \}, \{ C \in PX : B \subseteq C \}. In ZFC, we could construct those subsets by applying the comprehension axiom scheme, but the axioms of ETCS have no such blanket axiom scheme. (In fact, as we said earlier, much of the work on “internalizing logic” goes to show that in ETCS, we instead derive a comprehension scheme!) However, one way of defining subsets in ETCS is by taking loci of equations; here, we express the condition ACA \subseteq C, more pedantically A[C]A \subseteq \left[C\right] or χ AC\chi_A \leq C, as the equation

(χ AC)=t X(\chi_A \Rightarrow C) = t_X

where the right side is the predicate “true over XX”.

Thus we construct the subset {CPX:AC}\{C \in PX: A \subseteq C\} of PXPX via the pullback:

\begin{matrix} \{ C : A \le C \} & \longrightarrow & 1 \\ \downarrow & & \downarrow \mathrlap{\scriptsize{t_X}} \\ PX & \underset{\chi_A \Rightarrow -}{\longrightarrow} & PX \end{matrix}

Let me take a moment to examine what this diagram means exactly. Last time we constructed an internal implication operator

:P1×P1P1\Rightarrow : P1 \times P1 \to P1

and now, in the pullback diagram above, what we are implicitly doing is lifting this to an operator

X:PX×PXPX\Rightarrow_X : PX \times PX \to PX

The easy and cheap way of doing this is to remember the isomorphism PXP1 XPX \cong P1^X we used last time to uncover the cartesian closed structure, and apply this to

P1 X×P1 X(P1×P1) X XP1 X\displaystyle P1^X \times P1^X \cong (P1 \times P1)^X \stackrel{\Rightarrow^X}{\to} P1^X

to define X:PX×PXPX\Rightarrow_X : PX \times PX \to PX. This map classifies a certain subset of X×PX×PXX \times PX \times PX, which I’ll just write down (leaving it as an exercise which involves just chasing the relevant definitions):

{(x,T,S)X×PX×PX:x XTx XS}\{ (x, T, S) \in X \times PX \times PX : x \in_X T \Rightarrow x \in_X S \}

Remark. Similarly we can define a meet operator X:PX×PXPX\wedge_X : PX \times PX \to PX by exponentiating the internal meet P1×P1P1P1 \times P1 \to P1. It is important to know that the general Heyting algebra identities which we established last time for P1P1 lift to the corresponding identities for the operators X, X\wedge_X, \Rightarrow_X on PXPX. Ultimately this rests on the fact that the functor () X(-)^X, being a right adjoint, preserves products, and therefore preserves any algebraic identity which can be expressed as a commutative diagram of operations between such products.
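As a naive gloss on these lifted operators, here is a Lean sketch (my own, modeling subsets of X as predicates X → Prop): the operators on PX act pointwise in the variable x.

```lean
-- Pointwise meet and implication on "subsets" of X.
def meetX {X : Type} (T S : X → Prop) : X → Prop :=
  fun x => T x ∧ S x

def impX {X : Type} (T S : X → Prop) : X → Prop :=
  fun x => T x → S x
```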

Hence, for the fixed subset AXA \subseteq X (classified by χ A:1PX\chi_A : 1 \to PX), the operator

χ A:PXPX\chi_A \Rightarrow - : PX \to PX

classifies the subset

{(x,S):x XAx XS}X×PX\{ (x, S) : x \in_X A \Rightarrow x \in_X S \} \hookrightarrow X \times PX

Finally, in the pullback diagram above, we are pulling back the operator χ A\chi_A \Rightarrow - against t Xt_X. But, from last time, that was exactly the method we used to construct universal quantification?. That is, given a subset

RX×YR \subseteq X \times Y

we defined YRX\forall_Y R \subseteq X to be the pullback of t Y:1PYt_Y : 1 \hookrightarrow PY along χ R:XPY\chi_R : X \to PY.

Putting all this together, the pullback diagram above expresses the definition

{CPX:AC}:={CPX: xXx XAx XC}\{ C \in PX : A \subseteq C \} := \{ C \in PX : \forall_{x \in X} \: x \in_X A \Rightarrow x \in_X C \}

that one would expect “naively”.
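In naive terms, the subset just constructed is exactly the collection of C with A ⊆ C. A Lean sketch of that reading (an illustration of mine, with PX modeled as X → Prop and the hypothetical name containsA):

```lean
-- The collection of all C ∈ PX containing a fixed A, via universal
-- quantification over x.
def containsA {X : Type} (A : X → Prop) : (X → Prop) → Prop :=
  fun C => ∀ x, A x → C x
```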

Now that all the relevant constructions are in place, we show that ABA \cup B is the join of AA and BB in the poset Sub(X)Sub(X). There is nothing intrinsically difficult about this, but as we are still in the midst of constructing the internal logic, we will have to hunker down and prove some logic things normally taken for granted or zipped through without much thought. For example, the internal intersection operator was defined with the help of internal universal quantification, and we will need to establish some formal properties of that.

Here is a useful general principle for doing internal logic calculations. Let χ R:XPY\chi_R : X \to PY be the classifying map of a subset RX×YR \subseteq X \times Y, and let f:UXf : U \to X be a function. Then the composite χ Rf:UPY\chi_R f : U \to PY classifies the subset

(f×1 Y) 1(R)U×Y(f \times 1_Y)^{-1}(R) \subseteq U \times Y

so that one has the general identity χ Rf=χ (f×1) 1(R)\chi_R f = \chi_{(f \times 1)^{-1}(R)}. In passing back and forth between the external and internal viewpoints, the general principle is to try to render “complicated” functions UPYU \to PY into a form χ Rf\chi_R f which one can more easily recognize. For lack of a better term, I’ll call this the “pullback principle”.
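Naively, the pullback principle is just precomposition: the classifying map sends x to the fiber R(x, -), and composing with f gives the fibers of the preimage. A Lean sketch (mine, with relations modeled as X → Y → Prop and hypothetical names):

```lean
-- χ_R, naively: send x to the predicate R(x, -) on Y.
def chiOfRel {X Y : Type} (R : X → Y → Prop) : X → (Y → Prop) :=
  fun x y => R x y

-- χ_R ∘ f classifies the preimage (f × 1)⁻¹(R).
def pulledBack {U X Y : Type} (R : X → Y → Prop) (f : U → X) : U → (Y → Prop) :=
  fun u => chiOfRel R (f u)
```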

Lemma 1. Given a relation RX×YR \hookrightarrow X \times Y and a constant c:1Yc : 1 \to Y, there is an inclusion

YR(1 X×c) 1(R)\forall_Y R \subseteq (1_X \times c)^{-1}(R)

as subsets of XX. (In traditional logical syntax, this says that for any element c:1Yc : 1 \to Y,

y:YR(x,y)impliesR(x,c)\forall_{y : Y} R(x, y) \quad implies \quad R(x, c)

as predicates over elements xXx \in X. This is the type of thing that ordinarily “goes without saying”, but which we actually have to prove here!)

Proof. As we recalled above, YRX\forall_Y R \subseteq X was defined to be χ R 1(t Y)\chi_R^{-1}(t_Y), the pullback of global truth t Y:1PYt_Y : 1 \to PY along the classifying map χ R:XPY\chi_R : X \to PY. Hold that thought.

Let

Pc:PYP1Pc : PY \to P1

be the map which classifies the subset {SPY:c YS}\{ S \in PY : c \in_Y S\}. Equivalently, this is the map

P1 c:P1 YP1 1P1^c: P1^Y \to P1^1

under the canonical isomorphisms PYP1 YPY \cong P1^Y, P1 1P1P1^1 \cong P1. Intuitively, this maps ff(c)f \mapsto f(c), i.e., plugs an element cYc \in Y into an element fP1 Yf \in P1^Y.

Using the adjunction (×Y)() Y(- \times Y) \dashv (-)^Y of cartesian closure, the composite

Xχ RPYPcP1\displaystyle X \stackrel{\chi_R}{\to} PY \stackrel{Pc}{\to} P1

transforms to the composite

X×11 X×cX×Yχ RP1X \times 1 \stackrel{1_X \times c}{\to} X \times Y \stackrel{\chi_{R}}{\to} P1

so by the pullback principle, (Pc)χ R:X×1P1(Pc)\chi_R : X \times 1 \to P1 classifies (1 X×c) 1(R)X×1X(1_X \times c)^{-1}(R) \subseteq X \times 1 \cong X.

Equivalently,

(1 X×c) 1(R)=χ R 1(Pc) 1(t)(1)(1_X \times c)^{-1}(R) = \chi_{R}^{-1} (Pc)^{-1}(t) \qquad (1)

Also, as subsets of PYPY, we have the inclusion

(t Y:1PY)(Pc) 1(t)(2)(t_Y : 1 \hookrightarrow PY) \subseteq (Pc)^{-1}(t) \qquad (2)

[this just says that the element t_Y : 1 \to PY belongs to the subset of PY classified by Pc, or equivalently that c is in the subset Y \subseteq Y named by t_Y]. Applying the pullback operation \chi_R^{-1} to (2), and comparing to (1), Lemma 1 follows. \Box
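The naive, external content of Lemma 1 is just universal instantiation; in Lean it is a one-liner (an illustration, not the internal argument above):

```lean
-- A universally quantified relation specializes to any constant c.
theorem lemma1_naive {X Y : Type} (R : X → Y → Prop) (c : Y) (x : X)
    (h : ∀ y, R x y) : R x c :=
  h c
```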

Lemma 2. If RSR \subseteq S as subsets of X×YX \times Y, then YR YS\forall_Y R \subseteq \forall_Y S.

Proof. From the last post, we have an adjunction:

T×YSifandonlyifT YST \times Y \subseteq S \quad if and only if \quad T \subseteq \forall_Y S

for any subset TXT \subseteq X. So it suffices to show YR×YS\forall_Y R \times Y \subseteq S. But

YR×YRS\forall_Y R \times Y \subseteq R \subseteq S

where the first inclusion follows from YR YR\forall_Y R \subseteq \forall_Y R. \Box
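Likewise, the naive content of Lemma 2 is that universal quantification is monotone in the relation; a Lean sketch of the external statement:

```lean
-- If R ⊆ S then ∀_Y R ⊆ ∀_Y S, read pointwise.
theorem lemma2_naive {X Y : Type} (R S : X → Y → Prop)
    (hRS : ∀ x y, R x y → S x y) (x : X)
    (hR : ∀ y, R x y) : ∀ y, S x y :=
  fun y => hRS x y (hR y)
```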

Next, recall from the last post that the internal intersection of F:1PPXF : 1 \to PPX was defined by interpreting the following formula on the right:

F= SPX(S PXF)(x XS)\displaystyle \bigcap F = \forall_{S \in PX} (S \in_{PX} F) \Rightarrow (x \in_X S)

Lemma 3. If FG:1PPXF \leq G : 1 \to PPX, then GF:1PX\displaystyle \bigcap G \leq \bigcap F : 1 \to PX.

Proof. F:1PPXF : 1 \to PPX classifies the subset {SPX:S PXF}\{ S \in PX : S \in_{PX} F \}, i.e., FF is identified with the predicate S PXFS \in_{PX} F in the argument SS, so by hypothesis (S PXF)(S PXG)(S \in_{PX} F) \leq (S \in_{PX} G) as predicates on SS. Internal implication aba \Rightarrow b is contravariant in the argument aa [see the following remark], so

[(SG)(xS)][(SF)(xS)]\left[(S \in G) \Rightarrow (x \in S)\right] \leq \left[(S \in F) \Rightarrow (x \in S)\right]
Now apply Lemma 2 to complete the proof. \Box

Remark. The contravariance of b- \Rightarrow b, that is, the fact that xyx \leq y implies (yb)(xb)(y \Rightarrow b) \leq (x \Rightarrow b), is a routine exercise using the adjunction [discussed last time] acba \wedge c \leq b if and only if c(ab).c \leq (a \Rightarrow b).

Indeed, we have

x(yb)y(yb)b(*)x \wedge (y \Rightarrow b) \leq y \wedge (y \Rightarrow b) \leq b \qquad (*)

where the first inequality follows from the hypothesis xyx \leq y, and the second follows from ybyby \Rightarrow b \leq y \Rightarrow b. By the adjunction, the inequality (*) implies (yb)(xb)(y \Rightarrow b) \leq (x \Rightarrow b). \Box

Theorem 1. For subsets A,BA, B of XX, the subset ABA \cup B is an upper bound of AA and BB, i.e., A,BABA, B \subseteq A \cup B.

Proof. It suffices to prove that A={CPX:AC}\displaystyle A = \bigcap \{C \in PX : A \subseteq C\}, since then we need only apply Lemma 3 to the trivially true inclusion

{CPX:AC}{CPX:BC}{CPX:AC}\{ C \in PX : A \subseteq C \} \:\cap\: \{ C \in PX : B \subseteq C \} \subseteq \{ C \in PX : A \subseteq C \}

to infer AABA \subseteq A \cup B, and similarly BABB \subseteq A \cup B. (Actually, we need only show A{CPX:AC}\displaystyle A \subseteq \bigcap \{ C \in PX : A \subseteq C \}. We’ll do that first, and then show full equality.)

The condition we want,

A{xX: SPX(AS)(x XS)},A \subseteq \{ x \in X : \forall_{S \in PX} (A \subseteq S) \Rightarrow (x \in_X S) \},

is, by the adjunction (×PX) PX:Sub(X×PX)Sub(X)(- \times PX) \dashv \forall_{PX} : Sub(X \times PX) \to Sub(X), equivalent to

A×PX{(x,S):AS(x XS)}A \times PX \subseteq \{ (x, S) : A \subseteq S \Rightarrow (x \in_X S) \}

which, by a \wedge-\Rightarrow adjunction, is equivalent to

(A×PX)(X×{SPX:AS}) X(1)(A \times PX) \cap (X \times \{ S \in PX : A \subseteq S \}) \subseteq \: \in_X \qquad (1)

as subsets of X×PXX \times PX. So we just have to prove (1). At this point we recall, from our earlier analysis, that

{SPX:AS}={SPX: xXx XAx XS}\{ S \in PX : A \subseteq S \} = \{ S \in PX : \forall_{x \in X} x \in_X A \Rightarrow x \in_X S \}

Using the adjunction (X×) X(X \times -) \dashv \forall_X, as in the proof of Lemma 2, we have

X×{SPX: xXx XAx XS}X \times \{ S \in PX : \forall_{x \in X} x \in_X A \Rightarrow x \in_X S \}
{(x,S):x XAx XS}:=(A×PX) X,\subseteq \{ (x, S) : x \in_X A \Rightarrow x \in_X S \} := (A \times PX) \Rightarrow \in_X,

which shows that the left side of (1) is contained in

(A×PX)((A×PX) X) X,(A \times PX) \cap ((A \times PX) \Rightarrow \in_X) \subseteq \: \in_X,

where the last inclusion uses another \wedge-\Rightarrow adjunction. Thus we have established (1) and therefore also the inclusion

A{SPX:AS}\displaystyle A \subseteq \bigcap \{S \in PX : A \subseteq S\}

Now we prove the opposite inclusion

{SPX:AS}A,\displaystyle \bigcap \{ S \in PX : A \subseteq S \} \subseteq A,

that is to say

{xX: SPXASx XS}A(**)\{ x \in X : \forall_{S \in PX} A \subseteq S \Rightarrow x \in_X S \} \subseteq A \qquad (**)

Here we just use Lemma 1, applied to the particular element χ A:1PX\chi_A : 1 \to PX: we see that the left side of (**)(**) is contained in

{xX:A[χ A]x XA}\{ x \in X : A \subseteq \left[\chi_A\right] \Rightarrow x \in_X A\}

which collapses to {xX:x XA}=A\{ x \in X : x \in_X A \} = A, since A=[χ A]A = \left[\chi_A\right]. This completes the proof. \Box

Theorem 2. ABA \cup B is the least upper bound of A,BA, B, i.e., if CXC \subseteq X is a subset containing both AXA \subseteq X and BXB \subseteq X, then ABCA \cup B \subseteq C.

Proof. We are required to show that

{xX: SPX(ASBS)x XS}C.\{ x \in X : \forall_{S \in PX} \: (A \subseteq S \wedge B \subseteq S) \Rightarrow x \in_X S \} \subseteq C.

Again, we just apply Lemma 1 to the particular element χ C:1PX\chi_C : 1 \to PX: the left-hand side of the claimed inclusion is contained in

{xX:(ACBC)x XC}\{ x \in X : (A \subseteq C \wedge B \subseteq C) \Rightarrow x \in_X C \}

but since (ACBC)(A \subseteq C \wedge B \subseteq C) is true by hypothesis (is globally true as a predicate on the implicit variable xXx \in X), this last subset collapses to

{xX:t Xx XC}={xX:x XC}=C\{ x \in X : t_X \Rightarrow x \in_X C \} = \{ x \in X : x \in_X C \} = C

which completes the proof. \Box
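For comparison, here are naive analogues of Theorems 1 and 2 in Lean (my own sketch, defining the union impredicatively as the intersection of all supersets, under the hypothetical name unionAB):

```lean
-- A ∪ B as the intersection of all C containing both A and B.
def unionAB {X : Type} (A B : X → Prop) : X → Prop :=
  fun x => ∀ C : X → Prop, (∀ y, A y → C y) → (∀ y, B y → C y) → C x

-- Theorem 1, naive version: A is contained in A ∪ B.
theorem left_le_union {X : Type} (A B : X → Prop) (x : X) (hA : A x) :
    unionAB A B x :=
  fun C hAC _ => hAC x hA

-- Theorem 2, naive version: A ∪ B is contained in any C containing A and B.
theorem union_le {X : Type} (A B C : X → Prop)
    (hAC : ∀ y, A y → C y) (hBC : ∀ y, B y → C y) (x : X)
    (h : unionAB A B x) : C x :=
  h C hAC hBC
```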

Theorems 1 and 2 show that for any set XX, the external poset Sub(X)Sub(X) admits joins. One may go on to show (just on the basis of the topos axioms) that as in the case of meets, the global external operation of taking joins is natural in XX, so that by the Yoneda principle, it is classified by an internal join operation

:P1×P1P1,\vee : P1 \times P1 \to P1,

namely, the map which classifies the union of the subsets

[π 1]=P1×11×tP1×P1\displaystyle \left[\pi_1\right] = P1 \times 1 \stackrel{1 \times t}{\hookrightarrow} P1 \times P1
[π 2]=1×P1t×1P1×P1\displaystyle \left[\pi_2\right] = 1 \times P1 \stackrel{t \times 1}{\hookrightarrow} P1 \times P1

and this operation satisfies all the expected identities. In short, P1P1 carries an internal Heyting algebra structure, as does PXP1 XPX \cong P1^X for any set XX.

We will come back to this point later, when we show (as a consequence of strong extensionality) that P1P1 is actually an internal Boolean algebra.

Construction of Coproducts

Next, we construct coproducts just as we do in ordinary set theory: as disjoint unions. Letting X,YX, Y be sets (objects in an ETCS category), a disjoint union of XX and YY is a pair of monos

i:XZj:YZi : X \hookrightarrow Z \qquad j : Y \hookrightarrow Z

whose intersection is empty, and whose union or join in Sub(Z)Sub(Z) is all of ZZ. We will show that disjoint unions exist and are essentially unique, and that they satisfy the universal property for coproducts. We will use the notation X+YX + Y for a disjoint union.

Theorem 3. A disjoint union of XX and YY exists.

Proof. It’s enough to embed X,YX, Y disjointly into some set CC, since the union of the two monos in Sub(C)Sub(C) would then be the requisite ZZ. The idea now is that if a disjoint union or coproduct X+YX+Y exists, then there’s a canonical isomorphism P(X+Y)PX×PYP(X + Y) \cong PX \times PY. Since the singleton map

σ:X+YP(X+Y)PX×PY\sigma : X + Y \to P(X + Y) \cong PX \times PY

is monic, one thus expects to be able to embed XX and YY disjointly into PX×PYPX \times PY. Since we can easily work out how all this goes in ordinary naive set theory, we just write out the formulas and hope it works out in ETCS.

In detail, define i X:XPX×PYi_X : X \to PX \times PY to be

XX×1σ X×χ 0PX×PY\displaystyle X \cong X \times 1 \stackrel{\sigma_X \times \chi_0}{\to} PX \times PY

where σ X\sigma_X is the singleton mapping and χ 0\chi_0 classifies 0Y0 \hookrightarrow Y; similarly, define i Y:YPX×PYi_Y :Y \to PX \times PY to be

Y1×Yχ 0×σ YPX×PY.\displaystyle Y \cong 1 \times Y \stackrel{\chi_0 \times \sigma_Y}{\to} PX \times PY.

Clearly i Xi_X and i Yi_Y are monic, so to show disjointness we just have to show that their pullback is empty. But their pullback is isomorphic to the cartesian product of the pullbacks of the diagrams

Xσ XPXχ 011χ 0PYσ YY\displaystyle X \stackrel{\sigma_X}{\to} PX \stackrel{\chi_0}{\leftarrow} 1 \qquad 1 \stackrel{\chi_0}{\to} PY \stackrel{\sigma_Y}{\leftarrow} Y

so it would be enough to show that each (or just one) of these two pullbacks is empty, let’s say the first.

Suppose given a map h:AXh: A \to X which makes the square

\begin{matrix} A & \longrightarrow & 1 \\ \mathllap{\scriptsize{h}} \downarrow & & \downarrow \mathrlap{\scriptsize{\chi_0}} \\ X & \underset{\sigma_X}{\longrightarrow} & PX \end{matrix}

commute. Using the pullback principle, the map A1χ 0PXA \to 1 \stackrel{\chi_0}{\to} PX classifies

0×AX×A0 \times A \hookrightarrow X \times A

which is just the empty subset. This must be the same subset as classified by σ Xh=χ δh\sigma_X h = \chi_{\delta} h (where δ:XX×X\delta : X \hookrightarrow X \times X is the diagonal), which by the pullback principle is

(1 X×h) 1(δ)X×A.(1_X \times h)^{-1}(\delta) \hookrightarrow X \times A.

An elementary calculation shows this to be the equalizer of the pair of maps

π 1,hπ 2:X×AX\pi_1, h\pi_2 : X \times A \stackrel{\to}{\to} X

So this equalizer EE is empty. But notice that h,1:AX×A\langle h, 1 \rangle : A \to X \times A equalizes this pair of maps. Therefore we have a map AE0A \to E \cong 0. By Corollary 2 above, we infer A0A \cong 0. This applies to the case where AA is the pullback, so the pullback is empty, as was to be shown. \Box
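The embeddings used in this proof have a transparent naive description: send x to ({x}, ∅) and y to (∅, {y}). A Lean sketch (an illustration of mine, with subsets modeled as predicates and the hypothetical names injLeft, injRight):

```lean
-- Disjoint embeddings of X and Y into PX × PY.
def injLeft {X Y : Type} (x : X) : (X → Prop) × (Y → Prop) :=
  (fun x' => x' = x, fun _ => False)

def injRight {X Y : Type} (y : Y) : (X → Prop) × (Y → Prop) :=
  (fun _ => False, fun y' => y' = y)
```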

Theorem 4. Any two disjoint unions of X,YX, Y are canonically isomorphic.

Proof. Suppose i:XZY:ji : X \rightarrow Z \leftarrow Y : j is a disjoint union. Define a map

ϕ=ϕ 1,ϕ 2:ZPX×PY\phi = \langle \phi_1, \phi_2 \rangle : Z \to PX \times PY

where ϕ 1:ZPX\phi_1 : Z \to PX classifies the subset 1 X,i:XX×Z\langle 1_X, i \rangle : X \to X \times Z, and ϕ 2:ZPY\phi_2 : Z \to PY classifies the subset 1 Y,j:YY×Z\langle 1_Y, j \rangle : Y \to Y \times Z. Applying the pullback principle, the composite ϕ 1i:XPX\phi_1 i : X \to PX classifies

(1 X×i) 1(1 X,i)X×X(1_X \times i)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times X

which is easily seen to be the diagonal on XX. Hence ϕ 1i=σ X\phi_1 i = \sigma_X. On the other hand, ϕ 1j:YPX\phi_1 j : Y \to PX classifies the subset

(1 X×j) 1(1 X,i)X×Y(1_X \times j)^{-1}(\langle 1_X, i \rangle) \hookrightarrow X \times Y

which is empty because ii and jj are disjoint embeddings, so ϕ 1j=χ 0:YPX\phi_1 j = \chi_0 : Y \to PX. Similar calculations yield

ϕ 2i=χ 0:XPYϕ 2j=σ Y:YPY.\phi_2 i = \chi_0 : X \to PY \qquad \phi_2 j = \sigma_Y : Y \to PY.

Putting all this together, we conclude that ϕi=i X:XPX×PY\phi i = i_X : X \to PX \times PY and ϕj=i Y:YPX×PY\phi j = i_Y : Y \to PX \times PY, where i Xi_X and i Yi_Y were defined in the proof of Theorem 3.

Next, we show that ϕ\phi is monic. If not, then by strong extensionality, there exist distinct elements c,d:1Zc, d : 1 \to Z for which ϕ(c)=ϕ(d)\phi(c) = \phi(d); therefore, ϕ 1c=ϕ 1d:1PX\phi_1 c = \phi_1 d : 1 \to PX and ϕ 2c=ϕ 2d:1PY\phi_2 c = \phi_2 d : 1 \to PY. By the pullback principle, these equations say (respectively)

c 1(i)=d 1(i)1c 1(j)=d 1(j)1 c^{-1}(i) = d^{-1}(i) \hookrightarrow 1 \qquad c^{-1}(j) = d^{-1}(j) \hookrightarrow 1

If c 1(i)=1c^{-1}(i) = 1, then both c,d:1Zc, d : 1 \to Z factor through the mono i:XZi : X \to Z. However, since ϕi=i X\phi i = i_X is monic, this would imply that ϕ(c)ϕ(d)\phi (c) \neq \phi (d), contradiction. Therefore c 1(i)=ci=0c^{-1}(i) = c \cap i = 0. By similar reasoning, cj=0c \cap j = 0. Therefore

i¬cj¬ci \subseteq \neg c \qquad j \subseteq \neg c

where ¬=(0):Sub(Z)Sub(Z)\neg = (- \Rightarrow 0) : Sub(Z) \to Sub(Z) is the negation operator. But then ij¬ci \cup j \subseteq \neg c. And since 1:ZZ1 : Z \hookrightarrow Z is the union iji \cup j by assumption, ¬c\neg c must be the top element Sub(Z)\top \in Sub(Z), whence c:1Zc : 1 \to Z is the bottom element 0. This contradicts the assumption that the topos is nondegenerate. Thus we have shown that ϕ\phi must be monic.

The argument above shows that ϕ:ZPX×PY\phi : Z \hookrightarrow PX \times PY is an upper bound of i X:XPX×PYi_X : X \to PX \times PY and i Y:YPX×PYi_Y : Y \to PX \times PY in Sub(PX×PY)Sub(PX \times PY). It follows that the join X+YX + Y constructed in Theorem 3 is contained in ϕ:ZPX×PY\phi : Z \to PX \times PY, and hence can be regarded as the join of XX and YY in Sub(Z)Sub(Z). But ZZ is their join in Sub(Z)Sub(Z) by assumption of being a disjoint union, so the containment X+YZX + Y \subseteq Z must be an equality. The proof is now complete. \Box
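Naively, the comparison map ϕ sends a point z of Z to the pair of fibers (i⁻¹(z), j⁻¹(z)). A Lean sketch of that description (mine, not the classifier construction used in the proof):

```lean
-- ϕ, naively: record the fibers of i and j over z as predicates.
def phiNaive {X Y Z : Type} (i : X → Z) (j : Y → Z) (z : Z) :
    (X → Prop) × (Y → Prop) :=
  (fun x => i x = z, fun y => j y = z)
```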

Theorem 5. The inclusions i X:XX+Yi_X : X \to X + Y, i Y:YX+Yi_Y : Y \to X + Y exhibit X+YX + Y as the coproduct of XX and YY.

Proof. Let f:XBf : X \to B, g:YBg : Y \to B be given functions. Then we have monos

1 X,f:XX×B1 Y,g:YY×B(1)\langle 1_X, f \rangle : X \hookrightarrow X \times B \qquad \langle 1_Y, g \rangle : Y \hookrightarrow Y \times B \qquad (1)

Now the operation ×B:Sub(C)Sub(C×B)- \times B : Sub(C) \to Sub(C \times B) certainly preserves finite meets, and also preserves finite joins because it is left adjoint to B:Sub(C×B)Sub(C)\forall_B : Sub(C \times B) \to Sub(C). Therefore this operation preserves disjoint unions; we infer that the monos

X×Bi X×B(X+Y)×BY×Bi Y×B(X+Y)×B(2)X \times B \stackrel{i_X \times B}{\to} (X + Y) \times B \qquad Y \times B \stackrel{i_Y \times B}{\to} (X + Y) \times B \qquad (2)

exhibit (X+Y)×B(X + Y) \times B as a disjoint union of X×B,Y×BX \times B, Y \times B. Composing the monos of (1) and (2), we have disjoint embeddings of XX and YY in (X+Y)×B(X + Y) \times B. Using Theorem 4, X+YX + Y is isomorphic to the join of these embeddings; this means we have an inclusion

1,h:X+Y(X+Y)×B\langle 1, h \rangle : X + Y \hookrightarrow (X + Y) \times B

whose restriction to XX yields i X,f\langle i_X, f \rangle and whose restriction to YY yields i Y,g\langle i_Y, g \rangle. Hence h:X+YBh : X + Y \to B extends ff and gg. It is the unique extension, for if there were two extensions h,hh, h', then their equalizer would be an upper bound of X,YX, Y in Sub((X+Y)×B)Sub((X + Y) \times B), contradicting the fact that X+YX + Y is the least upper bound. This completes the proof. \Box
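Finally, the naive counterpart of Theorem 5 is the familiar case analysis on a sum type; a Lean sketch with the coproduct modeled by Lean's Sum and the hypothetical name copair:

```lean
-- Extend f and g along the two injections.
def copair {X Y B : Type} (f : X → B) (g : Y → B) : Sum X Y → B
  | Sum.inl x => f x
  | Sum.inr y => g y

-- The extension restricts to f on the left injection (and similarly to g).
example {X Y B : Type} (f : X → B) (g : Y → B) (x : X) :
    copair f g (Sum.inl x) = f x := rfl
```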

I think that’s enough for one day. I will continue to explore the categorical structure and logic of ETCS next time.
