Contents

### Context

#### Measure and probability theory

measure theory

probability theory

# Contents

## Idea

The Giry monad (Giry 80, following Lawvere 62) is the monad on a category of suitable spaces which sends each suitable space $X$ to the space of suitable probability measures on $X$.

It is one of the main examples of a probability monad, and hence one of the main structures used in categorical probability.

## Definition

The Giry monad is defined on the category of measurable spaces, assigning to each measurable space $X$ the space $G(X)$ of all probability measures on $X$ endowed with the $\sigma$-algebra generated by the set of all the evaluation maps

$ev_U \colon G(X) \to [0,1]$

sending a probability measure $P$ to $P(U)$, where $U$ ranges over all the measurable sets of $X$. The unit of the monad sends a point $x \in X$ to the Dirac measure at $x$, $\delta_x$, while the monad-multiplication is defined by the natural transformation

$\mu_{X} \;\colon\; G\big(G(X)\big) \longrightarrow G(X)$

given by

$\mu_X (Q)(U) \;\coloneqq\; \textstyle{\int}_{q \in G(X)} ev_U(q) \,dQ \,.$

This makes the endofunctor $G$ into a monad, and as such this is the Giry monad on measurable spaces, as originally defined by Lawvere 1962.

An alternative choice, convenient for analysis purposes, and introduced by Giry, is obtained by restricting the category of measurable spaces to the (full) subcategory which are those measurable spaces generated by Polish spaces, $Pol$, which are separable metric spaces for which a complete metric exists.

Write

$P \colon Pol \to Pol$

for the endofunctor which sends a space, $X$, to the space of probability measures on the Borel subsets of $X$. $P(X)$ is equipped with the weakest topology which makes the integration map $\tau \mapsto \int_{X}f d\tau$ continuous for any $f$, a bounded, continuous, real function on $X$.

There is a natural transformation

$\mu_{X}: P(P(X)) \to P(X)$

given by

$\mu_X (M)(A) := \int_{P(X)} \tau(A) M(d\tau).$

This makes the endofunctor $P$ into a monad, and this is the Giry monad on Polish spaces. The shortcoming of restricting the Giry monad to Polish spaces is that discrete spaces have no $P$-algebras (even though they do have $G$-algebras). For example, take the discrete (measurable) space $2$, which we can also view as a topological discrete space. The measurable map $\epsilon_{2}: G(2) \to 2$ defined by $\epsilon_{2}( (1-\alpha) \delta_0 + \alpha \delta_1) = 0$ for all $\alpha \in [0,1)$, and by $\epsilon_{2}(\delta_1)=1$ shows that $(2,\epsilon_2)$ is a $G$-algebra, yet it is not a $P$-algebra. (As Doberkat has noted, a $P$-algebra must be connected, but that is impossible for discrete spaces.) The map $\epsilon_{2}$ plays a critical role in the analysis of the $G$-algebras, because it is the unique affine map from the geometric (convex) space $G(2)$, which is isomorphic to the unit interval $[0,1]$ with its natural convex structure, to the discrete (combinatorial) convex space $2$.

## Properties

### Kleisli category

The Kleisli morphisms of the Giry monad on Meas (and related subcategories) are Markov kernels. Therefore its Kleisli category is the category Stoch. It is one of the most important examples of a Markov category.

### Algebras over the Giry monad

(Doberkat 03) works out the algebras for the Giry monad using the Giry monad defined on $Pol$. We want measurable maps between $P(X)$ and $X$, such that the ‘fibres’ are convex and closed, and such that $\delta_{x}$, the delta distribution on $x$, is in the fibre over $x$. And there’s another condition which requires a compact subset of $P(X)$ to be sent to a compact subset of $X$.

Now, as ever, $P(X)$ will support an algebra, $\mu_{X}: P(P(X)) \to P(X)$. This is the analogue of a free group being an algebra of the group monad. But just as there are many interesting groups which are not free, we should want to find algebras of Giry’s monad which are not of the $\mu_{X}$ form. Doberkat shows that for such an algebra $X$ must be connected, and suggests this example

$h: P([0, 1]) \ni \tau \mapsto \int_{0}^{1} t d\tau{(t)} \in [0, 1].$

which is the free algebrea $\mu_{\{0, 1\}}$ using the fact that the probability measures on $\{0, 1\}$ are just binomial, parameterised by $t \in [0, 1]$, i.e., every $p \in P(\{0,1\})$ is represented by $p=(1-t) \delta_0 + t \delta_1$. The other example he gives has $X$ a bounded, closed and convex subset of $\mathbb{R}^n$, and probability measures being sent to their barycentre. (This is an example of the Giry (probability) monad for compact Hausdorff spaces which was shown by Swirszcz (Monadic functors and convexity (1974).)

Doberkat has a longer article on Eilenberg-Moore algebras of the Giry monad as item 5 here. (Unfortunately, the monograph ‘Stochastic Relations: Foundations for Markov Transition Systems’ doesn’t appear to be available.) There are two monads being treated here, one which sends a Polish space to the space of all probability measures, the other to the space of all subprobability measures. The extra structure relating to these monads, is that of a (positive) convex structure. In the case of a convex structure, this intuitively captures the idea that a weighted sum of points in the space has barycentre within the space.

The fundamental question concerning algebras is the existence of algebras which are not free. The Eilenberg-Moore category of algebras, for any monad, is descriptive in nature but not constructive, and says nothing about the existence of algebras which are not free. For this reason it is necessary to find a different representation of the category of $G$-algebras. (Recall that the full subcategory of the Eilenberg-Moore category whose objects are the free algebras is isomorphic to the Kleisi category.)

(VanBelle 21) shows, in effect, that the restricted Giry monad (viewed as just a functor) $G|: \mathbf{Meas}_c \rightarrow \mathbf{Meas}$ has the codensity monad $G$, where $\mathbf{Meas}_c$ is the category of countable measurable spaces with the discrete $\sigma$-algebra (hence, in effect, one can view these simply as sets). This suggest that the natural numbers $\mathbb{N}$ are sufficient in some sense. To make this more precise, we first make the observation that given any $G$-algebra $h:G(X) \rightarrow X$ on a (coseparated) measurable space $X$, the underlying set $X$ has the structure of a superconvex space, denoted $X_h$, defined by $\sum_{i=1}^{\infty} p_i x_i := h( \sum_{i=1}^{\infty} p_i \delta_{x_i})$, where $\sum_{i=1}^{\infty} p_i = 1$ and each $p_i \ge 0$. The proof is

$\begin{array}{rcl} (h \mu_X)(\sum_{i \in \mathbb{N}} p_i \delta_{P_i}) &=& (h \circ \G(h))(\sum_{i \in \mathbb{N}} p_i \delta_{P_i}) \\ h(\sum_{i \in \mathbb{N}} p_i P_i) &=& h(\sum_{i \in \mathbb{N}} p_i \delta_{h(P_i)}) \\ h(\sum_{i \in \mathbb{N}} p_i P_i) &=& \sum_{i \in \mathbb{N}} p_i h(P_i) \end{array}.$

Moreover, given any map of $G$-algebras, $f: (X,h) \rightarrow (Y,k)$ it follows that $f$ is a countably affine map because

$\begin{array}{lcl} f( \sum_{i \in \mathbb{N}} p_i x_i) &=& f(h( \sum_{i \in \mathbb{N}} p_i \delta_{x_i} )) \\ &=& k\big( G{(f)}( \sum_{i \in \mathbb{N}} p_i \delta_{x_i} ) \big) \\ &=& k\big( \sum_{i \in \mathbb{N}} p_i \delta_{f(x_i)} \big) \\ &=& \sum_{i \in \mathbb{N}} p_i k(\delta_{f(x_i))} \\ &=& \sum_{i \in \mathbb{N}} p_i f(x_i) \end{array}$

For this reason we are naturally led to the category of super convex spaces, denoted $\mathbf{SCvx}$, which have been referred to as strongly convex spaces in the quantum mechanics literature, e.g., See Mackey - Math. Found. Q.M.. The morphisms of $\mathbf{SCvx}$ are countably affine map (barycenter maps) which generalizes (and is consistent with) the elementary examples of probability monads on compact Hausdorff spaces, etc.. These $G$-algebras, with measurable maps, provide the connection between the continuous (geometric) spaces and discrete (combinatorial) spaces which are necessary for modeling non-determinism.

Now we can describe the two ways in which $\mathbb{N}$ is ‘’sufficient’’. The full subcategory of super convex spaces consisting of the single object $G(\mathbb{N})$ is dense in $\mathbf{SCvx}$. (This is almost immediate from the definition because if $A$ is any super convex space then a countably affine map $f: G(\mathbb{N}) \rightarrow A$ is completely defined by where $f$ maps each $\delta_i \in G(\mathbb{N})$.) On the other hand, there exists a codense functor from the full subcategory of $\mathbf{SCvx}$ consisting of the single object $\mathbb{N}$ to the category of separated standard measurable spaces. These two constructions are the content of the article (Sturtz 22) which attempts to show the category of $G$-algebras on the subcategory (of $\mathbf{Meas}$) given by the category of standard measurable spaces is isomorphic to the category of super convex space.

### Voevodsky’s work

Vladimir Voevodsky has also worked on a category theoretic treatment of probability theory, and gave few talks on this at IHES, Miami, in Moscow etc. Voevodsky had in mind applications in mathematical biology?, for example, population genetics:

…a categorical study of probability theory where “categorical” is understood in the sense of category theory. Originally, I developed this approach to probability to get a better understanding of the constructions which I had to deal with in population genetics. Later it evolved into something which seems to be also interesting from a purely mathematical point of view. On the elementary level it gives a category which is useful for the work with probabilistic constructions involving complicated combinations of stochastic processes of different types. On a more advanced level, applying in this context the old idea of a functor as a generalized object one gets a better view of the relationship between probability and the theory of (pre-)ordered topological vector spaces.

A talk in Moscow (20 Niv 2008, in Russian) can be viewed here, wmv 223.6 Mb. Abstract:

In early 60-ies Bill Lawvere defined a category whose objects are measurable spaces and morphisms are Markov kernels. I will try to show how this category allows one to think about many of the notions of probability theory in categorical terms and to connect probabilistic objects to objects of other types through various functors.

Voevodsky’s unfinished notes on categorical probability theory have been released posthumously.

Prakash Panangaden in Probabilistic Relations defines the category $SRel$ (stochastic relations) to have as objects sets equipped with a $\sigma$-field. Morphisms are conditional probability densities or stochastic kernels. So, a morphism from $( X, \Sigma_X)$ to $( Y, \Sigma_Y)$ is a function $h: X \times \Sigma_Y \to [0, 1]$ such that

1. $\forall B \in \Sigma_Y . \lambda x \in X . h(x, B)$ is a bounded measurable function,
2. $\forall x \in X . \lambda B \in \Sigma_Y . h(x, B)$ is a subprobability measure on $\Sigma_Y$.

If $k$ is a morphism from $Y$ to $Z$, then $k \cdot h$ from $X$ to $Z$ is defined as $(k \cdot h)(x, C) = \int_Y k(y, C)h(x, d y)$.

Panangaden’s definition differs from Giry’s in the second clause where subprobability measures are allowed, rather than ordinary probability measures.

Panangaden emphasises that the mechanism is similar to the way that the category of relations can be constructed from the power set functor. Just as the category of relations is the Kleisli category of the powerset functor over the category of sets Set, $SRel$ is the Kleisli category of the functor over the category of measurable spaces and measurable functions which sends a measurable space, $X$, to the measurable space of subprobability measures on $X$. This functor gives rise to a monad.

What is gained by the move from probability measures to subprobability measures? One motivation seems to be to model probabilistic processes from $X$ to a coproduct $X + Y$. This you can iterate to form a process which looks to see where in $Y$ you eventually end up. This relates to $SRel$ being traced.

There is a monad on $MeasureSpaces$, $1 + -: Meas \to Meas$. A probability measure on $1 + X$ is a subprobability measure on $X$. Panangaden’s monad is a composite of Giry’s and $1 + -$.

## History

The adjunction underlying the Giry monad was originally developed by Lawvere in 1962, prior to the full recognition of the relationship between monads and adjunctions. Although P. Huber had already shown in 1961 that every adjoint pair gives rise to a monad, it wasn’t until 1965 that the constructions of Eilenberg-Moore, and Kleisli, made the essential equivalence of both concepts manifest.

Lawvere’s construction was written up as an appendix to a proposal to the Arms Control and Disarmament Agency, set up by President Kennedy as part of the State Department to handle planning and execution of certain treaties with the Soviet Union. This appendix was intended to provide a reasonable framework for arms control verification protocols (Lawvere 20).

At that time, Lawvere was working for a “think tank” in California, and the purpose of the proposal was to provide a means for verifying compliance with limitations on nuclear weapons. In the 1980’s, Michèle Giry was collaborating with another French mathematician at that time who was also working with the French intelligence agency, and was able to obtain a copy of the appendix. Giry then developed and extended some of the ideas in the appendix (Giry 80)

Gian-Carlo Rota had also (somehow) obtained a copy of the appendix, which ended up in the library at The American Institute of Mathematics, and only became publicly available in 2012.

From Lawvere 20:

I’d like to say that the idea of the category of probabilistic mappings, the document corresponding to that was not part of a seminar, as some of the circulations say, essentially it was the document submitted to the arms control and disarmament agency after suitable checking that the Pentagon didn’t disagree with it. Because of the fact that for arms control agencies as a side responsibility the forming of arms control agreements and part of these agreements must involve agreed upon protocols of verification. So the idea of that paper did not provide such protocols, but it purported to provide reasonable framework within which such protocol can be formulated.

## References

The idea originates with

and was picked up and published in:

• Michèle Giry, A categorical approach to probability theory, Categorical aspects of topology and analysis (Ottawa, Ont., 1980), pp. 68–85, Lecture Notes in Math. 915 Springer 1982 (doi:10.1007/BFb0092872)

(there are allegedly a few minor analytically incorrect points and gaps in proofs, observed by later authors).

According to E. Burroni (2009), the Giry monad appears also in

• O. de la Tullaye, L’intégration considérée comme l’algèbre d’un triple. Rapport de Stage de D.E.A. manuscrit 1971.

The article

• Ruben Van Belle, Probability monads as codensity monads (2021) $[$arXiv:2111.01250$]$

shows, in effect, that the Giry monad restricted to countable measurable spaces (with the discrete $\sigma$-algebra) yields the restricted Giry functor $G|: \mathbf{Meas}_c \rightarrow \mathbf{Meas}$ which has the codensity monad $G$. This suggest that the natural numbers $\mathbb{N}$ are ‘’sufficient’‘ in the sense characterized above which suggest a factorization of the Giry monad, defined on the category of standard measurable spaces, through the category of super convex spaces, spelled out in

• Kirk Sturtz, Giry algebras for standard measurable spaces $[$arXiv:2202:10819$]$

which has evolved from the preliminary work

• Kirk Sturtz, Categorical Probability Theory, $[$arXiv:1406.6030$]$

which views probability measures via double dualization, restricted to weakly averaging affine maps which preserves limits. (A more satisfactory description of probability measures arises from recognizing the need for viewing them as weakly-averaging countably affine maps, rather than just finite affine maps, as discussed in that article.)

Some corrections from an earlier version of the article, were pointed out in

Apart from these papers, there are similar developments in

• Ernst-Erich Doberkat, Characterizing the Eilenberg-Moore algebras for a monad of stochastic relations (pdf)

• Ernst-Erich Doberkat, Kleisli morphisms and randomized congruences, Journal of Pure and Applied Algebra Volume 211, Issue 3, December 2007, Pages 638-664 https://doi.org/10.1016/j.jpaa.2007.03.003

• N. N. Cencov, Statistical decisions rules and optimal Inference, Translations of Math. Monographs 53, Amer. Math. Society 1982

(blog comment) Cencov’s “category of statistical decisions” coincides with Giry’s (Lawvere’s) category. I ($\leftarrow$ somebody) have the sense that Cencov discovered this category independently of Lawvere although years later.

• category cafe related to Giry monad: category theoretic probability, coalgebraic modal logic

• Samson Abramsky et al. Nuclear and trace ideals in tensored ∗-Categories,arxiv:math/9805102, on the representation of probability theory through monads, which looks to work Giry’s monad into a context even more closely resembling the category of relations.

There is also relation with work of Jacobs et al.

J. Culbertson and K. Sturtz use the Giry monad in their categorical approach to Bayesian reasoning and inference (both articles contain further references to the categorical approach to probability theory):

• Jared Culbertson and Kirk Sturtz, A categorical foundation for Bayesian probability, Applied Cat. Struc. 2013 (preprint as arXiv:1205.1488)

• Jared Culbertson and Kirk Sturtz, Bayesian machine learning via category theory, 2013 (arxiv:1312.1445)

• Elisabeth Burroni, Lois distributives. Applications aux automates stochastiques, TAC 22, 2009 pp.199-221 (journal page)

where she derives stochastic automata as algebras for a suitable distributive law on the monoid and Giry monads.

B. Fong has a section on the Giry monad in his paper on Bayesian networks:

• Fong: Causal Theories - A Categorical Perspective on Bayesian Networks, (2013) arXiv:1301.6201