In many categorical approaches to probability theory, one considers a category of spaces, such as measurable spaces or topological spaces, and equips this category with a monad whose functor part assigns to each space $X$ a space $P X$ of measures, probability measures, or valuations on $X$, or a variation thereof.
For probability theory, this can be interpreted as adding to the points of a space $X$ new “random points”, modelled as probability measures or valuations. The old points, which we can think of as deterministic, are embedded in $P X$ via the unit of the monad $X\to P X$. Just as well, the Kleisli morphisms of $P$ can be seen as stochastic maps. (Monads can be seen as ways of extending our spaces and functions to account for new phenomena, see for example extension system and monad in computer science.) Note that these probability measures are technically different from random elements: they rather correspond to the laws of the random elements.
Algebras of probability and measure monads can be interpreted as generalized convex spaces or conical spaces of a certain kind. For probability theory, in particular, the algebras of a probability monad can be seen as spaces equipped with a notion of expectation value of a random variable.
The details vary depending on the monad and on the category under consideration.
Many choices of categories and of monads are possible, depending on which aspects of measure theory or probability one wants to study. See the table below for more details.
Note that “probability monad” and “measure monad” are not used here as technical terms with a definition. Here we just explain what most monads of this kind look like. (There are ongoing works which may give a general, precise definition of probability monad.) The term “probability monad” was coined by Giry (see here), to refer to what we today call the Giry monad.
The basic idea, that holds for all monads of this sort, is that of forming spaces of random elements, or rather their laws.
Consider a category $C$, whose objects we can think of as “spaces of outcomes” with possibly extra structure, such as measurable spaces, or topological spaces, or even just sets. A probability monad assigns to each space $X$ a space $P X$ which we can think of as a space of “laws of random outcomes on $X$” or other measure-like entities. For example,
(See also the table below.)
A possible interpretation of probability monads, which uses the interpretation of monads as encoding generalized theories, is that $P X$ is a space of “formal generalized convex combinations” (in the sense of convex space) of points of $X$. For instance, for the distribution monad, given the set $\{heads,tails\}$ of possible outcomes of a coin flip, the set $P X$ contains for example the distribution that assigns probability $1/2$ to both outcomes. This can be thought of as the formal convex combination
On the morphisms, the functor gives for example the pushforward of measures.
The unit of the monad gives, for each space of outcomes, a map $X\to P X$, which we can think of as picking out “those outcomes which are not really random”, or “Dirac delta” (see Dirac measure and Dirac valuation.
These can be seen as the laws of a deterministic random variable.
The multiplication of the monad is a map $P P X\to P X$ for all objects $X$. This can be thought of as of “mixing” or “averaging” probability measures.
The following example is taken from Perrone ‘19, Example 5.1.2. Suppose that you have two coins in your pocket. Suppose that one coin is fair, with “heads” on one face and “tails” on the other face; suppose the second coin has “heads” on both sides. Suppose now that you draw a coin randomly, and flip it. We can sketch the probabilities in the following way: Let $X$ be the set $\{heads,tails\}$. A coin gives a law according to which we will obtain “heads” or “tails” so it determines an element of $P X$. Since the choice of coin is also random (we also have a law on the coins), the law on the coins determines an element of $P P X$. By averaging, the resulting overall probabilities are
In other words, the “average” or “mixture” can be thought of as an assignment $E:P P X\to P X$, from laws of “random random variables” to laws of ordinary random variables.
According to the interpretation of probability monads in terms of “formal generalized convex combinations”, the algebras of a probability monad are then spaces equipped with a way of evaluating these expressions to a result, which we can see as a generalized “midpoint?”, “average”, or “expectation value”. (Expectation values play a very important role in probability: probability monads can encode them via their algebras.)
In other words, algebras of probability monads can be thought of as generalizations of convex sets, depending on the actual category and monad in question. For example,
(See also the table below.)
If the measure are not required to be normalized, instead of (generalized) convex combinations one should think of linear combinations with non-negative coefficients, and so as algebra one gets a generalization of conical spaces instead.
The basic interpretation of a Kleisli morphism, i.e. a map $X\to P Y$, is that of a map with a random outcome. For example,
The Kleisli composition recovers exactly the Chapman-Kolmogorov formula? for the composition of stochastic maps. In the language of conditional probability, this reads as
for the discrete case, and as
for the continuous case.
(See Giry’s original article for the details, as well as the introductions given in Perrone ‘18, Section 1.1 and Perrone ‘19, Example 5.1.13).
Kleisli categories of probability monads are often instances of Markov categories.
Probability monads are usually defined on monoidal categories, in particular on cartesian monoidal categories. On product spaces $X\times Y$, the object $P(X\times Y)$ has the interpretation of containing joint distributions. Given a joint distribution, one can form the marginals by pushing forward along the product projections $X\times Y\to X$ and $X\times Y\to Y$. This is in general a destructive operation, since stochastic dependence may be discarded.
Given marginal distributions, one can form a canonical joint distribution by forming the product distribution. This is encoded by a monoidal structure on the probability monad (or equivalently, a commutative strength), a map which satisfies particular compatibility conditions. The Giry monad is monoidal on the category Meas with the cartesian product. Similar statements are true for most other probability and measure monads.
When the monad is affine, one has that a product probability is necessarily the product of its marginals. This is the case for monads of probability measures, but not for monads of unnormalized measures.
More on this at joint and marginal probability.
(…work in progress…)
(…to be expanded…)
W. Lawvere, The category of probabilistic mappings, ms. 12 pages, 1962
Michèle Giry, A categorical approach to probability theory, Categorical aspects of topology and analysis (Ottawa, Ont., 1980), pp. 68–85, Lecture Notes in Math. 915 Springer 1982.
T. Swirszcz, Monadic functors and convexity, Bulletin de l’Academie Polonais des Sciences 22, 1974 (pdf)
Klaus Keimel, The monad of probability measures over compact ordered spaces and its Eilenberg-Moore algebras, Topology and its Applications, 2008 (doi:10.1016/j.topol.2008.07.002)
Reinhold Heckmann, Spaces of valuations, Papers on General Topology and Ap-plications, 1996 (doi:10.1111/j.1749-6632.1996.tb49168.x,pdf)
Mauricio Alvarez-Manilla, Achim Jung, Klaus Keimel, The probabilistic powerdomain for stably compact spaces, Theoretical Computer Science 328, 2004. Link here.
C. Jones and Gordon. D. Plotkin?, A probabilistic powerdomain of evaluations, LICS 4, 1989. (doi:10.1109/LICS.1989.39173)
Jean Goubault-Larrecq and Xiaodong Jia, Algebras of the extended probabilistic powerdomain monad, ENTCS 345, 2019
Tobias Fritz, Paolo Perrone and Sharwin Rezagholi, Probability, valuations, hyperspace: Three monads on Top and the support as a morphism, Mathematical Constructions in Computer Science 31(8), 2021. arXiv.
Steve Vickers, A monad of valuation locales, 2011. Link here.
Franck van Breugel, The metric monad for probabilistic nondeterminism, unpublished, 2005. (pdf)
Tobias Fritz and Paolo Perrone, A probability monad as the colimit of spaces of finite samples, Theory and Applications of Categories 34, 2019. (pdf)
Tobias Fritz and Paolo Perrone, Stochastic order on metric spaces and the ordered Kantorovich monad, Advances in Mathematics 366, 2020. (arXiv:1808.09898)
Bart Jacobs, From probability monads to commutative effectuses, Journal of Logical and Algebraic Methods in Programming 94, 2018.
Tobias Fritz, Convex spaces I: definitions and examples, 2009. (arXiv:0903.5522)
Chris Heunen, Ohad Kammar, Sam Staton and Hongseok Yang, A convenient category for higher-order probability theory, Proceedings of LICS 2017. (arXiv)
Sean Moss?, Paolo Perrone, Probability monads with submonads of deterministic states, LICS 2022. (arXiv:2204.07003)
Tobias Fritz, Tomáš Gonda, Paolo Perrone, Eigil Fjeldgren Rischel, Representable Markov categories and comparison of statistical experiments in categorical probability, Theoretical Computer Science 961, 2023. (arXiv:2010.07416)
Peter Kristel, Benedikt Peterseim, A topologically enriched probability monad on the cartesian closed category of CGWH spaces. (arXiv)
An account in terms of codensity monads in:
An introduction to some of the concepts can be also found in:
Paolo Perrone, Notes on Category Theory with examples from basic mathematics, Chapter 5. (arXiv)
Paolo Perrone, Categorical probability and stochastic dominance in metric spaces, PhD thesis, Chapter 1. (pdf)
Tobias Fritz and Paolo Perrone, Monads, partial evaluations, and rewriting, MFPS 36, 2020 - Section 6. (arXiv)
and in the following video lectures:
Arthur Parzygnat, Categorical probability, video playlist. (YouTube)
Paolo Perrone, What is a probability monad?, tutorial. (YouTube)
Last revised on April 15, 2024 at 09:19:01. See the history of this page for a list of all contributions to it.