nLab joint and marginal probability




In probability theory, it is well known that events are not always independent, i.e. that in general

$$P(A,B) \ne P(A)\,P(B).$$

One says that the probability of a product is not the product of the probabilities. The term on the left is called the joint probability (the probability that the events $A$ and $B$ happen jointly), and the terms on the right are called marginal probabilities (the probabilities that $A$ and $B$ happen, each considered separately). In general, the joint probability carries more information than the marginal probabilities alone, because of possible correlations or other stochastic interactions.
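As a small finite illustration (the two perfectly correlated fair coins are a hypothetical example, not taken from the text), one can store a joint distribution as a Python dictionary from outcome pairs to probabilities and compare the joint probability with the product of the marginal probabilities:

```python
from fractions import Fraction

# Hypothetical example: two perfectly correlated fair coins.
# An outcome (a, b) records whether A happened (a = 1) and whether B happened (b = 1).
joint = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(0),
    (1, 0): Fraction(0),
    (1, 1): Fraction(1, 2),
}

def prob(event):
    """Probability of the set of outcomes satisfying the predicate `event`."""
    return sum(mass for outcome, mass in joint.items() if event(outcome))

p_ab = prob(lambda o: o[0] == 1 and o[1] == 1)  # joint probability P(A,B)
p_a = prob(lambda o: o[0] == 1)                 # marginal probability P(A)
p_b = prob(lambda o: o[1] == 1)                 # marginal probability P(B)

print(p_ab, p_a * p_b)  # prints: 1/2 1/4
```

Exact `Fraction` arithmetic is used so that equalities between probabilities can be tested without rounding issues.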

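From the category-theoretic perspective, one can encode this idea using the following (related) concepts:

  • probability monads are in general not strong monoidal, i.e. they do not preserve products;
  • in a Markov category, the tensor product is in general not a categorical (cartesian) product.

When one takes as Markov category the Kleisli category of a probability monad, the two concepts coincide.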

In measure-theoretic probability

Let $(\Omega,\mathcal{F},p)$ be a probability space, and let $A,B\in\mathcal{F}$ be events, i.e. measurable subsets of $\Omega$. The joint probability of $A$ and $B$ is the probability of their intersection, $p(A\cap B)$.

More generally, let $f\colon(\Omega,\mathcal{F})\to(X,\mathcal{A})$ and $g\colon(\Omega,\mathcal{F})\to(Y,\mathcal{B})$ be random variables (or random elements) on $\Omega$. The joint random variable of $f$ and $g$ is the random variable $(f,g)\colon(\Omega,\mathcal{F})\to(X\times Y,\mathcal{A}\otimes\mathcal{B})$ given by the universal property of the product, i.e. the unique measurable map with $\pi_1\circ(f,g)=f$ and $\pi_2\circ(f,g)=g$; explicitly, $\omega\mapsto(f(\omega),g(\omega))$.

We call

  • the joint distribution of $f$ and $g$ (or of $X$ and $Y$, when the maps are clear from the context) the pushforward measure $p_{X,Y}=(f,g)_*p$ on $X\times Y$;
  • the marginal distributions of $f$ and $g$ (or of $X$ and $Y$) the pushforward measures $p_X=f_*p$ and $p_Y=g_*p$ on $X$ and $Y$, respectively.

Note that $(\pi_1)_*p_{X,Y}=p_X$ and $(\pi_2)_*p_{X,Y}=p_Y$ (because $\pi_1\circ(f,g)=f$, $\pi_2\circ(f,g)=g$, and pushforward is functorial), so that the marginalization maps can be seen as the functorial images of the product projections under a probability monad.
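The following is a minimal sketch for finite probability spaces, where measures are dictionaries and pushforward just adds up masses. The die, the two random variables and the helper names `pushforward` and `pair` are hypothetical choices for illustration. It forms the joint and marginal distributions as pushforwards and checks that pushing the joint forward along the projections gives back the marginals:

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(p, h):
    """Pushforward h_* p of a finite distribution p, given as a dict outcome -> probability."""
    out = defaultdict(Fraction)  # Fraction() == 0
    for omega, mass in p.items():
        out[h(omega)] += mass
    return dict(out)

def pair(f, g):
    """The joint random variable (f, g), sending omega to (f(omega), g(omega))."""
    return lambda omega: (f(omega), g(omega))

pi_1 = lambda xy: xy[0]  # first product projection X x Y -> X
pi_2 = lambda xy: xy[1]  # second product projection X x Y -> Y

# Hypothetical probability space: one roll of a fair die.
p = {omega: Fraction(1, 6) for omega in range(1, 7)}
f = lambda omega: omega % 2 == 0  # X = {False, True}: "the roll is even"
g = lambda omega: omega > 3       # Y = {False, True}: "the roll is at least 4"

p_XY = pushforward(p, pair(f, g))  # joint distribution (f, g)_* p
p_X = pushforward(p, f)            # marginal f_* p
p_Y = pushforward(p, g)            # marginal g_* p

# Marginalizing the joint along the projections recovers the marginals.
assert pushforward(p_XY, pi_1) == p_X
assert pushforward(p_XY, pi_2) == p_Y

# The joint carries more information: here f and g are not independent.
print(p_XY[(True, True)], p_X[True] * p_Y[True])  # prints: 1/3 1/4
```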

In the absence of the space $\Omega$, one can call “joint distribution” any probability distribution on a product space $X\times Y$, and define its marginal distributions by pushing forward along the product projections. This is equivalent to taking $X\times Y$ itself as $\Omega$, with the two projections as random variables. The same is true for products of more spaces, including infinite products.

Given two (marginal) distributions $p$ and $q$ on spaces $(X,\mathcal{A})$ and $(Y,\mathcal{B})$, there can in general be many joint distributions on $X\times Y$ admitting $p$ and $q$ as their marginals, corresponding to different possible stochastic interactions. A canonical choice of joint distribution is the product distribution $p\otimes q$, defined on measurable rectangles $A\times B\subseteq X\times Y$ (with $A\in\mathcal{A}$ and $B\in\mathcal{B}$) by

$$(p\otimes q)(A\times B) = p(A)\,q(B),$$

and extended uniquely to $\mathcal{A}\otimes\mathcal{B}$. This choice makes $X$ and $Y$ (i.e. the two projections) independent.
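On finite sets every singleton $\{(x,y)\}$ is the rectangle $\{x\}\times\{y\}$, so the defining formula already determines $p\otimes q$ completely. A minimal sketch under that finiteness assumption (the distributions `p`, `q` and the helper names are hypothetical) checks that the product distribution has $p$ and $q$ as its marginals:

```python
from fractions import Fraction
from itertools import product

def product_distribution(p, q):
    """Product distribution p ⊗ q on X × Y: (p ⊗ q)({x} × {y}) = p({x}) q({y})."""
    return {(x, y): p[x] * q[y] for x, y in product(p, q)}

def marginal(joint, i):
    """Pushforward of a joint distribution along the i-th product projection."""
    out = {}
    for xy, mass in joint.items():
        out[xy[i]] = out.get(xy[i], 0) + mass
    return out

# Hypothetical marginals on X = {"H", "T"} and Y = {0, 1}.
p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
q = {0: Fraction(1, 4), 1: Fraction(3, 4)}

pq = product_distribution(p, q)
assert marginal(pq, 0) == p and marginal(pq, 1) == q  # p and q are its marginals
print(pq[("H", 1)])  # prints: 1/4
```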

In terms of probability monads

Recall that the Giry monad, as well as other probability monads, encodes the idea of forming, from a space $X$, a space $PX$ of probability measures over it. A joint distribution is then an element of the space $P(X\times Y)$, and the marginals are the elements of $PX$ and $PY$ obtained by applying $P$ to the product projections. Additionally, by the universal property of products, there exists a unique map $\Delta\colon P(X\times Y)\to PX\times PY$ such that $\pi_1\circ\Delta=P\pi_1$ and $\pi_2\circ\Delta=P\pi_2$, i.e. $\Delta=(P\pi_1,P\pi_2)$. This can be interpreted as forming, from a joint distribution, the pair of its marginals, which in general encodes less information. The same can be done for joint distributions over a larger number of factors, including infinite products.
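The Giry monad itself is not straightforward to code directly. As a hedged stand-in, the sketch below uses the finite distribution monad (finitely supported distributions stored as dicts), where the functor action $P(h)$ is pushforward along $h$ and $\Delta$ is the pair $(P\pi_1, P\pi_2)$. The function names are hypothetical:

```python
from fractions import Fraction

def P(h):
    """Functor action of the finite distribution monad: pushforward along h."""
    def pushed(p):
        out = {}
        for x, mass in p.items():
            out[h(x)] = out.get(h(x), 0) + mass
        return out
    return pushed

def pi_1(xy):
    return xy[0]  # first product projection X x Y -> X

def pi_2(xy):
    return xy[1]  # second product projection X x Y -> Y

def delta(joint):
    """Delta = (P(pi_1), P(pi_2)): P(X x Y) -> PX x PY, the pair of marginals."""
    return (P(pi_1)(joint), P(pi_2)(joint))

# Hypothetical joint distribution of two perfectly correlated fair bits.
joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
p_X, p_Y = delta(joint)
# Both marginals are uniform on {0, 1}; the correlation is no longer visible.
assert p_X == p_Y == {0: Fraction(1, 2), 1: Fraction(1, 2)}
```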

In order to form product distributions, one needs additional compatibility conditions between the monad structure (giving probability measures) and the products of the category. These conditions are exactly captured by the idea of a monoidal monad, or equivalently, a commutative monad.
This amounts in particular to a natural map $\nabla\colon PX\times PY\to P(X\times Y)$ satisfying certain compatibility conditions. For the Giry monad, it is the map sending a pair of probability measures to their product measure, $(p,q)\mapsto p\otimes q$.
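Again using the finite distribution monad as an assumed stand-in for the Giry monad, a minimal sketch of $\nabla$ is just the product distribution. The example also checks naturality on one hypothetical instance: pushing $\nabla(p,q)$ forward along $f\times g$ gives the same result as applying $\nabla$ to the pushforwards $P(f)(p)$ and $P(g)(q)$. All names and data here are illustrative:

```python
from fractions import Fraction
from itertools import product

def P(h):
    """Pushforward along h (functor action of the finite distribution monad)."""
    def pushed(p):
        out = {}
        for x, mass in p.items():
            out[h(x)] = out.get(h(x), 0) + mass
        return out
    return pushed

def nabla(p, q):
    """nabla: PX x PY -> P(X x Y), sending (p, q) to the product distribution p ⊗ q."""
    return {(x, y): p[x] * q[y] for x, y in product(p, q)}

# Hypothetical distributions and maps, used to check naturality of nabla:
# P(f x g)(nabla(p, q)) == nabla(P(f)(p), P(g)(q)).
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
q = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
f = lambda x: x % 2     # X = {1, 2, 3} -> {0, 1}
g = lambda y: y == "a"  # Y = {"a", "b"} -> {False, True}
f_times_g = lambda xy: (f(xy[0]), g(xy[1]))

assert P(f_times_g)(nabla(p, q)) == nabla(P(f)(p), P(g)(q))
```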

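Consider now the two composites of $\Delta$ and $\nabla$, which can be drawn as two diagrams: the one on the left expresses the equation $\Delta\circ\nabla=\mathrm{id}_{PX\times PY}$, and the one on the right expresses the equation $\nabla\circ\Delta=\mathrm{id}_{P(X\times Y)}$.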

We have that

  • The commutativity of the diagram on the left says that if one forms a product measure and then takes its marginals, one recovers the two factors. In other words, every product probability measure is necessarily the product of its marginals. One can show that the diagram on the left commutes if and only if $P$ preserves the terminal object, i.e. $P(1)\cong 1$ (see at affine monad). This is the case for the Giry monad and for most probability monads, since the one-point space admits a unique probability measure. (It can be considered a “normalization” condition for probability measures: if one drops normalization, the one-point space admits several measures.)

  • In general, the diagram on the right does not commute: if one starts with a joint distribution and then forms its marginals, the product of the marginals may differ from the original distribution. Therefore marginalization is a destructive operation, forgetting possible stochastic interactions. Moreover, if the diagram on the left commutes, then the diagram on the right commutes if and only if the monad $P$ preserves products, i.e. it is strong monoidal. The “random” behavior of the Giry monad and of other probability monads can be explained, to a large degree, by their failure to preserve products. (More on this below, and at Markov category.)
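For finitely supported distributions (an assumed stand-in for the Giry monad), both composites can be checked on small hypothetical examples. The sketch verifies that $\Delta\circ\nabla$ returns the original pair, that $\nabla\circ\Delta$ changes a correlated joint distribution, and that dropping normalization (a hypothetical measure `q2` of total mass 2) breaks the first equation as well:

```python
from fractions import Fraction
from itertools import product

def marginal(joint, i):
    """Pushforward of a joint distribution along the i-th product projection."""
    out = {}
    for xy, mass in joint.items():
        out[xy[i]] = out.get(xy[i], 0) + mass
    return out

def delta(joint):
    """Delta: P(X x Y) -> PX x PY (form the pair of marginals)."""
    return (marginal(joint, 0), marginal(joint, 1))

def nabla(p, q):
    """nabla: PX x PY -> P(X x Y) (form the product distribution)."""
    return {(x, y): p[x] * q[y] for x, y in product(p, q)}

half = Fraction(1, 2)
p = {0: half, 1: half}
q = {0: Fraction(1, 3), 1: Fraction(2, 3)}

# Left diagram: Delta . nabla = id, since probability distributions are normalized.
assert delta(nabla(p, q)) == (p, q)

# Right diagram fails: a correlated joint is not the product of its marginals.
joint = {(0, 0): half, (0, 1): Fraction(0), (1, 0): Fraction(0), (1, 1): half}
assert nabla(*delta(joint)) != joint

# Dropping normalization breaks the left diagram too (hypothetical unnormalized q2).
q2 = {0: Fraction(1), 1: Fraction(1)}  # total mass 2
assert delta(nabla(p, q2))[0] != p     # the first "marginal" is 2 * p
```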

In the category of Markov kernels

As we have seen above, a joint distribution, in general, encodes more information than the pair of its marginals, and cannot always be recovered by forming the product distribution.

We can express this equivalently by looking at the category Stoch of Markov kernels, as well as at other Kleisli categories of similar probability monads.

First of all, recall that the category Stoch inherits the monoidal structure from the product in Meas, the cartesian product of measurable spaces.

This product in Stoch, however, is not a categorical product. The universal property of a product requires that, given objects $\Omega$, $X$ and $Y$ in a category, for every pair of morphisms $f\colon\Omega\to X$ and $g\colon\Omega\to Y$ there exists a unique morphism $h\colon\Omega\to X\otimes Y$ whose composites with the two projections (in Stoch, the marginalization kernels) are $f$ and $g$. For Markov kernels this property does not hold: setting $\Omega=1$, the one-point space, the property would require in particular that there is a unique joint distribution on $X\times Y$ for every pair of marginal distributions on $X$ and $Y$. As we have seen above, this is not the case. What fails is not the existence part of the universal property: a joint distribution always exists, namely the product distribution. What fails is the uniqueness part. In other words, the monoidal structure in Stoch is a weak product but not a categorical product.
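The failure of uniqueness can already be seen with $\Omega=1$ and two-element sets. In this minimal sketch (the uniform marginals and the helper `marginal` are hypothetical choices), the product distribution witnesses existence, while a perfectly correlated joint distribution with the same marginals shows that the pairing is not unique:

```python
from fractions import Fraction
from itertools import product

def marginal(joint, i):
    """Marginalization: pushforward along the i-th projection."""
    out = {}
    for xy, mass in joint.items():
        out[xy[i]] = out.get(xy[i], 0) + mass
    return out

half = Fraction(1, 2)
uniform = {0: half, 1: half}  # the same marginal on X = Y = {0, 1}

# Existence: the product distribution has the required marginals.
independent = {(x, y): uniform[x] * uniform[y] for x, y in product(uniform, uniform)}
# Uniqueness fails: a perfectly correlated joint has the same marginals.
correlated = {(0, 0): half, (0, 1): Fraction(0), (1, 0): Fraction(0), (1, 1): half}

for joint in (independent, correlated):
    assert marginal(joint, 0) == uniform and marginal(joint, 1) == uniform
assert independent != correlated
```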

More generally, given a monoidal monad on a cartesian monoidal category, the induced monoidal structure on the Kleisli category is cartesian only when the monad is strong monoidal, and we saw that this is not the case for most probability monads. (As mentioned at Markov category, a cartesian monoidal Kleisli category can be seen as having “no randomness”.)

In Markov categories

In Markov categories, one can define joint and marginal states similarly to what happens in the category Stoch. Recall that a state on $X$ is a morphism $p\colon I\to X$ out of the monoidal unit $I$. It has the interpretation of a probability measure on $X$, and in Stoch it is exactly a Markov kernel $1\to X$, i.e. a probability measure on $X$.

A joint state is a state of the form $p\colon I\to X\otimes Y$.

In Stoch, this is exactly a probability measure on X×YX\times Y.

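Given a joint state $p$ on $X\otimes Y$, its marginals are the states $p_X$ and $p_Y$ on $X$ and $Y$, respectively, given by discarding the unobserved variable: $p_X=(\mathrm{id}_X\otimes\mathrm{del}_Y)\circ p$ and $p_Y=(\mathrm{del}_X\otimes\mathrm{id}_Y)\circ p$, where $\mathrm{del}_Y\colon Y\to I$ is the discard map and the unitor isomorphisms $X\otimes I\cong X$ and $I\otimes Y\cong Y$ are left implicit.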

In Stoch, these correspond exactly to marginal distributions.
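As a hedged concrete illustration, one can take FinStoch (finite sets and stochastic matrices) as the Markov category. In the convention assumed here, a kernel $X\to Y$ is a column-stochastic $|Y|\times|X|$ matrix, composition is matrix multiplication, $\otimes$ is the Kronecker product, and $\mathrm{del}$ is a row of ones. Computing $(\mathrm{id}_X\otimes\mathrm{del}_Y)\circ p$ then sums out $Y$, as expected. The helper names and the joint state are hypothetical:

```python
from fractions import Fraction

# Assumed concrete Markov category: FinStoch. A kernel X -> Y is a column-stochastic
# |Y| x |X| matrix; a state on X is an |X| x 1 column; the unit I is a 1-element set.

def compose(A, B):
    """Composite A . B of kernels (apply B first): the matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def tensor(A, B):
    """Tensor product of kernels: Kronecker product, basis (x, y) -> x * n_Y + y."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def discard(n):
    """del: the unique kernel from an n-element set to the one-point set I."""
    return [[Fraction(1)] * n]

# Hypothetical joint state p: I -> X ⊗ Y with |X| = 2, |Y| = 3, entries ordered (x, y).
p = [[Fraction(1, 6)], [Fraction(1, 3)], [Fraction(0)],
     [Fraction(1, 4)], [Fraction(0)], [Fraction(1, 4)]]

p_X = compose(tensor(identity(2), discard(3)), p)  # (id_X ⊗ del_Y) . p
p_Y = compose(tensor(discard(2), identity(3)), p)  # (del_X ⊗ id_Y) . p
# p_X is [[1/2], [1/2]] and p_Y is [[5/12], [1/3], [1/4]] (as Fractions).
```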

More on this (and examples in other Markov categories) can be found at Markov category - States, channels, joints, marginals.

See also


  • Tobias Fritz, Paolo Perrone, Bimonoidal Structure of Probability Monads, Proceedings of MFPS 2018 (arXiv:1804.03527)

  • Paolo Perrone, Categorical Probability and Stochastic Dominance in Metric Spaces (thesis)

  • Kenta Cho, Bart Jacobs, Disintegration and Bayesian Inversion via String Diagrams, Mathematical Structures in Computer Science 29, 2019 (arXiv:1709.00322)

  • Tobias Fritz, A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics, Advances in Mathematics 370, 2020 (arXiv:1908.07021)

  • Anders Kock, Bilinearity and cartesian closed monads, Mathematica Scandinavica 29(2), 1971


Last revised on March 24, 2024 at 18:07:21.