One can view a probability measure $p$ on a space $(X,\mathcal{A})$ as a “pile of mass”, for example of sand, on the space $X$. Using this picture, given two probability spaces $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$, there can be many ways of moving the mass from $X$ to $Y$ so that the sand from the pile $p$ is rearranged to form the pile $q$. (To which point, or points, does the mass at each point go?) Such a “way of moving the mass” is called a transport plan, and it is usually encoded by a joint distribution or by a Markov kernel (see below).
It is useful to keep track of the way in which we rearrange the mass $p$ to form $q$. These different ways can be seen as different morphisms between the objects $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$ in a category of couplings.
Let $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$ be probability spaces. A coupling or transport plan between $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$ is a probability space $(X\times Y, \mathcal{A}\otimes\mathcal{B},r)$ where
$\mathcal{A}\otimes\mathcal{B}$ is the tensor product sigma-algebra on the product space $X\times Y$ (generated by the sets $A\times B$ with $A\in\mathcal{A}$ and $B\in\mathcal{B}$);
the measure $r$ has $p$ and $q$ as marginals, in the sense that for all $A\in\mathcal{A}$ and $B\in\mathcal{B}$,
$$ r(A\times Y) = p(A) \qquad\text{and}\qquad r(X\times B) = q(B) . $$
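The marginal condition can be checked mechanically on finite spaces. The following is a minimal sketch, assuming distributions are stored as Python dicts with exact `Fraction` masses; the names `marginals` and `is_coupling` are hypothetical helpers, not from any library. It shows two different transport plans between the same pair of measures.

```python
from fractions import Fraction as F

def marginals(r):
    """Marginals of a joint distribution r, stored as {(x, y): mass}."""
    p, q = {}, {}
    for (x, y), m in r.items():
        p[x] = p.get(x, 0) + m
        q[y] = q.get(y, 0) + m
    return p, q

def is_coupling(r, p, q):
    """True if r has first marginal p and second marginal q."""
    rp, rq = marginals(r)
    return (all(rp.get(x, 0) == p.get(x, 0) for x in set(p) | set(rp))
            and all(rq.get(y, 0) == q.get(y, 0) for y in set(q) | set(rq)))

p = {"a": F(1, 2), "b": F(1, 2)}
q = {0: F(1, 4), 1: F(3, 4)}

# Two different transport plans between the same pair of measures.
r1 = {("a", 0): F(1, 4), ("a", 1): F(1, 4), ("b", 1): F(1, 2)}
r2 = {(x, y): p[x] * q[y] for x in p for y in q}

print(is_coupling(r1, p, q), is_coupling(r2, p, q))
```

On a finite space it suffices to check the marginals pointwise, since every measurable set is a finite union of points.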
Given a probability space $(X,\mathcal{A},p)$, the identity coupling or diagonal coupling is given by the following measure on $\mathcal{A}\otimes\mathcal{A}$:
$$ r(A\times A') = p(A\cap A') $$
for all $A,A'\in\mathcal{A}$.
Intuitively, this is a copy of $p$ on $X$ concentrated on the diagonal subset $\{(x,x):x\in X\}\subseteq X\times X$. (Whenever $(X,\mathcal{A})$ is standard Borel, the diagonal subset is measurable, and so this intuition can be made precise.)
This coupling gives the identity in the category of couplings. In terms of transport plans, this corresponds to not moving any mass (almost surely).
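As an illustration on a finite space, here is a minimal sketch (assuming dict-based distributions with exact `Fraction` masses; `diagonal_coupling` and `rect` are hypothetical helper names) that builds the diagonal coupling and compares the mass of a rectangle $A\times A'$ with $p(A\cap A')$.

```python
from fractions import Fraction as F

def diagonal_coupling(p):
    """Put the mass p(x) at the diagonal point (x, x)."""
    return {(x, x): m for x, m in p.items()}

def rect(r, A, B):
    """Mass that the joint distribution r assigns to the rectangle A x B."""
    return sum(m for (x, y), m in r.items() if x in A and y in B)

p = {"a": F(1, 2), "b": F(1, 3), "c": F(1, 6)}
d = diagonal_coupling(p)

A, A2 = {"a", "b"}, {"b", "c"}
# Only the point b lies in both A and A2, so the rectangle has mass p(b).
print(rect(d, A, A2), sum(p[x] for x in A & A2))
```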
Given probability spaces $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$, the independent coupling or product coupling or constant coupling is given by the product measure $p\otimes q$, i.e.
$$ (p\otimes q)(A\times B) = p(A)\,q(B) $$
for all $A\in\mathcal{A}$ and $B\in\mathcal{B}$.
In terms of transport plans, this arranges the mass from almost all points of $X$ to a distribution proportional to $q$, (almost surely) independently of the point of origin.
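A finite sketch of the product coupling, under the same assumptions as before (dict-based distributions, hypothetical helper names `product_coupling` and `rect`). It also computes the conditional distribution on $Y$ given each starting point, which equals $q$ regardless of the point of origin.

```python
from fractions import Fraction as F

def product_coupling(p, q):
    """Joint distribution with mass p(x) * q(y) at the point (x, y)."""
    return {(x, y): p[x] * q[y] for x in p for y in q}

def rect(r, A, B):
    """Mass that the joint distribution r assigns to the rectangle A x B."""
    return sum(m for (x, y), m in r.items() if x in A and y in B)

p = {"a": F(1, 2), "b": F(1, 2)}
q = {0: F(1, 4), 1: F(3, 4)}
r = product_coupling(p, q)

A, B = {"a"}, {1}
print(rect(r, A, B), p["a"] * q[1])

# Where the mass starting at x ends up, normalized by p(x): always q.
cond = {x: {y: r[(x, y)] / p[x] for y in q} for x in p}
print(all(cond[x] == q for x in p))
```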
Let $(X,\mathcal{A},p)$, $(Y,\mathcal{B},q)$, $(Z,\mathcal{C},r)$ be standard Borel probability spaces, and consider transport plans $s$ from $p$ to $q$ and $t$ from $q$ to $r$. The composite transport plan $t\circ s$ from $p$ to $r$ is defined as follows:
$$ (t\circ s)(A\times C) = \int_Y s'(A|y)\,t'(C|y)\,q(dy) $$
for all $A\in\mathcal{A}$ and $C\in\mathcal{C}$, where $s'$ and $t'$ are the regular conditional distributions associated to $s$ and $t$ given $Y$. The interpretation is that the mass is moved according to the plan $s$ and then according to the plan $t$; when the transport is stochastic, the two transitions are taken independently given the intermediate point of $Y$.
This construction gives composition in the category of couplings. When the transport plans are induced by functions or kernels (see below), the composition of transport plans is given by the composition of functions or kernels.
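On finite spaces the integral over $Y$ becomes a sum, and the conditional distributions are obtained by dividing by $q(y)$. The sketch below assumes dict-based distributions with exact `Fraction` masses; `compose` and `diagonal_coupling` are hypothetical helper names. It composes two plans and checks that the diagonal coupling acts as an identity on both sides.

```python
from fractions import Fraction as F

def compose(s, t, q):
    """Finite composite: (t o s)(x, z) = sum over y of s(x, y) * t(y, z) / q(y)."""
    out = {}
    for (x, y), m in s.items():
        for (y2, z), n in t.items():
            if y == y2 and q[y] != 0:  # points of q-measure zero contribute nothing
                out[(x, z)] = out.get((x, z), 0) + m * n / q[y]
    return out

def diagonal_coupling(p):
    """Identity coupling: mass p(x) at the diagonal point (x, x)."""
    return {(x, x): m for x, m in p.items()}

p = {"a": F(1, 2), "b": F(1, 2)}
q = {0: F(1, 4), 1: F(3, 4)}
s = {("a", 0): F(1, 4), ("a", 1): F(1, 4), ("b", 1): F(1, 2)}  # plan from p to q
t = {(0, "u"): F(1, 4), (1, "u"): F(1, 4), (1, "v"): F(1, 2)}  # plan from q to r

ts = compose(s, t, q)
print(ts)

# The diagonal coupling is a unit for this composition.
print(compose(diagonal_coupling(p), s, p) == s)
print(compose(s, diagonal_coupling(q), q) == s)
```

Dividing by $q(y)$ only once reflects the formula above: one factor of $q$ from each conditional distribution, times the integration against $q$.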
In Kozen-Silva-Voogd'23, this construction was extended beyond the standard Borel case. (See there for the details.)
Let $f:(X,\mathcal{A},p)\to(Y,\mathcal{B},q)$ be a measure-preserving function. One can define the “deterministic” transport plan $r_f$ as follows,
$$ r_f(A\times B) = p\big(A\cap f^{-1}(B)\big) $$
for all $A\in\mathcal{A}$ and $B\in\mathcal{B}$. Intuitively, this maps all the mass at $x$ to the point $f(x)$, for every $x\in X$.
Note that in general there may exist no measure-preserving function between two probability spaces, for example, on the real line, if $p$ is a Dirac delta and $q$ is not. A construction that always exists is in terms of Markov kernels, see below.
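A finite sketch of the deterministic plan, assuming dict-based distributions and a hypothetical example map `parity`; the helper names `pushforward`, `deterministic_plan` and `rect` are likewise illustrative. It also shows why a Dirac delta admits no measure-preserving function to a non-Dirac measure: every pushforward of a Dirac delta is again concentrated on a single point.

```python
from fractions import Fraction as F

def pushforward(p, f):
    """Image measure of p under f; f is measure-preserving iff this equals q."""
    q = {}
    for x, m in p.items():
        q[f(x)] = q.get(f(x), 0) + m
    return q

def deterministic_plan(p, f):
    """Send all the mass at x to the single point f(x)."""
    return {(x, f(x)): m for x, m in p.items()}

def rect(r, A, B):
    """Mass that the joint distribution r assigns to the rectangle A x B."""
    return sum(m for (x, y), m in r.items() if x in A and y in B)

def parity(x):
    return x % 2

p = {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}
q = pushforward(p, parity)
r_f = deterministic_plan(p, parity)

A, B = {1, 2}, {1}
preimage = {x for x in p if parity(x) in B}
lhs, rhs = rect(r_f, A, B), sum(p[x] for x in A & preimage)
print(lhs, rhs)

# A Dirac delta can only be pushed forward to another Dirac delta.
dirac = {0: F(1)}
images = [pushforward(dirac, g) for g in (lambda x: 0, lambda x: 1)]
print(images)
```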
Let $k:(X,\mathcal{A},p)\to(Y,\mathcal{B},q)$ be a measure-preserving Markov kernel. One can define a transport plan $r_k$ as follows,
$$ r_k(A\times B) = \int_A k(B|x)\,p(dx) $$
for all $A\in\mathcal{A}$ and $B\in\mathcal{B}$. Intuitively, this maps all the mass at $x$ to a measure on $Y$ proportional to the measure $B\mapsto k(B|x)$.
Note that in the formula above, the measure $B\mapsto k(B|x)$ is invoked only for almost all $x$, and so the plan $r_k$ is insensitive to changes in $k$ on a set of $p$-measure zero. In a certain sense, this is transporting the mass of $p$ rather than the single points $x$.
In many cases, such as if $(X,\mathcal{A})$ and $(Y,\mathcal{B})$ are standard Borel, every transport plan is of the form $r_k$ for some $k$. See also the discussion at "categories of couplings".
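In the finite case both directions are elementary. The following is a minimal sketch, assuming a kernel is stored as a nested dict `{x: {y: k(y|x)}}` and using hypothetical helper names `plan_from_kernel` and `disintegrate`. It shows that two kernels differing only on a point of $p$-measure zero induce the same plan, and that every finite plan is recovered from its conditional distributions.

```python
from fractions import Fraction as F

def plan_from_kernel(p, k):
    """Finite r_k: mass p(x) * k(y|x) at (x, y); zero-mass entries are dropped."""
    return {(x, y): p[x] * w
            for x in p for y, w in k.get(x, {}).items() if p[x] * w != 0}

def disintegrate(r):
    """Recover the first marginal p and a kernel k with r = r_k."""
    p = {}
    for (x, _), m in r.items():
        p[x] = p.get(x, 0) + m
    k = {}
    for (x, y), m in r.items():
        if p[x] != 0:
            k.setdefault(x, {})[y] = m / p[x]
    return p, k

p = {"a": F(1, 2), "b": F(1, 2), "c": F(0)}
k1 = {"a": {0: F(1, 2), 1: F(1, 2)}, "b": {1: F(1)}, "c": {0: F(1)}}
k2 = dict(k1, c={1: F(1)})  # differs from k1 only at c, where p has no mass

print(plan_from_kernel(p, k1) == plan_from_kernel(p, k2))

# Round trip: a plan is the plan of its own conditional distributions.
r = {("a", 0): F(1, 4), ("a", 1): F(1, 4), ("b", 1): F(1, 2)}
p_r, k_r = disintegrate(r)
print(plan_from_kernel(p_r, k_r) == r)
```

The kernel returned by `disintegrate` is only defined at points of positive mass, which matches the almost-sure nature of the correspondence.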
Couplings are in some sense undirected, meaning that every transport plan from $X$ to $Y$ can also be seen as (and canonically induces) a transport plan from $Y$ to $X$.
This makes the category of couplings canonically a dagger category.
For transport plans specified by kernels, this symmetry corresponds exactly to Bayesian inversion of kernels.
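For finite spaces the reversed plan is obtained by swapping coordinates, and the Bayesian inverse is given by Bayes' formula $k^\dagger(x|y) = p(x)\,k(y|x)/q(y)$. The sketch below assumes dict-based distributions and nested-dict kernels; `dagger`, `plan_from_kernel` and `bayesian_inverse` are hypothetical helper names. It checks that the plan of the inverted kernel is the reversed plan.

```python
from fractions import Fraction as F

def dagger(r):
    """Reverse a plan from X to Y into a plan from Y to X."""
    return {(y, x): m for (x, y), m in r.items()}

def plan_from_kernel(p, k):
    """Finite r_k: mass p(x) * k(y|x) at (x, y); zero-mass entries are dropped."""
    return {(x, y): p[x] * w
            for x in p for y, w in k.get(x, {}).items() if p[x] * w != 0}

def bayesian_inverse(p, k):
    """Return q (the image of p under k) and the kernel k_inv(x|y) = p(x) k(y|x) / q(y)."""
    q = {}
    for x in p:
        for y, w in k.get(x, {}).items():
            q[y] = q.get(y, 0) + p[x] * w
    inv = {}
    for x in p:
        for y, w in k.get(x, {}).items():
            if q[y] != 0 and p[x] * w != 0:
                inv.setdefault(y, {})[x] = p[x] * w / q[y]
    return q, inv

p = {"a": F(1, 2), "b": F(1, 2)}
k = {"a": {0: F(1, 2), 1: F(1, 2)}, "b": {1: F(1)}}

r = plan_from_kernel(p, k)
q, k_inv = bayesian_inverse(p, k)

print(k_inv)
print(plan_from_kernel(q, k_inv) == dagger(r))
print(dagger(dagger(r)) == r)
```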
optimal transport
Cedric Villani, Optimal transport: old and new, Springer, 2008.
Fredrik Dahlqvist, Vincent Danos, Ilias Garnier, and Alexandra Silva, Borel kernels and their approximation, categorically, MFPS 2018. arXiv.
Dexter Kozen, Alexandra Silva, Erik Voogd, Joint Distributions in Probabilistic Semantics, MFPS 2023. (arXiv)
Paolo Perrone, Lifting couplings in Wasserstein spaces, 2021. (arXiv:2110.06591)
Last revised on February 7, 2024 at 17:02:42.