In elementary probability theory, Bayes' formula refers to a version of the formula
$$P(A|B)\,P(B) = P(B|A)\,P(A),$$
and relates the conditional probability of $B$ given $A$ to that of $A$ given $B$.
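For events $A$ and $B$ of nonzero probability, the formula follows in one step from the definition of conditional probability, $P(A|B) = P(A\cap B)/P(B)$:

$$P(A|B)\,P(B) = P(A\cap B) = P(B|A)\,P(A), \qquad\text{so}\qquad P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}.$$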
From the category-theoretic point of view, this formula expresses a duality, sometimes called Bayesian inversion, which often gives rise to a dagger structure.
In logical reasoning, implications in general cannot be reversed: $A \to B$, alone, does not imply $B\to A$.
In probability theory, instead, conditional statements exhibit a duality which is absent in pure logical reasoning.
Consider for example a city in which all taxis are yellow. (The implication is $taxi\to yellow$.) If we see a yellow car, of course we cannot be sure that it is a taxi. However, a yellow car is more likely to be a taxi than a car whose color we have not observed. This increase in likelihood is larger the smaller the fraction of yellow cars is. Indeed, since $P(yellow|taxi)=1$, Bayes' rule says that
$$P(taxi|yellow) = \frac{P(yellow|taxi)\,P(taxi)}{P(yellow)} = \frac{P(taxi)}{P(yellow)} \ge P(taxi).$$
More generally, in a city where most taxis are yellow, there is a high conditional probability that a given taxi is yellow,
$$P(yellow|taxi) \approx 1.$$
The higher this probability, the higher the probability that a given yellow car is a taxi, again according to Bayes' rule:
$$P(taxi|yellow) = P(yellow|taxi)\,\frac{P(taxi)}{P(yellow)}.$$
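The arithmetic can be checked with a short Python sketch. The numbers below (10% of cars are taxis, 5% of the other cars are yellow) are made up for illustration, and $P(yellow)$ is obtained from the law of total probability; the function name is hypothetical.

```python
from fractions import Fraction


def posterior_taxi_given_yellow(p_taxi, p_yellow_given_taxi, p_yellow_given_other):
    """Bayes' rule: P(taxi | yellow) = P(yellow | taxi) P(taxi) / P(yellow)."""
    # Law of total probability for the marginal P(yellow).
    p_yellow = p_yellow_given_taxi * p_taxi + p_yellow_given_other * (1 - p_taxi)
    return p_yellow_given_taxi * p_taxi / p_yellow


# Hypothetical numbers: 10% of cars are taxis, 5% of non-taxi cars are yellow.
prior = Fraction(1, 10)
other = Fraction(1, 20)

# All taxis yellow: the posterior equals P(taxi) / P(yellow).
print(posterior_taxi_given_yellow(prior, Fraction(1), other))       # 20/29
# Most (95%) taxis yellow: the posterior is still far above the prior 1/10.
print(posterior_taxi_given_yellow(prior, Fraction(19, 20), other))  # 19/28
```

Exact fractions are used so that the results can be compared with hand computations without rounding.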
In categorical probability, this phenomenon can be modeled by saying that to each “conditional” morphism in the form $\{taxi, not\,taxi\}\to\{colors\,of\,cars\}$ there corresponds a canonical morphism $\{colors\,of\,cars\}\to\{taxi, not\,taxi\}$, called the Bayesian inverse. The name, which is kept for historical reasons, is somewhat improper, since we don’t actually have an inverse morphism in the sense of category theory, but simply a reversal of the arrow.
In some cases, this symmetry gives rise to a dagger category.
In traditional probability theory, Bayesian inversions are a special case of conditional probability. Some care must be taken to avoid dividing by zero or running into paradoxes via limiting procedures.
In the discrete case, a probability distribution on a set $X$ is simply a function $p:X\to[0,1]$ such that
$$\sum_{x\in X} p(x) = 1.$$
A stochastic map $k:X\to Y$ is a function $X\times Y\to[0,1]$, written $(x,y)\mapsto k(y|x)$, such that for all $x\in X$,
$$\sum_{y\in Y} k(y|x) = 1.$$
If we now equip $X$ and $Y$ with discrete probability distributions $p$ and $q$, obtaining discrete probability spaces, we say that $k$ is a measure-preserving stochastic map $(X,p)\to(Y,q)$ if for every $y\in Y$,
$$q(y) = \sum_{x\in X} k(y|x)\,p(x).$$
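A minimal Python sketch of these definitions, assuming a hypothetical encoding of distributions as dictionaries `{point: probability}` and of a stochastic map as nested dictionaries `{x: {y: k(y|x)}}`. Exact `Fraction` arithmetic is used so that sums can be compared with `==`; the function names are illustrative only.

```python
from fractions import Fraction


def is_distribution(p):
    """Nonnegative values summing to 1 (this forces values in [0, 1])."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1


def is_stochastic(k):
    """Each row x -> k(.|x) is a probability distribution on Y."""
    return all(is_distribution(row) for row in k.values())


def pushforward(k, p):
    """The distribution q(y) = sum_x k(y|x) p(x)."""
    q = {}
    for x, px in p.items():
        for y, kyx in k[x].items():
            q[y] = q.get(y, 0) + kyx * px
    return q


def is_measure_preserving(k, p, q):
    """Check q(y) = sum_x k(y|x) p(x) for every y (missing keys count as 0)."""
    image = pushforward(k, p)
    return all(image.get(y, 0) == q.get(y, 0) for y in set(image) | set(q))


# Example: X = {a, b} with the uniform distribution, Y = {0, 1}.
p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
k = {"a": {0: Fraction(1)}, "b": {0: Fraction(1, 2), 1: Fraction(1, 2)}}
q = pushforward(k, p)  # q(0) = 3/4, q(1) = 1/4
```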
A Bayesian inverse of $k:(X,p)\to(Y,q)$ is then defined to be a measure-preserving stochastic map $k^\dagger:(Y,q)\to(X,p)$ such that for every $x\in X$ and $y\in Y$, the following Bayes formula holds:
$$k^\dagger(x|y)\,q(y) = k(y|x)\,p(x).$$
In the discrete case, Bayesian inverses always exist, and can be obtained by taking
$$k^\dagger(x|y) = \frac{k(y|x)\,p(x)}{q(y)}$$
for those $y\in Y$ with $q(y)\ne 0$, and an arbitrary probability distribution on $X$ for those $y\in Y$ with $q(y)=0$. (To ensure the normalization condition on such $y$, one can for example take $k^\dagger(x|y)=\delta_{x_0,x}$, where $x_0$ is a fixed element of $X$. Note that $X$ is nonempty since it admits a probability measure.)
Bayesian inverses are in general not unique, but they are uniquely determined on the support of $q$. That is, they are unique up to almost sure equality.
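The construction, and the fact that it is only determined on the support of $q$, can be sketched in Python as follows. The dictionary encoding and the function name `bayesian_inverse` are hypothetical; off the support we use the point mass at `x0`, which is the choice suggested in the text.

```python
from fractions import Fraction


def bayesian_inverse(k, p, x0):
    """Return (k_dagger, q) for a discrete stochastic map k: (X, p) -> (Y, q).

    Hypothetical encoding: p is {x: p(x)} and k is {x: {y: k(y|x)}}, with Y
    taken to be the set of outcomes appearing in k. Where q(y) > 0 we use
    k_dagger(x|y) = k(y|x) p(x) / q(y); where q(y) = 0 we use the point mass
    at the (arbitrary) element x0 of X.
    """
    ys = {y for row in k.values() for y in row}
    # Pushforward q(y) = sum_x k(y|x) p(x), so that k is measure-preserving.
    q = {y: sum(k[x].get(y, 0) * p[x] for x in p) for y in ys}
    k_dagger = {}
    for y in ys:
        if q[y] != 0:
            # Bayes' formula on the support of q.
            k_dagger[y] = {x: k[x].get(y, 0) * p[x] / q[y] for x in p}
        else:
            # Off the support any distribution on X works; take delta_{x0}.
            k_dagger[y] = {x: Fraction(int(x == x0)) for x in p}
    return k_dagger, q


# Example: the point c carries no mass, so the outcome y = 2 has q(2) = 0.
p = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
k = {"a": {0: Fraction(1)},
     "b": {0: Fraction(1, 2), 1: Fraction(1, 2)},
     "c": {2: Fraction(1)}}
d1, q = bayesian_inverse(k, p, x0="a")
d2, _ = bayesian_inverse(k, p, x0="b")
```

Here `d1` and `d2` are both Bayesian inverses of `k`: they satisfy the Bayes formula for every $x$ and $y$, agree at the outcomes $0$ and $1$ where $q$ is positive, and differ only at $y=2$, which has $q$-probability zero.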
The situation is more delicate outside the discrete case. Given probability spaces $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$, and a measure-preserving Markov kernel $k:(X,\mathcal{A},p)\to(Y,\mathcal{B},q)$, a Bayesian inverse of $k$ is a Markov kernel $k^\dagger:(Y,\mathcal{B},q)\to(X,\mathcal{A},p)$ such that for all $A\in\mathcal{A}$ and $B\in\mathcal{B}$, the following Bayes-type formula holds:
$$\int_A k(B|x)\,p(dx) = \int_B k^\dagger(A|y)\,q(dy).$$
As one can see from Markov kernel - Almost sure equality, this formula specifies a kernel only up to almost sure equality, just as in the discrete case.
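As a consistency check, when $X$ and $Y$ are countable and carry their power-set $\sigma$-algebras, write $k(B|x)=\sum_{y\in B}k(y|x)$ and $k^\dagger(A|y)=\sum_{x\in A}k^\dagger(x|y)$. Then both sides of the Bayes-type formula $\int_A k(B|x)\,p(dx) = \int_B k^\dagger(A|y)\,q(dy)$ become sums:

$$\int_A k(B|x)\,p(dx) = \sum_{x\in A}\sum_{y\in B} k(y|x)\,p(x), \qquad \int_B k^\dagger(A|y)\,q(dy) = \sum_{y\in B}\sum_{x\in A} k^\dagger(x|y)\,q(y).$$

Taking $A=\{x\}$ and $B=\{y\}$ gives the discrete formula $k^\dagger(x|y)\,q(y) = k(y|x)\,p(x)$; conversely, summing the discrete formula over $x\in A$ and $y\in B$ gives back the Bayes-type formula, so the two definitions agree.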
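Existence, on the other hand, is trickier. In general, a kernel $k^\dagger$ as above may fail to exist. The problem is that in order for $k^\dagger$ to be a well-defined Markov kernel $(Y,\mathcal{B})\to(X,\mathcal{A})$, we need the following two conditions:
- for each $A\in\mathcal{A}$, the function $y\mapsto k^\dagger(A|y)$ is measurable;
- for each $y\in Y$, the assignment $A\mapsto k^\dagger(A|y)$ is a probability measure on $(X,\mathcal{A})$.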
The first condition can be taken care of using conditional expectation, which however does not ensure the second condition. It can be shown, though, that if $(X,\mathcal{A})$ is standard Borel and either
or
then a Bayesian inverse always exists.
(See also Markov kernel - Bayesian inversion and Markov kernel - Conditionals.)
In categorical probability, Bayesian inverses are axiomatized in a way which reflects the measure-theoretic version of the concepts. One can then choose to work in categories where these axioms are satisfied.
In Markov categories, Bayesian inverses are defined in a way that parallels the construction for Markov kernels.
The abstraction of a probability space is given by an object $X$ in a Markov category, together with a state $p:I\to X$. As usual, the abstraction of a kernel is a morphism $f:X\to Y$.
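A Bayesian inverse of $f$ with respect to $p$ is a morphism $f^\dagger:Y\to X$ such that the following equation holds, where $q=f\circ p$ and $\mathrm{copy}_X:X\to X\otimes X$ denotes the copy morphism of $X$:
$$(\mathrm{id}_X\otimes f)\circ\mathrm{copy}_X\circ p = (f^\dagger\otimes\mathrm{id}_Y)\circ\mathrm{copy}_Y\circ q.$$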
This recovers the classical probability definitions when instantiated in Stoch and its subcategories.
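In finite Stoch the defining equation $(\mathrm{id}_X\otimes f)\circ\mathrm{copy}_X\circ p = (f^\dagger\otimes\mathrm{id}_Y)\circ\mathrm{copy}_Y\circ q$, with $q=f\circ p$, can be checked directly: both sides are joint distributions on $X\otimes Y$, namely $(x,y)\mapsto f(y|x)\,p(x)$ and $(x,y)\mapsto f^\dagger(x|y)\,q(y)$. The following Python sketch builds each side from copy maps, identities and tensor products; the dictionary encoding and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical encoding of finite Stoch: a state on X is a dict {x: prob},
# a morphism f: X -> Y is a dict {x: {y: f(y|x)}}, and a state on a product
# of two objects is a dict {(x, y): prob}.


def identity(points):
    """Identity morphism on a finite set."""
    return {x: {x: Fraction(1)} for x in points}


def push(f, p):
    """Composite f o p of a state p with a morphism f."""
    out = {}
    for x, px in p.items():
        for y, w in f[x].items():
            out[y] = out.get(y, 0) + w * px
    return out


def copy_state(p):
    """copy_X o p: the state on X (x) X concentrated on the diagonal."""
    return {(x, x): px for x, px in p.items()}


def tensor_push(f, g, joint):
    """(f (x) g) o joint, for a state `joint` on a product of two objects."""
    out = {}
    for (x1, x2), w in joint.items():
        for (y1, a), (y2, b) in product(f[x1].items(), g[x2].items()):
            out[(y1, y2)] = out.get((y1, y2), 0) + a * b * w
    return out


def is_bayesian_inverse(f, f_dag, p):
    """Check (id_X (x) f) o copy_X o p == (f_dag (x) id_Y) o copy_Y o q, q = f o p."""
    q = push(f, p)
    lhs = tensor_push(identity(p), f, copy_state(p))
    rhs = tensor_push(f_dag, identity(q), copy_state(q))
    return all(lhs.get(key, 0) == rhs.get(key, 0) for key in set(lhs) | set(rhs))
```

For instance, with $p$ uniform on $\{a,b\}$, $f(0|a)=1$ and $f(0|b)=f(1|b)=1/2$, the morphism with $f^\dagger(a|0)=2/3$, $f^\dagger(b|0)=1/3$, $f^\dagger(b|1)=1$ passes the check, while replacing the first two values by $1/2$ does not.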
Just as in traditional probability, Bayesian inverses are unique only up to almost sure equality, and they may fail to exist. Being an instance of conditionals, however, they always exist when conditionals exist, such as in the category BorelStoch.
(See also Markov category - conditionals.)
In the category of couplings, the idea of Bayesian inversion is made explicit from the start by means of the dagger structure. Given probability spaces $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$, a coupling between them can be seen equivalently as going from $X$ to $Y$ or from $Y$ to $X$. This duality, when the couplings are expressed via Markov kernels, reflects exactly Bayesian inversion. Therefore, at the level of joint distributions, one can consider the duality given by Bayesian inversions to be already part of the symmetry of the category.
Categorical abstractions of the category of couplings via dagger categories therefore have the concept of Bayesian inversion already built in.
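In the finite case, a coupling of $(X,p)$ and $(Y,q)$ is a joint distribution $r$ on $X\times Y$ with marginals $p$ and $q$. Conditioning on the first factor gives a kernel $k(y|x)=r(x,y)/p(x)$, conditioning on the second gives $k^\dagger(x|y)=r(x,y)/q(y)$, and the dagger just swaps the two factors. The following Python sketch (hypothetical encoding and function names) shows that reading the swapped coupling as a kernel produces a Bayesian inverse of the original kernel.

```python
from fractions import Fraction

# Hypothetical encoding: a coupling of (X, p) and (Y, q) is a joint
# distribution {(x, y): r(x, y)} whose marginals are p and q.


def marginals(r):
    """Return the two marginals (p, q) of a joint distribution r."""
    p, q = {}, {}
    for (x, y), w in r.items():
        p[x] = p.get(x, 0) + w
        q[y] = q.get(y, 0) + w
    return p, q


def dagger(r):
    """Read the same coupling in the opposite direction, from Y to X."""
    return {(y, x): w for (x, y), w in r.items()}


def to_kernel(r):
    """Markov kernel k(y|x) = r(x, y) / p(x), on the support of the first marginal."""
    p, _ = marginals(r)
    kernel = {x: {} for x in p if p[x] != 0}
    for (x, y), w in r.items():
        if p[x] != 0:
            kernel[x][y] = w / p[x]
    return kernel


# Example coupling of the uniform distribution on {a, b} with q(0) = 3/4, q(1) = 1/4.
r = {("a", 0): Fraction(1, 2), ("b", 0): Fraction(1, 4), ("b", 1): Fraction(1, 4)}
p, q = marginals(r)
k = to_kernel(r)              # the coupling read from X to Y
k_dag = to_kernel(dagger(r))  # the same coupling read from Y to X
```

Here $k^\dagger(x|y)\,q(y) = r(x,y) = k(y|x)\,p(x)$ by construction, which is exactly the discrete Bayes formula, and applying `dagger` twice returns the original coupling.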
(For now, see the references.)
For categorical probability:
Tobias Fritz, A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics, Advances in Mathematics 370, 2020. (arXiv:1908.07021)
Kenta Cho and Bart Jacobs, Disintegration and Bayesian Inversion via String Diagrams, Mathematical Structures in Computer Science 29, 2019. (arXiv:1709.00322)
Dario Stein and Sam Staton, Probabilistic Programming with Exact Conditions, Journal of the ACM, 2023. (arXiv)
Noé Ensarguet and Paolo Perrone, Categorical probability spaces, ergodic decompositions, and transitions to equilibrium, 2023. (arXiv:2310.04267)
For the quantum case:
Bob Coecke and Robert W. Spekkens, Picturing classical and quantum Bayesian inference, Synthese, 186(3), 2012. (arXiv)
Arthur J. Parzygnat, Inverses, disintegrations, and Bayesian inversion in quantum Markov categories, 2020. (arXiv)
Arthur J. Parzygnat and Benjamin P. Russo, A noncommutative Bayes theorem, Linear Algebra and its Applications 644, 2022. (arXiv)
Arthur J. Parzygnat, Conditional distributions for quantum systems, EPTCS 343, 2021. (arXiv)
Arthur J. Parzygnat and Francesco Buscemi, Axioms for retrodiction: achieving time-reversal symmetry with a prior, Quantum 7, 1013, 2023. (arXiv)
James Fullwood and Arthur J. Parzygnat, From time-reversal symmetry to quantum Bayes rules, PRX Quantum 4, 2023. (arXiv)
Arthur J. Parzygnat and Benjamin P. Russo, Non-commutative disintegrations: existence and uniqueness in finite dimensions, Journal of Noncommutative Geometry 17(3), 2023. (arXiv)
Last revised on May 8, 2024 at 12:38:04.