nLab Bayesian inversion

Idea

In elementary probability theory, Bayes' formula refers to a version of the formula

$$P(A)\,P(B|A) \,=\, P(B)\,P(A|B) \,,$$

and relates the conditional probability of $B$ given $A$ to the one of $A$ given $B$.

From the category-theoretic point of view, this formula expresses a duality, sometimes called Bayesian inversion, which often gives rise to a dagger structure.

Intuition

In logical reasoning, implications in general cannot be reversed: $A \to B$, alone, does not imply $B\to A$.

In probability theory, by contrast, conditional statements exhibit a duality which is absent in pure logical reasoning.

Consider for example a city in which all taxis are yellow. (The implication is $taxi\to yellow$.) If we see a yellow car, of course we can’t be sure it’s a taxi. However, a yellow car is more likely to be a taxi than a car of arbitrary color is. This increase in likelihood is larger if the fraction of yellow cars is small. Indeed, Bayes' rule says that

$$P(taxi|yellow) = \frac{P(yellow\,taxi)}{P(yellow)} = \frac{P(taxi)}{P(yellow)} .$$

More generally, in a city where most taxis are yellow, there is a high conditional probability that a given taxi is yellow,

$$P(yellow|taxi) .$$

The higher this probability is, the higher is the probability that a given yellow car is a taxi, again according to Bayes' rule:

$$P(taxi|yellow) = \frac{P(yellow\,taxi)}{P(yellow)} = \frac{P(taxi)\,P(yellow|taxi)}{P(yellow)}$$
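As a small numerical illustration of this formula, the following sketch evaluates Bayes' rule with exact fractions. The three input figures are invented for the example and are not taken from any data; $P(yellow)$ is obtained from them by the law of total probability.

```python
from fractions import Fraction

# Hypothetical figures, chosen only for illustration.
p_taxi = Fraction(1, 20)                # P(taxi): 5% of cars are taxis
p_yellow_given_taxi = Fraction(9, 10)   # P(yellow|taxi): most taxis are yellow
p_yellow_given_other = Fraction(1, 50)  # P(yellow|not taxi)

# Law of total probability:
# P(yellow) = P(taxi) P(yellow|taxi) + P(not taxi) P(yellow|not taxi)
p_yellow = (p_taxi * p_yellow_given_taxi
            + (1 - p_taxi) * p_yellow_given_other)

# Bayes' rule: P(taxi|yellow) = P(taxi) P(yellow|taxi) / P(yellow)
p_taxi_given_yellow = p_taxi * p_yellow_given_taxi / p_yellow

print(p_yellow)             # 8/125
print(p_taxi_given_yellow)  # 45/64
```

With these numbers, seeing a yellow car raises the probability of "taxi" from 1/20 to 45/64, because yellow cars are rare overall.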

In categorical probability, this phenomenon can be modeled by saying that to each “conditional” morphism of the form $\{taxi, not\,taxi\}\to\{colors\,of\,cars\}$ there corresponds a canonical morphism $\{colors\,of\,cars\}\to\{taxi, not\,taxi\}$, called the Bayesian inverse.

In some cases, this symmetry gives rise to a dagger category.

In traditional probability theory

In traditional probability theory, Bayesian inversions are a special case of conditional probability. Some care must be taken to avoid dividing by zero or running into paradoxes via limiting procedures.

Discrete case

In the discrete case, a probability distribution on a set $X$ is simply a function $p:X\to[0,1]$ such that

$$\sum_{x\in X} p(x) = 1 .$$

A stochastic map $k:X\to Y$ is a function $X\times Y\to[0,1]$, written $(x,y)\mapsto k(y|x)$, such that for all $x\in X$,

$$\sum_{y\in Y} k(y|x) = 1 .$$

If we now equip $X$ and $Y$ with discrete probability distributions $p$ and $q$, obtaining discrete probability spaces, we say that $k$ is a measure-preserving stochastic map $(X,p)\to(Y,q)$ if for every $y\in Y$,

$$\sum_{x} k(y|x) \, p(x) = q(y) .$$

A Bayesian inverse of $k:(X,p)\to(Y,q)$ is then defined to be a measure-preserving stochastic map $k^\dagger:(Y,q)\to(X,p)$ such that for every $x\in X$ and $y\in Y$, the following Bayes formula holds.

$$p(x) \,k(y|x) = q(y)\,k^\dagger(x|y)$$

In the discrete case, Bayesian inverses always exist, and can be obtained by taking

$$k^\dagger(x|y) \coloneqq \frac{p(x)\,k(y|x)}{q(y)}$$

for those $y\in Y$ with $q(y)\ne 0$, and by choosing $k^\dagger(\cdot|y)$ to be an arbitrary probability distribution on $X$ for those $y\in Y$ with $q(y)=0$. (For example, on such $y$ one can take $k^\dagger(x|y)=\delta_{x_0,x}$, where $x_0$ is a fixed element of $X$. Note that $X$ is nonempty since it admits a probability measure.)

Bayesian inverses are not unique, but they are uniquely defined on the support of $q$. That is, they are unique up to almost sure equality.
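The discrete construction can be sketched in a few lines of code. The following is only an illustration under stated assumptions: finite sets are represented as Python dictionaries, the function name `bayesian_inverse` and the example data are hypothetical, and off the support of $q$ the point mass at a fixed element is used as the arbitrary choice.

```python
from fractions import Fraction as F

def bayesian_inverse(p, k, ys):
    """Discrete Bayesian inverse (illustrative sketch).

    p  : dict x -> p(x)
    k  : dict x -> dict y -> k(y|x)   (missing entries mean 0)
    ys : list of all elements of Y
    Returns (q, k_dagger) with k_dagger[y][x] = k^dagger(x|y).
    """
    # Pushforward distribution q(y) = sum_x k(y|x) p(x).
    q = {y: sum(k[x].get(y, F(0)) * p[x] for x in p) for y in ys}
    x0 = next(iter(p))  # fixed element of X, used only where q(y) = 0
    k_dagger = {}
    for y in ys:
        if q[y] != 0:
            k_dagger[y] = {x: p[x] * k[x].get(y, F(0)) / q[y] for x in p}
        else:
            # Arbitrary choice off the support of q: the point mass at x0.
            k_dagger[y] = {x: F(int(x == x0)) for x in p}
    return q, k_dagger

# Hypothetical example data.
p = {"a": F(1, 3), "b": F(2, 3)}
k = {"a": {"u": F(1, 2), "v": F(1, 2)},
     "b": {"u": F(1, 4), "v": F(3, 4)}}
ys = ["u", "v", "w"]  # "w" is never reached, so q("w") = 0

q, k_dagger = bayesian_inverse(p, k, ys)
for y in ys:
    print(y, q[y], k_dagger[y])
```

Here the row `k_dagger["w"]` could be replaced by any other probability distribution on $X$ without violating the Bayes formula, which illustrates the non-uniqueness off the support of $q$.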

Measure-theoretic case

The situation is more delicate outside the discrete case. Given probability spaces $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$, and a measure-preserving Markov kernel $k:(X,\mathcal{A},p)\to(Y,\mathcal{B},q)$, a Bayesian inverse of $k$ is a Markov kernel $k^\dagger:(Y,\mathcal{B},q)\to(X,\mathcal{A},p)$ such that for all $A\in\mathcal{A}$ and $B\in\mathcal{B}$, the following Bayes-type formula holds.

$$\int_A k(B|x)\,p(dx) = \int_B k^\dagger(A|y)\,q(dy)$$

As one can see from Markov kernel - Almost sure equality, this formula specifies a kernel only up to almost sure equality, just as in the discrete case.
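As a consistency check, the discrete Bayes formula can be recovered from the integral one. The sketch below assumes, for illustration only, that $X$ and $Y$ are countable and carry their power-set $\sigma$-algebras, so that integrals against $p$ and $q$ are sums; the primed variables $x'$, $y'$ are just integration variables. Taking $A=\{x\}$ and $B=\{y\}$:

```latex
\int_{\{x\}} k(\{y\}|x')\,p(dx') = p(x)\,k(y|x),
\qquad
\int_{\{y\}} k^\dagger(\{x\}|y')\,q(dy') = q(y)\,k^\dagger(x|y),
\qquad\text{hence}\qquad
p(x)\,k(y|x) = q(y)\,k^\dagger(x|y).
```

Conversely, summing the pointwise identity over $x\in A$ and $y\in B$ gives back the integral formula for arbitrary subsets $A$ and $B$.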

Existence, on the other hand, is more delicate. In general, a kernel $k^\dagger$ as above may fail to exist. The problem is that in order for $k^\dagger$ to be a well-defined Markov kernel $(Y,\mathcal{B})\to(X,\mathcal{A})$, we need the following two conditions:

  • the map $y\mapsto k^\dagger(A|y)$ is measurable in $y$ for all $A\in\mathcal{A}$;
  • the assignment $A\mapsto k^\dagger(A|y)$ is a probability measure in $A$ for all $y\in Y$.

The first condition can be taken care of using conditional expectation. This, however, does not guarantee the second condition. Nevertheless, it can be shown that if $(X,\mathcal{A})$ is standard Borel and either

  • $(Y,\mathcal{B})$ is standard Borel too

or

  • the kernel $k:X\to Y$ is of the form $\delta_f$ for a measurable $f:X\to Y$,

then a Bayesian inverse always exists.
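To get a feeling for the second case, it may help to look at what the Bayesian inverse of a deterministic kernel $\delta_f$ looks like in the finite setting, where existence is automatic: it conditions $p$ on the fibres of $f$. The sketch below is only an illustration and says nothing about the measure-theoretic existence proof; the function name, the example distribution and the map `parity` are hypothetical, and $p(x)>0$ is assumed for every listed $x$ so that no division by zero occurs.

```python
from fractions import Fraction as F

def inverse_of_deterministic(p, f):
    """Bayesian inverse of delta_f with respect to p (finite case).

    Assumes p(x) > 0 for every listed x, so that q(f(x)) > 0.
    Returns (q, k_dagger) with k_dagger[y][x] = k^dagger(x|y).
    """
    q = {}
    for x, px in p.items():
        q[f(x)] = q.get(f(x), F(0)) + px  # q = pushforward of p along f
    # k^dagger(x|y) = p(x) / q(y) on the fibre f^{-1}(y), and 0 elsewhere.
    k_dagger = {
        y: {x: (px / qy if f(x) == y else F(0)) for x, px in p.items()}
        for y, qy in q.items()
    }
    return q, k_dagger

# Hypothetical example: a distribution on {1, 2, 3, 4} and the parity map.
p = {1: F(1, 10), 2: F(2, 10), 3: F(3, 10), 4: F(4, 10)}
parity = lambda x: x % 2

q, k_dagger = inverse_of_deterministic(p, parity)
print(q)
print(k_dagger)
```

Each row `k_dagger[y]` is the distribution $p$ restricted to the fibre $f^{-1}(y)$ and renormalized.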

(See also Markov kernel - Bayesian inversion and Markov kernel - Conditionals.)

In categorical probability

In categorical probability, Bayesian inverses are axiomatized in a way which reflects the measure-theoretic version of the concept. One can then choose to work in categories where such axioms are satisfied.

In Markov categories

In Markov categories, Bayesian inverses are defined in a way that parallels the construction for Markov kernels.

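The abstraction of a probability space is given by an object $X$ in a Markov category, together with a state $p:I\to X$. As usual, the abstraction of a kernel is a morphism $f:X\to Y$.
A Bayesian inverse of $f$ with respect to $p$ is a morphism $f^\dagger:Y\to X$ such that the following equation between morphisms $I\to X\otimes Y$ holds, where $q=f\circ p$ and $\mathrm{copy}_X$, $\mathrm{copy}_Y$ denote the copy maps of the Markov category.

$$(\mathrm{id}_X\otimes f)\circ\mathrm{copy}_X\circ p = (f^\dagger\otimes\mathrm{id}_Y)\circ\mathrm{copy}_Y\circ q$$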

This recovers the classical probability definitions when instantiated in Stoch and its subcategories.

Just as in traditional probability, Bayesian inverses are unique only up to almost sure equality. Also, just as in traditional probability, they may fail to exist. Being an instance of conditionals, however, they always exist when conditionals exist, such as in the category BorelStoch.

(See also Markov category - conditionals.)

In dagger categories

In the category of couplings, the idea of Bayesian inversion is made explicit from the start by means of the dagger structure. Given probability spaces $(X,\mathcal{A},p)$ and $(Y,\mathcal{B},q)$, a coupling between them can be seen equivalently as going from $X$ to $Y$ or from $Y$ to $X$. This duality, when the couplings are expressed via Markov kernels, reflects exactly Bayesian inversion. Therefore, at the level of joint distributions, one can consider the duality given by Bayesian inversions to be already part of the symmetry of the category.

Categorical abstractions of the category of couplings via dagger categories therefore have the concept of Bayesian inversion already built in.
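In the finite case, this can be made concrete with a short sketch: a coupling is stored as a joint distribution, the dagger just swaps the two coordinates, and disintegrating the coupling in either direction yields a kernel and its Bayesian inverse. The representation by dictionaries, the helper names `dagger`, `first_marginal` and `kernel_from`, and the example numbers are assumptions made for this illustration only.

```python
from fractions import Fraction as F

def dagger(joint):
    """Swap the two coordinates of a coupling {(x, y): weight}."""
    return {(y, x): w for (x, y), w in joint.items()}

def first_marginal(joint):
    """Marginal distribution on the first coordinate."""
    m = {}
    for (a, _), w in joint.items():
        m[a] = m.get(a, F(0)) + w
    return m

def kernel_from(joint):
    """Disintegrate along the first coordinate: k(b|a) = joint(a, b) / marginal(a).

    Only defined on the support of the marginal; result is keyed by (a, b).
    """
    m = first_marginal(joint)
    return {(a, b): w / m[a] for (a, b), w in joint.items() if m[a] != 0}

# Hypothetical coupling between X = {a, b} and Y = {u, v}.
joint = {("a", "u"): F(1, 6), ("a", "v"): F(1, 6),
         ("b", "u"): F(1, 6), ("b", "v"): F(1, 2)}

p = first_marginal(joint)          # marginal on X
q = first_marginal(dagger(joint))  # marginal on Y
k = kernel_from(joint)             # kernel X -> Y, keyed by (x, y)
k_dag = kernel_from(dagger(joint)) # kernel Y -> X, keyed by (y, x)

print(p, q)
print(k)
print(k_dag)
```

Both kernels come from the same joint distribution, so the Bayes formula $p(x)\,k(y|x) = q(y)\,k^\dagger(x|y)$ holds by construction, and applying `dagger` twice returns the original coupling.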

Quantum versions

(For now, see the references.)

References

For categorical probability:

  • Tobias Fritz, A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics, Advances in Mathematics 370, 2020. (arXiv:1908.07021)

  • Kenta Cho and Bart Jacobs, Disintegration and Bayesian Inversion via String Diagrams, Mathematical Structures in Computer Science 29, 2019. (arXiv:1709.00322)

  • Dario Stein and Sam Staton, Probabilistic Programming with Exact Conditions, JACM, 2023. (arXiv)

  • Noé Ensarguet and Paolo Perrone, Categorical probability spaces, ergodic decompositions, and transitions to equilibrium, 2023. (arXiv:2310.04267)

For the quantum case:

  • Bob Coecke and Robert W. Spekkens, Picturing classical and quantum Bayesian inference, Synthese, 186(3), 2012. (arXiv)

  • Arthur J. Parzygnat, Inverses, disintegrations, and Bayesian inversion in quantum Markov categories, 2020. (arXiv)

  • Arthur J. Parzygnat and Benjamin P. Russo, A noncommutative Bayes theorem, Linear Algebra and its Applications 644, 2022. (arXiv)

  • Arthur J. Parzygnat, Conditional distributions for quantum systems, EPTCS 343, 2021. (arXiv)

  • Arthur J. Parzygnat and Francesco Buscemi, Axioms for retrodiction: achieving time-reversal symmetry with a prior, Quantum 7(1013), 2023. (arXiv)

  • James Fullwood and Arthur J. Parzygnat, From time-reversal symmetry to quantum Bayes rules, PRX Quantum 4, 2023. (arXiv)

  • Arthur J. Parzygnat and Benjamin P. Russo, Non-commutative disintegrations: existence and uniqueness in finite dimensions, Journal of Noncommutative Geometry 17(3), 2023. (arXiv)

category: probability

Last revised on March 22, 2024 at 14:06:35. See the history of this page for a list of all contributions to it.