nLab stochastic dependence and independence


Idea

One of the main purposes of probability theory, and of related fields such as statistics and information theory, is to make predictions in situations of uncertainty.

Suppose that we are interested in a quantity X whose value we don’t know exactly (for example, a random variable), and which we cannot observe directly. Suppose that we have another quantity Y, which we also don’t know exactly, but which we can observe (for example, through an experiment). We might now wonder: can observing Y give us information about X and reduce its uncertainty? Viewing the unknown quantities X and Y as carrying hidden information, one might ask how much of this hidden information is shared between X and Y, so that observing Y uncovers information about X as well.

This form of dependence between X and Y is called stochastic dependence, and it is one of the most important concepts both in probability theory and, due to its conceptual nature, in most categorical approaches to probability theory.

Intuition

Dependence and independence

Very roughly, two unknown quantities are stochastically independent when learning the value of one of them does not change what one should expect about the other.

For instance, suppose that X and Y are random variables. If observing Y changes the distribution one assigns to X, then X and Y are stochastically dependent. If observing Y does not change the distribution of X, then X and Y are stochastically independent.

For a concrete example, let X be the result of throwing one die, and let Y be the result of throwing another die. If the dice are thrown separately and do not influence each other, then learning the value of Y does not change the probabilities assigned to X. If we learn that Y = 6, the probability that X = 6 is still 1/6. Thus X and Y are independent.
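The following small simulation (a minimal sketch using NumPy, not part of the original page) compares the unconditional frequency of X = 6 with its frequency among the throws where Y = 6; for independent dice the two agree up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two dice thrown separately: independent uniform values in {1, ..., 6}.
x = rng.integers(1, 7, size=n)
y = rng.integers(1, 7, size=n)

p_x6 = np.mean(x == 6)                   # unconditional frequency of X = 6
p_x6_given_y6 = np.mean(x[y == 6] == 6)  # frequency of X = 6 among throws with Y = 6

print(p_x6, p_x6_given_y6)               # both close to 1/6 ≈ 0.1667
```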

By contrast, let X be the result of throwing a die, and let Y be the event that the result is even. Then X and Y are dependent: learning that Y has occurred changes the possible values of X from

\{1,2,3,4,5,6\}

to

\{2,4,6\}.

Thus observing Y gives information about X.
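In elementary terms, conditioning on the event restricts the uniform distribution of the die and renormalizes it; the following exact computation (a small sketch, not from the original page) makes this explicit.

```python
from fractions import Fraction

# Uniform distribution of a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Condition on "the result is even": restrict to the event and renormalize.
even = {x: p for x, p in die.items() if x % 2 == 0}
total = sum(even.values())
posterior = {x: p / total for x, p in even.items()}

print(posterior)  # {2: 1/3, 4: 1/3, 6: 1/3}, which differs from the prior 1/6
```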

A more geometric example is given by choosing a point uniformly at random in the unit disk, and letting X and Y be its two coordinates. Each coordinate separately has a symmetric distribution, but the two coordinates are not independent: if X is close to 1, then Y must be close to 0. The dependence is not caused by either coordinate determining the other exactly, but by the joint constraint

X^2 + Y^2 \leq 1 .
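A Monte Carlo check (an illustrative sketch with assumed sample sizes, not part of the page) exhibits the dependence: the spread of Y shrinks as |X| approaches 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly in the unit disk by rejection from the square [-1, 1]^2.
pts = rng.uniform(-1, 1, size=(2_000_000, 2))
pts = pts[np.sum(pts**2, axis=1) <= 1]
x, y = pts[:, 0], pts[:, 1]

# The spread of Y depends on where X falls, so X and Y are not independent.
print(np.std(y[np.abs(x) < 0.1]))   # about 0.57: Y ranges over almost (-1, 1)
print(np.std(y[np.abs(x) > 0.9]))   # much smaller: Y is forced close to 0
```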

Conditional dependence and independence

Often one is interested not in whether two quantities are independent absolutely, but in whether they are independent relative to some background information.

A standard example is the following. Let Z say whether it is raining, let X say whether the grass in Alice’s garden is wet, and let Y say whether the grass in Bob’s garden is wet. Marginally, X and Y are dependent: if Alice’s grass is wet, this is evidence that it is raining, and hence evidence that Bob’s grass is wet. But after conditioning on whether it is raining, the remaining dependence may disappear. In that case X and Y are conditionally independent given Z.

Thus conditional independence does not mean that X and Y are independent in the usual sense. Rather, it means that their dependence is completely mediated by the conditioning variable Z.

Conversely, conditioning can also create dependence. For example, suppose that two independent causes X and Y may each lead to a common effect Z. If one conditions on the effect Z having occurred, then learning that X occurred can make Y less likely, since the effect has already been explained by X. This phenomenon is sometimes called explaining away. Thus conditional independence and ordinary independence are logically distinct notions.
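As a numerical illustration of explaining away (a small sketch with made-up probabilities, not taken from the page), take two independent binary causes, each occurring with probability 1/4, and an effect that occurs exactly when at least one cause occurs.

```python
from itertools import product
from fractions import Fraction

p_x = p_y = Fraction(1, 4)          # independent causes X and Y

def joint(x, y):
    """Joint probability of the two independent causes taking values x, y in {0, 1}."""
    return (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y)

worlds = list(product([0, 1], repeat=2))

# The effect Z occurs exactly when X or Y occurs.
p_z = sum(joint(x, y) for x, y in worlds if x or y)

# P(Y = 1 | Z = 1): given the effect, Y is more likely than its prior 1/4.
p_y_given_z = sum(joint(x, y) for x, y in worlds if (x or y) and y) / p_z
print(p_y_given_z)                  # 4/7

# P(Y = 1 | Z = 1, X = 1): once X explains the effect, Y drops back to its prior.
# (Since X = 1 already forces Z = 1, this equals P(Y = 1 | X = 1).)
p_y_given_zx = joint(1, 1) / (joint(1, 0) + joint(1, 1))
print(p_y_given_zx)                 # 1/4
```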

In measure-theoretic probability

Let (Ω, ℱ, p) be a probability space, and let A, B ∈ ℱ be events, i.e. measurable subsets of Ω. We say that A and B are independent if and only if

p(A\cap B) = p(A)\, p(B) ,

i.e. if the joint probability is the product of the probabilities.
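For instance (a small illustrative check, not taken from the page), for a fair die the events “the result is even” and “the result is at most 4” are independent, even though both refer to the same throw:

```python
from fractions import Fraction

omega = range(1, 7)                        # fair die, probability 1/6 each

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), 6)

A = lambda w: w % 2 == 0                   # "the result is even"
B = lambda w: w <= 4                       # "the result is at most 4"

lhs = prob(lambda w: A(w) and B(w))        # p(A ∩ B) = 1/3
rhs = prob(A) * prob(B)                    # p(A) p(B) = 1/2 · 2/3 = 1/3
print(lhs == rhs)                          # True
```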

More generally, if f : (Ω, ℱ) → (X, 𝒜) and g : (Ω, ℱ) → (Y, ℬ) are random variables or random elements, one says that f and g are independent if and only if all the events they induce are independent, i.e. for every A ∈ 𝒜 and B ∈ ℬ,

p\big(f^{-1}(A)\cap g^{-1}(B)\big) = p\big(f^{-1}(A)\big)\, p\big(g^{-1}(B)\big) .

Equivalently, one can form the joint random variable (f, g) : (Ω, ℱ) → (X × Y, 𝒜 ⊗ ℬ) and the joint distribution q = (f, g)_* p on X × Y. Then f and g are independent as random variables if and only if, for every A ∈ 𝒜 and B ∈ ℬ,

q\big(\pi_1^{-1}(A)\cap \pi_2^{-1}(B)\big) = q\big(\pi_1^{-1}(A)\big)\, q\big(\pi_2^{-1}(B)\big) .

If we denote the marginal distributions of q by q_X and q_Y, the independence condition reads

q(A \times B) = q_X(A)\, q_Y(B) ,

meaning that the joint distribution q is the product of its marginals:

q = q_X \otimes q_Y .

In this form, stochastic dependence is the obstruction to reconstructing the joint law from its two marginals. The marginal laws q_X and q_Y describe the two random quantities separately, while the joint law q contains in addition their possible dependence structure.
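For finite discrete laws this factorization is just an outer product of the marginal vectors; the following check (a minimal sketch with made-up numbers) illustrates it.

```python
import numpy as np

# A joint distribution q on a 2 x 3 product of finite sets, stored as q[x, y].
q = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.30, 0.15]])

q_X = q.sum(axis=1)          # marginal on X
q_Y = q.sum(axis=0)          # marginal on Y

# q is independent iff it equals the outer product of its marginals.
print(np.allclose(q, np.outer(q_X, q_Y)))  # True for this particular q
```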

For a finite family of random variables

f_i : (\Omega,\mathcal{F}) \to (X_i,\mathcal{A}_i), \qquad i = 1,\ldots,n,

mutual independence means that the joint distribution factors as the product of its marginals:

(f_1,\ldots,f_n)_* p = (f_1)_* p \otimes \cdots \otimes (f_n)_* p .

Equivalently, for all measurable subsets A_i ∈ 𝒜_i,

p\big(f_1^{-1}(A_1)\cap\cdots\cap f_n^{-1}(A_n)\big) = \prod_{i=1}^n p\big(f_i^{-1}(A_i)\big) .

The infinitary analogue can be obtained by means of the Kolmogorov extension theorem (see below).
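For finite discrete laws, the n-ary factorization can be checked in the same way as the binary one (a small sketch with made-up numbers):

```python
import numpy as np
from functools import reduce

# Laws of three independent biased coins.
marginals = [np.array([0.3, 0.7]), np.array([0.5, 0.5]), np.array([0.9, 0.1])]

# Their joint law, built as an iterated outer product (an array of shape (2, 2, 2)).
joint = reduce(np.multiply.outer, marginals)

# Mutual independence: the joint equals the product of its own marginals.
recovered = [joint.sum(axis=tuple(j for j in range(3) if j != i)) for i in range(3)]
print(np.allclose(joint, reduce(np.multiply.outer, recovered)))  # True
```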

Conditional independence

Let 𝒢 ⊆ ℱ be a sub-σ-algebra, thought of as encoding some background information. Events A, B ∈ ℱ are conditionally independent given 𝒢 if

p(A\cap B\mid\mathcal{G}) = p(A\mid\mathcal{G})\, p(B\mid\mathcal{G})

almost surely.

More generally, random elements f : (Ω, ℱ) → (X, 𝒜) and g : (Ω, ℱ) → (Y, ℬ) are conditionally independent given 𝒢 if, for all A ∈ 𝒜 and B ∈ ℬ,

p\big(f^{-1}(A)\cap g^{-1}(B)\mid\mathcal{G}\big) = p\big(f^{-1}(A)\mid\mathcal{G}\big)\, p\big(g^{-1}(B)\mid\mathcal{G}\big)

almost surely.

If h : (Ω, ℱ) → (Z, 𝒞) is another random element, one says that f and g are conditionally independent given h if they are conditionally independent given the sub-σ-algebra generated by h:

\sigma(h) \subseteq \mathcal{F} .

This is often written

f \perp g \mid h .

When regular conditional probabilities exist, conditional independence can be expressed by saying that the conditional joint law of (f, g) given h = z factors as the product of the corresponding conditional marginal laws:

p_{f,g\mid h=z} = p_{f\mid h=z} \otimes p_{g\mid h=z}

for p_h-almost every z.

Equivalently, the joint law of (f, g, h) admits the factorization

p_{f,g,h}(dx,dy,dz) = p_h(dz)\, p_{f\mid h=z}(dx)\, p_{g\mid h=z}(dy) .
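The rain/wet-grass example above can be made numerical (a sketch with made-up conditional probabilities): a joint law built from the factorization p(z) p(x∣z) p(y∣z) is conditionally independent given Z, while its marginal on (X, Y) is not independent.

```python
import numpy as np

# Z = rain, X = Alice's grass wet, Y = Bob's grass wet (made-up numbers).
p_z = np.array([0.7, 0.3])                      # P(Z): no rain / rain
p_x_given_z = np.array([[0.9, 0.1],             # P(X | Z = no rain)
                        [0.2, 0.8]])            # P(X | Z = rain)
p_y_given_z = np.array([[0.85, 0.15],
                        [0.1, 0.9]])

# Joint law joint[x, y, z] = p(z) p(x|z) p(y|z).
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Given each value of Z, the conditional joint of (X, Y) factors into its marginals.
for z in range(2):
    cond = joint[:, :, z] / joint[:, :, z].sum()
    print(np.allclose(cond, np.outer(cond.sum(axis=1), cond.sum(axis=0))))  # True

# But after summing Z out, X and Y are dependent.
pxy = joint.sum(axis=2)
print(np.allclose(pxy, np.outer(pxy.sum(axis=1), pxy.sum(axis=0))))         # False
```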

In terms of the Giry monad

Denote by G the Giry monad on Meas, sending a measurable space X to the measurable space G(X) of probability measures on X.

The coordinate projections

\pi_X : X\times Y \to X, \qquad \pi_Y : X\times Y \to Y

induce maps

G(\pi_X) : G(X\times Y) \to G(X), \qquad G(\pi_Y) : G(X\times Y) \to G(Y),

and hence the marginals

q_X = G(\pi_X)\circ q, \qquad q_Y = G(\pi_Y)\circ q .

The Giry monad is compatible with products by the usual product-measure construction: there is a natural map

G(X)\times G(Y) \longrightarrow G(X\times Y), \qquad (\mu,\nu) \mapsto \mu\otimes\nu .

In these terms, a joint distribution q : 1 → G(X × Y) is independent precisely if it is obtained from its marginals by this product-measure map: q = q_X ⊗ q_Y.

Equivalently, if random elements f : Ω → X and g : Ω → Y are defined on a probability space (Ω, ℱ, p), then f and g are independent precisely when

(f,g)_* p = f_* p \otimes g_* p .

Thus, from the point of view of the Giry monad, stochastic dependence is exactly the failure of a state of the product object X × Y to be the product of its marginal states.
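As a toy model, one can replace the Giry monad by the finite distribution monad and spell out the marginal maps (pushforward along the projections) and the product-measure map in code; this is an illustrative sketch, not a formalization of the Giry monad itself.

```python
from collections import defaultdict

# A finite distribution is a dict {outcome: probability}.

def pushforward(f, dist):
    """G(f): push a distribution forward along a function f."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

def product(mu, nu):
    """The product-measure map G(X) x G(Y) -> G(X x Y)."""
    return {(x, y): p * q for x, p in mu.items() for y, q in nu.items()}

# A joint distribution q on X x Y and its marginals along the two projections.
q = {('a', 0): 0.08, ('a', 1): 0.12, ('b', 0): 0.32, ('b', 1): 0.48}
q_X = pushforward(lambda xy: xy[0], q)
q_Y = pushforward(lambda xy: xy[1], q)

# q is independent iff it agrees with the product of its marginals.
print(all(abs(q[k] - v) < 1e-9 for k, v in product(q_X, q_Y).items()))  # True
```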

Higher arities

More generally, for a finite family of measurable spaces X_1, …, X_n, there is a product-measure map

G(X_1)\times\cdots\times G(X_n) \longrightarrow G(X_1\times\cdots\times X_n), \qquad (\mu_1,\ldots,\mu_n) \mapsto \mu_1\otimes\cdots\otimes\mu_n .

A joint distribution

q : 1 \to G(X_1\times\cdots\times X_n)

is independent precisely if it is obtained from its marginals by this map:

q = q_1\otimes\cdots\otimes q_n .

Infinite families

For an infinite family (X_i)_{i ∈ I}, the corresponding statement is subtler. One would like to say that an independent joint distribution on

\prod_{i\in I} X_i

is determined by its finite-dimensional marginals

q_J \in G\left(\prod_{j\in J} X_j\right), \qquad J\subseteq I \text{ finite},

and that these finite-dimensional marginals factor as

q_J = \bigotimes_{j\in J} q_j .

The existence of a measure on the infinite product with these prescribed finite-dimensional marginals is the content of the Kolmogorov extension theorem, under suitable hypotheses on the measurable spaces. Thus, for infinite families, independence is naturally formulated in terms of compatible finite-dimensional product distributions.

In particular, a family of random elements

f_i : \Omega \to X_i, \qquad i\in I,

is independent if every finite subfamily is independent, equivalently if for every finite subset J ⊆ I the joint law of (f_j)_{j ∈ J} factors as the product of its marginals:

\big((f_j)_{j\in J}\big)_* p = \bigotimes_{j\in J} (f_j)_* p .

In the category of Markov kernels

(See the analogous section in Joint and marginal probability.)

In Markov categories

(For now see Markov category - Stochastic independence)

In dagger categories

(…)

See also


category: probability
