nLab Bell's inequality






In quantum physics/quantum information theory, what came to be called Bell’s inequality (Bell 1964) is an inequality satisfied by the three pairwise correlation functions between three random variables defined on one and the same classical probability space. As such, it is an elementary statement about classical probability theory which has been argued (Pitowsky 1989a) to have been known already to Boole (1854).

The point of the argument by Bell 1964 was to highlight that when taking these three random variables to be the results of quantum measurements of the spin of an electron along three pairwise non-orthogonal axes (as in the Stern-Gerlach experiment) then quantum theory predicts that this inequality is violated – implying that there is no single classical probability space (called a hidden variable in the context of interpretations of quantum mechanics) on which these three quantum measurement-results are jointly random variables.

A number of experiments have sought to check Bell’s inequalities in quantum physics (“Bell tests”) and all claim to have verified that they are indeed violated in nature (see Aspect 2015), as predicted by quantum theory.

Bell’s inequality has received, and continues to receive, an enormous amount of attention, first in discussions of interpretations of quantum mechanics, and more recently and more concretely in the context of quantum information theory.


Original formulation

The following is a fairly verbatim recap of the original argument in Bell 1964. For a streamlined restatement see further below.

Let us denote the result $A$ of a measurement that is determined by a unit vector $\vec{a}$ and some parameter $\lambda$ as $A(\vec{a},\lambda)=\pm 1$, where we further suppose that the outcome of the measurement is either $+1$ or $-1$. Likewise, we may do the same for the result $B$ of a second measurement, i.e. $B(\vec{b},\lambda)$. We further make the vital assumption that the result $B$ does not depend on $\vec{a}$ and likewise $A$ does not depend on $\vec{b}$.

Before proceeding, we should note that $\lambda$ here plays the role of a “hidden” parameter or variable. We say it is “hidden” because its precise nature is not known. However, it is still a very real parameter with a probability distribution $\rho(\lambda)$. The expectation value of the product of the two measurements is

$$ P(\vec{a},\vec{b}) = \int d\lambda \, \rho(\lambda) \, A(\vec{a},\lambda) \, B(\vec{b},\lambda). \tag{1} $$

Because $\rho$ is a normalized probability distribution,

$$ \int d\lambda \, \rho(\lambda) = 1 $$

and because $A(\vec{a},\lambda)=\pm 1$ and $B(\vec{b},\lambda)=\pm 1$, $P$ cannot be less than $-1$. It can be equal to $-1$ at $\vec{a}=\vec{b}$ only if $A(\vec{a},\lambda) = -B(\vec{a},\lambda)$ except at a set of points $\lambda$ of zero probability. Requiring this perfect anticorrelation, as quantum theory predicts for the singlet state, we can write (1) as

$$ P(\vec{a},\vec{b}) = -\int d\lambda \, \rho(\lambda) \, A(\vec{a},\lambda) \, A(\vec{b},\lambda). $$

If we introduce a third unit vector $\vec{c}$, we can find the difference between the correlations of $\vec{a}$ with the two other unit vectors,

$$ P(\vec{a},\vec{b})-P(\vec{a},\vec{c}) = -\int d\lambda \, \rho(\lambda) \, \big[ A(\vec{a},\lambda)A(\vec{b},\lambda) - A(\vec{a},\lambda)A(\vec{c},\lambda) \big]. \tag{2} $$

Since $A(\vec{b},\lambda)^2 = 1$, we may rewrite (2) as

$$ P(\vec{a},\vec{b})-P(\vec{a},\vec{c}) = \int d\lambda \, \rho(\lambda) \, A(\vec{a},\lambda)A(\vec{b},\lambda) \, \big[ A(\vec{b},\lambda)A(\vec{c},\lambda) - 1 \big]. $$

Given the limitations we have placed on the value of $A$, we may write

$$ \big\vert P(\vec{a},\vec{b})-P(\vec{a},\vec{c}) \big\vert \le \int d\lambda \, \rho(\lambda) \, \big[ 1 - A(\vec{b},\lambda)A(\vec{c},\lambda) \big]. $$

But the second term on the right is simply $P(\vec{b},\vec{c})$ and thus

$$ 1 + P(\vec{b},\vec{c}) \ge \big\vert P(\vec{a},\vec{b})-P(\vec{a},\vec{c}) \big\vert $$

which is the original form of Bell’s inequality. Note that this may be written in terms of correlation coefficients,

$$ 1 + C(b,c) \ge \vert C(a,b)-C(a,c) \vert $$

where a, b, and c are now settings on the measurement apparatus.
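The key step above is pointwise in $\lambda$: since $P(\vec{x},\vec{y}) = -\int d\lambda\,\rho(\lambda) A(\vec{x},\lambda)A(\vec{y},\lambda)$, the inequality follows by averaging once $1 - A_b A_c \ge \vert A_a A_b - A_a A_c \vert$ holds for every assignment of $\pm 1$ values. The following is a minimal sketch that checks this by brute force; the function name `pointwise_gap` and the shorthand `A_a`, `A_b`, `A_c` for $A(\vec{a},\lambda)$, $A(\vec{b},\lambda)$, $A(\vec{c},\lambda)$ at one fixed $\lambda$ are illustrative choices, not notation from Bell 1964.

```python
import itertools

def pointwise_gap(A_a, A_b, A_c):
    """(1 - A_b*A_c) - |A_a*A_b - A_a*A_c| for one value of the hidden variable."""
    return (1 - A_b * A_c) - abs(A_a * A_b - A_a * A_c)

# Enumerate all 8 deterministic assignments of +/-1 to the three settings.
gaps = {s: pointwise_gap(*s) for s in itertools.product([+1, -1], repeat=3)}

# A non-negative gap for every assignment implies the averaged inequality
# 1 + P(b,c) >= |P(a,b) - P(a,c)| for every distribution rho.
print(all(g >= 0 for g in gaps.values()))  # True
```

In this enumeration every gap is in fact zero, so the integrand bound is saturated pointwise; slack in the averaged inequality can only come from cancellations between different $\lambda$ inside the absolute value.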

Quantum mechanical violations

The original derivation of Bell’s inequalities involved the use of a Stern-Gerlach device that measures spin along an axis. Suppose $\sigma_{1}$ and $\sigma_{2}$ are spins. The result, $A$, of measuring $\sigma_{1}\cdot\vec{a}$ is then interpreted as being entirely determined by $\vec{a}$ and $\lambda$. Likewise for $B$ and $\sigma_{2}\cdot\vec{b}$. It is also important to remember that the result $B$ does not depend on $\vec{a}$ and likewise $A$ does not depend on $\vec{b}$.

For a singlet state (that is, a state with total spin zero), the quantum mechanical expectation value of measurements along two different axes (see the Wigner derivation below for a more intuitive explanation of the physical nature of this) is

$$ \langle\sigma_{1}\cdot\vec{a},\sigma_{2}\cdot\vec{b}\rangle = - \vec{a}\cdot\vec{b}. $$

If a hidden-variable description of the above form were correct, this would have to equal $P(\vec{a},\vec{b})$, but in general it does not. It is important to remember that we are using classical reasoning throughout our derivations of the various forms of Bell’s inequalities.
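To see the mismatch concretely, one can substitute the singlet prediction $-\vec{a}\cdot\vec{b}$ for $P$ in the original inequality $1 + P(\vec{b},\vec{c}) \ge \vert P(\vec{a},\vec{b}) - P(\vec{a},\vec{c}) \vert$. The sketch below does this for three coplanar axes at $0°$, $60°$ and $120°$; this choice of angles and the helper names `singlet_correlation` and `bell_gap` are assumptions made for illustration only.

```python
import math

def singlet_correlation(theta_x, theta_y):
    """Quantum prediction -a.b for coplanar unit vectors at the given angles (radians)."""
    return -math.cos(theta_x - theta_y)

def bell_gap(ta, tb, tc):
    """1 + P(b,c) - |P(a,b) - P(a,c)|; a negative value means the inequality is violated."""
    P = singlet_correlation
    return 1 + P(tb, tc) - abs(P(ta, tb) - P(ta, tc))

# Illustrative setting: a, b, c in one plane at 0, 60 and 120 degrees.
gap = bell_gap(0.0, math.radians(60), math.radians(120))
print(round(gap, 3))  # -0.5
```

Here the left-hand side of the inequality is $0.5$ while the right-hand side is $1$, so no classical model of the assumed form can reproduce these three correlations at once.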

The setup envisioned here consists of pairs of spin-1/2 particles produced in singlet states that then each pass through separate Stern-Gerlach (SG) devices. Since they are in singlet states, if we measured the first particle of a pair to be aligned with a given axis, say $\vec{a}$, then the second should be measured to be anti-aligned with that same axis, giving a total spin of zero.

In practice we are dealing with beams of particles and thus we can never be absolutely certain that correlated pairs are measured simultaneously and so we ultimately are making statistical predictions. Nevertheless, in a given sample consisting of a large-enough number of randomly distributed spin-1/2 particles, we can be certain that, for example, a definite number are aligned with an axis $\vec{a}$ while a definite number are aligned with an axis $\vec{b}$.

Now take an individual particle and suppose that, for this particle, if we measured $\sigma\cdot\vec{a}$ we would obtain a $+1$ with certainty (meaning it is aligned with $\vec{a}$) but if we instead chose to measure $\sigma\cdot\vec{b}$ we would obtain a $-1$ with certainty (meaning it is anti-aligned with $\vec{b}$). Notationally we refer to such a particle as belonging to type $(\vec{a}+,\vec{b}-)$. Clearly for a given pair of particles in a singlet state, if particle 1 is of type $(\vec{a}+,\vec{b}-)$, then particle 2 must be of type $(\vec{a}-,\vec{b}+)$.


For beams of correlated particles measured along only two axes, we should expect to get a roughly evenly balanced distribution of types as follows:

$$
\begin{array}{ccc}
\text{Particle 1} & & \text{Particle 2} \\
(\vec{a}+,\vec{b}-) & \leftrightarrow & (\vec{a}-,\vec{b}+) \\
(\vec{a}+,\vec{b}+) & \leftrightarrow & (\vec{a}-,\vec{b}-) \\
(\vec{a}-,\vec{b}-) & \leftrightarrow & (\vec{a}+,\vec{b}+) \\
(\vec{a}-,\vec{b}+) & \leftrightarrow & (\vec{a}+,\vec{b}-)
\end{array}
$$

There is a very important assumption implied here. Suppose a particular pair belongs to the first grouping. Then if an observer A decides to measure the spin along $\vec{a}$ for particle 1, he or she necessarily obtains a plus sign (corresponding to it being aligned with $\vec{a}$) regardless of any measurement observer B may make on particle 2. This is the principle of locality: A’s result is predetermined independently of B’s choice of what to measure.

Wigner’s derivation

Now suppose we introduce a third axis, $\vec{c}$, so that we can have, for example, particles of type $(\vec{a}+,\vec{b}+,\vec{c}-)$, corresponding to being aligned if measured along $\vec{a}$ and $\vec{b}$ and anti-aligned along $\vec{c}$. Further let us “count” the pairs that fall into the various groupings and label the populations as follows:

$$
\begin{array}{ccc}
\text{Population} & \text{Particle 1} & \text{Particle 2} \\
N_{1} & (\vec{a}+,\vec{b}+,\vec{c}+) & (\vec{a}-,\vec{b}-,\vec{c}-) \\
N_{2} & (\vec{a}+,\vec{b}+,\vec{c}-) & (\vec{a}-,\vec{b}-,\vec{c}+) \\
N_{3} & (\vec{a}+,\vec{b}-,\vec{c}+) & (\vec{a}-,\vec{b}+,\vec{c}-) \\
N_{4} & (\vec{a}+,\vec{b}-,\vec{c}-) & (\vec{a}-,\vec{b}+,\vec{c}+) \\
N_{5} & (\vec{a}-,\vec{b}+,\vec{c}+) & (\vec{a}+,\vec{b}-,\vec{c}-) \\
N_{6} & (\vec{a}-,\vec{b}+,\vec{c}-) & (\vec{a}+,\vec{b}-,\vec{c}+) \\
N_{7} & (\vec{a}-,\vec{b}-,\vec{c}+) & (\vec{a}+,\vec{b}+,\vec{c}-) \\
N_{8} & (\vec{a}-,\vec{b}-,\vec{c}-) & (\vec{a}+,\vec{b}+,\vec{c}+)
\end{array}
$$

Let’s suppose that observer A finds particle 1 is aligned with $\vec{a}$, i.e. $\vec{a}+$, and that observer B finds particle 2 is aligned with $\vec{b}$, i.e. $\vec{b}+$. From the above table it is clear that the pair belongs to either population 3 or 4. Note that because each $N_{i}$ is non-negative, we must be able to construct relations like, for instance,

$$ N_{3} + N_{4} \le (N_{3} + N_{7}) + (N_{4} + N_{2}). \tag{3} $$

Now let $P(\vec{a}+;\vec{b}+)$ be the probability that, in a random selection, A finds particle 1 to be $\vec{a}+$ and B finds particle 2 to be $\vec{b}+$. In terms of populations, we have

$$ P(\vec{a}+;\vec{b}+) = \frac{N_{3} + N_{4}}{\sum_{i=1}^{8}N_{i}}. \tag{4} $$

Similarly we have

$$ P(\vec{a}+;\vec{c}+) = \frac{N_{2} + N_{4}}{\sum_{i=1}^{8}N_{i}} \tag{5} $$

$$ P(\vec{c}+;\vec{b}+) = \frac{N_{3} + N_{7}}{\sum_{i=1}^{8}N_{i}}. \tag{6} $$

The positivity condition (3) then becomes

$$ P(\vec{a}+;\vec{b}+) \le P(\vec{a}+;\vec{c}+) + P(\vec{c}+;\vec{b}+). \tag{7} $$

This is Wigner’s form of Bell’s inequality.
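As a sanity check on the counting argument, the following sketch draws random non-negative populations $N_1, \ldots, N_8$ and confirms that the three probabilities built from them as in (4), (5) and (6) always satisfy (7). The helper name `wigner_probabilities`, the range of the random counts and the number of trials are arbitrary illustrative choices.

```python
import random

def wigner_probabilities(N):
    """Return (P(a+;b+), P(a+;c+), P(c+;b+)) from a dict of counts N[1..8]."""
    total = sum(N.values())
    p_ab = (N[3] + N[4]) / total   # eq. (4)
    p_ac = (N[2] + N[4]) / total   # eq. (5)
    p_cb = (N[3] + N[7]) / total   # eq. (6)
    return p_ab, p_ac, p_cb

rng = random.Random(1)
holds = True
for _ in range(10_000):
    N = {i: rng.randint(0, 1000) for i in range(1, 9)}
    if sum(N.values()) == 0:
        continue  # skip the degenerate case with no pairs at all
    p_ab, p_ac, p_cb = wigner_probabilities(N)
    # Small tolerance only to absorb floating-point rounding.
    holds = holds and (p_ab <= p_ac + p_cb + 1e-12)

print(holds)  # True
```

No choice of non-negative counts can break (7), because the right-hand side exceeds the left by exactly $(N_2 + N_7)/\sum_i N_i$.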

Violations and geometry

As we mentioned before, we have used purely classical reasoning to derive the two forms of Bell’s inequality that we have thus far encountered. Recall that the context within which the above were derived was the Stern-Gerlach experiment, in which we are measuring along axes of the magnetic field. As such, there are angles between these various axes. The quantum mechanically derived probabilities corresponding to (4), (5), and (6) are

$$ P(\vec{a}+;\vec{b}+) = \frac{1}{2}\sin^{2}\left(\frac{\theta_{ab}}{2}\right), $$

$$ P(\vec{a}+;\vec{c}+) = \frac{1}{2}\sin^{2}\left(\frac{\theta_{ac}}{2}\right), $$

$$ P(\vec{c}+;\vec{b}+) = \frac{1}{2}\sin^{2}\left(\frac{\theta_{cb}}{2}\right), $$

respectively. Bell’s inequality, (7), then becomes

$$ \frac{1}{2}\sin^{2}\left(\frac{\theta_{ab}}{2}\right) \le \frac{1}{2}\sin^{2}\left(\frac{\theta_{ac}}{2}\right) + \frac{1}{2}\sin^{2}\left(\frac{\theta_{cb}}{2}\right). \tag{8} $$

From a geometric point of view, this inequality cannot always hold. For example, suppose, for simplicity, that $\vec{a}$, $\vec{b}$, and $\vec{c}$ lie in a plane and that $\vec{c}$ bisects the angle between $\vec{a}$ and $\vec{b}$, i.e.

$$ \theta_{ab} = 2\theta \quad \text{and} \quad \theta_{ac}=\theta_{cb}=\theta. $$

Then (8) is violated for $0 \lt \theta \lt \frac{\pi}{2}$. For example, if $\theta = \frac{\pi}{4}$, (8) would become $0.250 \le 0.146$ (equivalently, $0.500 \le 0.292$ after multiplying both sides by 2), which is absurd!
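The bisecting configuration can be checked numerically. The sketch below evaluates both sides of (8) exactly as written, including the factors of $\frac{1}{2}$, so at $\theta = \frac{\pi}{4}$ it gives $0.25$ and $0.146$; doubling both sides gives $0.500$ and $0.292$. The function name `wigner_sides` and the one-degree scan are illustrative assumptions.

```python
import math

def wigner_sides(theta):
    """Both sides of (8) for coplanar axes with c bisecting a and b (theta in radians)."""
    lhs = 0.5 * math.sin(theta) ** 2           # theta_ab / 2 = theta
    rhs = 2 * 0.5 * math.sin(theta / 2) ** 2   # theta_ac = theta_cb = theta
    return lhs, rhs

lhs, rhs = wigner_sides(math.pi / 4)
print(round(lhs, 3), round(rhs, 3))  # 0.25 0.146

# Scan theta = 1, 2, ..., 89 degrees; the endpoint 90 degrees is left out
# because there the two sides are equal up to rounding.
violated_everywhere = all(
    wigner_sides(math.radians(k))[0] > wigner_sides(math.radians(k))[1]
    for k in range(1, 90)
)
print(violated_everywhere)  # True
```

Analytically, $\sin^2\theta \le 2\sin^2(\theta/2)$ reduces to $\cos^2(\theta/2) \le \frac{1}{2}$, which fails exactly for $0 \lt \theta \lt \frac{\pi}{2}$, in agreement with the scan.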

Compact reformulation

A transparent and compact way to derive the actual inequality of Bell 1964 (adjusting the original argument only slightly for mathematical elegance) is reviewed in Khrennikov 2008, §10.1, which we broadly follow:



Given

  1. a probability space $(\Lambda, d\rho)$ with

  2. three random variables taking values in $\{\pm 1\}$ (regarded inside the real numbers):

    $$ S_i \;\colon\; \Lambda \longrightarrow \{\pm 1\} \hookrightarrow \mathbb{R} \,, \;\;\;\; i\,\in\, \{1,2,3\} \tag{9} $$

then the correlation functions

$$ \langle S_{i} \, S_j\rangle \;\coloneqq\; \int_{\Lambda} \; S_i(\lambda) \, S_j(\lambda) \; d\rho(\lambda) \tag{10} $$

satisfy this inequality:

$$ \big\vert \langle S_1 S_2\rangle - \langle S_3 S_2\rangle \big\vert \;\leq\; 1 - \langle S_1 S_3\rangle \,. \tag{11} $$

(where $\left\vert-\right\vert$ denotes the absolute value)


Recall that the expectation value of a random variable $P \,\colon\, \Lambda \longrightarrow \mathbb{R}$ is given by its Lebesgue integral against the probability measure:

$$ \langle P \rangle \;\coloneqq\; \int_\Lambda P(\lambda) \, d\rho(\lambda) \,, $$

and that $d\rho$ being a probability measure implies the normalization

$$ \langle 1 \rangle \;\equiv\; \int_\Lambda 1 \, d\rho(\lambda) \;=\; 1 \,. \tag{12} $$

Moreover, the assumption (9) that the random variables $S_i$ take values in $\{\pm 1\}$ immediately implies for all $i \,\in\, \{1,2,3\}$ that

$$ \big( S_i \cdot S_i \big) \,=\, 1 \,, \;\;\;\; \text{i.e.} \;\;\; \underset{\lambda \,\in\, \Lambda}{\forall} \; S_i(\lambda) \, S_i(\lambda) \,=\, (\pm 1)^2 \,=\, 1 \,. \tag{13} $$

Together this implies, by repeatedly using the Cauchy-Schwarz inequality, the bounds:

$$ \big\vert \langle S_i \rangle \big\vert \;\leq\; 1 \,, \;\;\;\;\;\;\; \big\vert \langle S_i S_j\rangle \big\vert \;\leq\; 1 $$

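and, by the same reasoning:

$$ \big\vert \langle P \, S_i \, S_j \rangle \big\vert \;\leq\; \big\vert \langle P \rangle \big\vert \,, \tag{14} $$

for any non-negative random variable $P \,\colon\, \Lambda \to \mathbb{R}_{\geq 0}$ (such as $P = 1 - S_1 S_3$, which is the case used below; for random variables of indefinite sign the bound can fail).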

Using these (evident) ingredients, we directly compute as follows

$$
\begin{array}{ll}
\big\vert \langle S_1 S_2\rangle - \langle S_3 S_2\rangle \big\vert & \\
\;=\; \Big\vert \int_{\Lambda} S_1(\lambda) \, S_2(\lambda) \, d\rho(\lambda) - \int_{\Lambda} S_3(\lambda) \, S_2(\lambda) \, d\rho(\lambda) \Big\vert & \text{by}\;\text{(10)} \\
\;=\; \Big\vert \int_{\Lambda} \big( S_1(\lambda) - S_3(\lambda) \big) \, S_2(\lambda) \, d\rho(\lambda) \Big\vert & \text{by linearity of the integral} \\
\;=\; \Big\vert \int_{\Lambda} \big( 1 - S_1(\lambda) \, S_3(\lambda) \big) \, S_1(\lambda) \, S_2(\lambda) \, d\rho(\lambda) \Big\vert & \text{by}\;\text{(13)} \\
\;\leq\; \Big\vert \int_{\Lambda} \big( 1 - S_1(\lambda) \, S_3(\lambda) \big) \, d\rho(\lambda) \Big\vert & \text{by}\;\text{(14)} \\
\;=\; 1 - \langle S_1 S_3\rangle & \text{by}\;\text{(12)}\;\text{and}\;\text{(10)}
\end{array}
$$

This is the inequality (11).
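The statement can be spot-checked on finite probability spaces, where the integral in (10) becomes a weighted sum. The following is a minimal sketch under that assumption: $\Lambda$ is taken to be a finite set of `size` points with random weights, and the names `correlation` and `check_bell` are illustrative, not taken from Khrennikov 2008.

```python
import random

def correlation(S_i, S_j, rho):
    """Finite version of (10): sum over lambda of S_i(lambda) S_j(lambda) rho(lambda)."""
    return sum(si * sj * p for si, sj, p in zip(S_i, S_j, rho))

def check_bell(rng, size=6):
    """Draw a random finite probability space and three +/-1 variables; test (11)."""
    raw = [rng.random() for _ in range(size)]
    total = sum(raw)
    rho = [r / total for r in raw]  # normalized weights, cf. (12)
    S1, S2, S3 = ([rng.choice([+1, -1]) for _ in range(size)] for _ in range(3))
    lhs = abs(correlation(S1, S2, rho) - correlation(S3, S2, rho))
    rhs = 1 - correlation(S1, S3, rho)
    # Small tolerance only to absorb floating-point rounding.
    return lhs <= rhs + 1e-12

rng = random.Random(0)
all_hold = all(check_bell(rng) for _ in range(10_000))
print(all_hold)  # True
```

A random search like this cannot prove the inequality, but it would expose a sign or index slip in (11) immediately.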


In quantum field theory


On Bell inequalities in particle physics and possible relation to the weak gravity conjecture:

  • Aninda Sinha, Ahmadullah Zahed, Bell inequalities in 2-2 scattering [arXiv:2212.10213]

On BRST invariant Bell inequality in gauge field theory:

  • David Dudal, Philipe De Fabritiis, Marcelo S. Guimaraes, Giovani Peruzzo, Silvio P. Sorella: BRST invariant formulation of the Bell-CHSH inequality in gauge field theories [arXiv:2304.01028]

Probabilistic opposition

The identification of Bell’s inequalities with much older inequalities in classical probability theory, going back to George Boole’s The Laws of Thought, has been pointed out by several authors (called the “probabilistic opposition” in Khrennikov 2007, p. 3).

Last revised on August 3, 2023 at 12:47:12.