
Entropy

Idea

Entropy is a measure of disorder, given by the amount of information necessary to precisely specify the state of a system.

Entropy is important in information theory and statistical physics.

Mathematical definitions

We can give a precise mathematical definition of entropy in probability theory.

Expected information of verification

This is just a preliminary definition.

The expected information of verification of a probability (a real number in the unit interval) $p$ is

$$ h(p) \coloneqq - p \log p $$

(or more precisely $h(p) \coloneqq \lim_{q \searrow p} (- q \log q)$, so that $h(0) = 0$ is defined). Notice that, despite the minus sign in this formula, $h$ is a nonnegative function (since $\log p \leq 0$ for $p \leq 1$). It is also important that $h$ is concave.

Both $h(0)$ and $h(1)$ are $0$, but for different reasons; $h(1) = 0$ because, upon verifying a statement with probability $1$, one gains no information; while $h(0) = 0$ because one expects never to verify a statement with probability $0$. In general, $-\log p$ is the information gained by verifying a statement of probability $p$, but this will happen only with probability $p$, hence $-p \log p$.

We have not specified the base of the logarithm, which amounts to a constant factor (proportional to the logarithm of the base), which we think of as specifying the unit of measurement of entropy. Common choices for the base are $2$ (whose unit is the bit, originally a unit of memory in computer science), $256$ (byte: $8$ bits), $3$ (trit), $\mathrm{e}$ (nat or neper), $10$ (bel, originally a unit of power intensity in telegraphy, or ban, dit, or hartley), and $\sqrt[10]{10}$ (decibel: $1/10$ of a bel). In applications to statistical physics, common bases are approximately $10^{3.1456 \times 10^{22}}$ (joule per kelvin), $1.65404$ (calorie per mole-kelvin), etc.
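
For concreteness, here is a minimal Python sketch of $h$ and its dependence on the base (the function name and the sample numbers are only illustrative):

```python
from math import log, e

def h(p, base=2):
    """Expected information of verification of a probability p,
    in units determined by the base of the logarithm."""
    if p == 0:
        return 0.0  # h(0) = 0 by the limiting convention above
    return -p * log(p, base)

p = 0.25
print(h(p, base=2))  # 0.5 bits
print(h(p, base=e))  # ~0.347 nats
print(h(p, base=3))  # ~0.315 trits
```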

Entropy of a $\sigma$-algebra on a probability space

This is a general mathematical definition of entropy.

Given a probability measure space $(X,\mu)$ and a $\sigma$-algebra $\mathcal{M}$ of measurable sets in $X$, the entropy of $\mathcal{M}$ with respect to $\mu$ is

$$ H_\mu(\mathcal{M}) \coloneqq \sup \Big\{ \sum_{A \in \mathcal{F}} h(\mu(A)) \;\Big|\; \mathcal{F} \subseteq \mathcal{M},\; {|\mathcal{F}|} \lt \aleph_0,\; X = \biguplus \mathcal{F} \Big\} \,. \qquad (1) $$

In words, the entropy is the supremum, over all ways of expressing $X$ as an internal disjoint union of finitely many elements of the $\sigma$-algebra $\mathcal{M}$, of the sum, over these measurable sets, of the expected information of verification of these sets. This supremum can also be expressed as a limit as we take $\mathcal{F}$ to be finer and finer, since $h$ is concave and the partitions are directed.

(Without loss of generality, we do not need the elements of $\mathcal{F}$ to be disjoint, as long as their intersections are null sets. Similarly, we do not need their union to be all of $X$, as long as their union is a full set. In constructive mathematics, it seems that we must weaken the latter condition in this way.)

This definition is very general, and it is instructive to look at special cases.

Entropy of a probability space

Given a probability space $(X,\mu)$, the entropy of this probability space is the entropy, with respect to $\mu$, of the $\sigma$-algebra of all measurable subsets of $X$.

Entropy of a partition of a probability space

Recall that a partition of a set $X$ is a family $\mathcal{P}$ of subsets of $X$ such that $X$ is the union of $\mathcal{P}$ and any two distinct elements of $\mathcal{P}$ are disjoint. (That is, the supremum in (1) is taken over finite partitions of $X$ into elements of $\mathcal{M}$.)

Every partition of a measure space $X$ into measurable sets (indeed, any family of measurable subsets of $X$) generates a $\sigma$-algebra of measurable sets. The entropy of a measurable partition $\mathcal{P}$ of a probability measure space $(X,\mu)$ is the entropy, with respect to $\mu$, of the $\sigma$-algebra generated by $\mathcal{P}$. The formula (1) may then be written

$$ H_\mu(\mathcal{P}) = \sum_{A \in \mathcal{P}} h(\mu(A)) = - \sum_{A \in \mathcal{P}} \mu(A) \log \mu(A) \,, \qquad (2) $$

since an infinite sum (of positive terms) may also be defined as a supremum. (Actually, the supremum in the infinite sum does not quite match the supremum in (1), so there is a bit of a theorem to prove here.)
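
For concreteness, here is a minimal Python sketch of formula (2) on a four-element space (the helper names are only illustrative); since $h$ is concave, coarsening the partition can only decrease the sum:

```python
from math import log

def h(p):
    """Expected information of verification (natural logarithm, so values are in nats)."""
    return 0.0 if p == 0 else -p * log(p)

def partition_entropy(block_measures):
    """Formula (2): the sum of h(mu(A)) over the blocks A of a finite partition."""
    return sum(h(p) for p in block_measures)

# X = {1, 2, 3, 4} with the uniform measure.
finest  = [0.25, 0.25, 0.25, 0.25]  # partition into singletons
coarser = [0.5, 0.5]                # partition {{1, 2}, {3, 4}}
print(partition_entropy(finest))    # log 4 ~ 1.386
print(partition_entropy(coarser))   # log 2 ~ 0.693
```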

In most of the following special cases, we will consider only partitions, although it would be possible also to consider more general $\sigma$-algebras.

Entropy of (a partition of) a discrete probability space

Recall that a discrete probability space is a set $X$ equipped with a function $\mu\colon X \to [0,1]$ such that $\sum_{i \in X} \mu(i) = 1$; since $\mu(i) \gt 0$ is possible for only countably many $i$, we may assume that $X$ is countable. We make $X$ into a measure space (with every subset measurable) by defining $\mu(A) \coloneqq \sum_{i \in A} \mu(i)$. Since every set is measurable, any partition of $X$ is a partition into measurable sets.

Given a discrete probability space $(X,\mu)$ and a partition $\mathcal{P}$ of $X$, the entropy of $\mathcal{P}$ with respect to $\mu$ is defined to be the entropy of $\mathcal{P}$ with respect to the probability measure induced by $\mu$. Simplifying (2), we find

$$ H_\mu(\mathcal{P}) = \sum_{A \in \mathcal{P}} h\Big(\sum_{i \in A} \mu(i)\Big) = - \sum_{A \in \mathcal{P}} \Big(\sum_{i \in A} \mu(i)\Big) \log \Big(\sum_{i \in A} \mu(i)\Big) \,. $$

More specially, the entropy of the discrete probability space $(X,\mu)$ is the entropy of the partition of $X$ into singletons; we find

$$ H_\mu(X) = \sum_{i \in X} h(\mu(i)) = - \sum_{i \in X} \mu(i) \log \mu(i) \,. $$

This is actually a special case of the entropy of a probability space, since the $\sigma$-algebra generated by the singletons is the power set of $X$ (at least when $X$ is countable, and the formulas agree in any case).

Yet more specially, the entropy of a finite set $X$ is the entropy of $X$ equipped with the uniform discrete probability measure; we find

$$ H_{unif}(X) = - \sum_{i \in X} \frac{1}{|X|} \log \frac{1}{|X|} = \log |X| \,, \qquad (3) $$

which is probably the earliest mathematical formula for entropy, due to Boltzmann. (Its physical interpretation appears below.)
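
As a quick numerical check (the function name is only illustrative), the uniform measure on a four-element set attains the Boltzmann value $\log 4$, while a non-uniform measure on the same set gives a smaller value:

```python
from math import log

def discrete_entropy(mu):
    """Entropy (in nats) of a discrete distribution given by a list of weights."""
    return -sum(p * log(p) for p in mu if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # uniform measure on a 4-element set
biased  = [0.7, 0.1, 0.1, 0.1]      # another measure on the same set

print(discrete_entropy(uniform))  # log 4 ~ 1.386, the Boltzmann value
print(discrete_entropy(biased))   # ~0.940, strictly smaller
```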

Entropy with respect to an absolutely continuous probability measure on the real line

Recall that a Borel measure $\mu$ on an interval $X$ in the real line is absolutely continuous if $\mu(A) = 0$ whenever $A$ is a null set (with respect to Lebesgue measure). In this case, we can take the Radon-Nikodym derivative of $\mu$ with respect to Lebesgue measure, to get an integrable function $f$, called the probability distribution function; we recover $\mu$ by

$$ \mu(A) = \int_A f(x) \,\mathrm{d}x \,, \qquad (4) $$

where the integral is taken with respect to Lebesgue measure.

If $\mathcal{P}$ is a partition of an interval $X$ into Borel sets, then the entropy of $\mathcal{P}$ with respect to an integrable function $f$ is the entropy of $\mathcal{P}$ with respect to the measure induced by $f$ using the integral formula (4); we find

$$ H_f(\mathcal{P}) = \sum_{A \in \mathcal{P}} h\Big(\int_A f(x) \,\mathrm{d}x\Big) = - \sum_{A \in \mathcal{P}} \Big(\int_A f(x) \,\mathrm{d}x\Big) \log \Big(\int_A f(x) \,\mathrm{d}x\Big) \,. $$

On the other hand, the entropy of the probability distribution space $(X,f)$ is the entropy of the entire $\sigma$-algebra of all Borel sets with respect to $f$; we find

$$ H_f(X) = - \int f(x) \log f(x) \,\mathrm{d}x $$

by a fairly complicated argument.

I haven't actually managed to check this argument yet, although my memory tags it as a true fact. —Toby

Note that this $\sigma$-algebra is not generated by a partition.
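
As a numerical sketch of the displayed integral (the helper name is only illustrative), a crude midpoint rule recovers the exact values for two simple densities on $[0,2]$:

```python
from math import log

def differential_entropy(f, a, b, n=100000):
    """Approximate -int_a^b f(x) log f(x) dx by the midpoint rule."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        fx = f(x)
        if fx > 0:
            total -= fx * log(fx) * dx
    return total

print(differential_entropy(lambda x: 0.5, 0.0, 2.0))    # log 2 ~ 0.693 (uniform density on [0,2])
print(differential_entropy(lambda x: x / 2, 0.0, 2.0))  # ~0.5 (density f(x) = x/2 on [0,2])
```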

Entropy of a density matrix

In the analogy between classical physics and quantum physics, we move from probability distributions on a phase space to density operators on a Hilbert space.

Just as the entropy of a probability distribution $f$ is given by $- \int f \log f$, so the entropy of a density operator $\rho$ is

$$ H_\rho \coloneqq - \mathrm{Tr} (\rho \log \rho) \,, $$

using the functional calculus.
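
Concretely, the functional calculus here amounts to applying $-x \log x$ to the eigenvalues of $\rho$; a minimal numpy sketch (with an illustrative function name):

```python
import numpy as np

def von_neumann_entropy(rho):
    """-Tr(rho log rho), computed by applying -x log x to the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)  # rho is Hermitian, positive, of trace 1
    w = w[w > 1e-12]             # 0 log 0 = 0 by the usual convention
    return float(-np.sum(w * np.log(w)))

rho_pure  = np.array([[1.0, 0.0], [0.0, 0.0]])  # pure state
rho_mixed = np.array([[0.5, 0.0], [0.0, 0.5]])  # maximally mixed qubit
print(von_neumann_entropy(rho_pure))   # 0.0
print(von_neumann_entropy(rho_mixed))  # log 2 ~ 0.693
```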

These are both special cases of the entropy of a state on a $C^*$-algebra.

There is a way to fit this into the framework given by (1), but I don't remember it (and never really understood it).

Relative entropy

For two finite probability distributions $(p_i)$ and $(q_i)$, their relative entropy is

$$ S(p/q) \coloneqq \sum_{k = 1}^n p_k (\log p_k - \log q_k) \,. $$

Or alternatively, for $\rho, \phi$ two density matrices, their relative entropy is

$$ S(\rho/\phi) \coloneqq \mathrm{tr}\, \rho (\log \rho - \log \phi) \,. $$
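
A minimal numpy sketch of both formulas (all helper names are only illustrative); for commuting (here diagonal) density matrices the quantum formula reduces to the classical one:

```python
import numpy as np

def relative_entropy(p, q):
    """Classical relative entropy S(p/q) = sum_k p_k (log p_k - log q_k)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

def matrix_log(a):
    """Logarithm of a positive-definite Hermitian matrix via its spectral decomposition."""
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.log(w)) @ v.conj().T

def quantum_relative_entropy(rho, phi):
    """S(rho/phi) = tr rho (log rho - log phi)."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(phi))).real)

p, q = [0.7, 0.3], [0.5, 0.5]
print(relative_entropy(p, q))                            # ~0.0823
print(quantum_relative_entropy(np.diag(p), np.diag(q)))  # same value for these diagonal states
```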

There is a generalization of these definitions to states on general von Neumann algebras, due to (Araki).

For more on this see relative entropy.

Physical entropy

As hinted above, any probability distribution on a phase space in classical physics has an entropy, and any density matrix on a Hilbert space in quantum physics has an entropy. However, these are microscopic entropy, which is not the usual entropy in thermodynamics and most other branches of physics. (In particular, microscopic entropy is conserved, rather than increasing with time.)

Instead, physicists use coarse-grained entropy, which corresponds mathematically to taking the entropy of a $\sigma$-algebra much smaller than the $\sigma$-algebra of all measurable sets. Given a classical system with $N$ microscopic degrees of freedom, we identify $n$ macroscopic degrees of freedom that we can reasonably expect to measure, giving a map from $\mathbb{R}^N$ to $\mathbb{R}^n$ (or more generally, a map from an $N$-dimensional microscopic phase space to an $n$-dimensional macroscopic phase space). Then the $\sigma$-algebra of all measurable sets in $\mathbb{R}^n$ pulls back to a $\sigma$-algebra on $\mathbb{R}^N$, and the macroscopic entropy of a statistical state is the entropy of this $\sigma$-algebra. (Typically, $N$ is on the order of Avogadro's number, while $n$ is rarely more than half a dozen, and often as small as $2$.)

Generally, we specify a state by a point in $\mathbb{R}^n$, a macroscopic pure state, and assume a uniform probability distribution on its fibre in $\mathbb{R}^N$. If this fibre were a finite set, then we would recover Boltzmann's formula (3). This is never exactly true in classical statistical physics, but it is often nevertheless a very good approximation. (Boltzmann's formula actually makes better physical sense in quantum statistical physics, even though Boltzmann himself did not live to see this.)
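
As a toy sketch of this coarse-graining (all numbers, names, and the observable are invented for illustration), the macroscopic entropy is the entropy of the partition of the microscopic space into fibres of the macroscopic map, which here comes out smaller than the microscopic entropy:

```python
from math import log

def entropy(weights):
    """Entropy (in nats) of a finite probability distribution."""
    return -sum(p * log(p) for p in weights if p > 0)

# Toy statistical state on 6 microstates.
mu = {1: 0.10, 2: 0.15, 3: 0.25, 4: 0.20, 5: 0.20, 6: 0.10}

def macro(i):
    """A crude macroscopic observable: lump microstates into two macrostates."""
    return 'low' if i <= 3 else 'high'

# Pull back the partition of the macroscopic space along `macro`:
fibres = {}
for i, p in mu.items():
    fibres[macro(i)] = fibres.get(macro(i), 0.0) + p

print(entropy(mu.values()))      # microscopic entropy, ~1.736
print(entropy(fibres.values()))  # coarse-grained (macroscopic) entropy, log 2 ~ 0.693
```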

Gravitational entropy

References

General

The concept of entropy was introduced, by Rudolf Clausius in 1865, in the context of physics, and then adapted to information theory by Claude Shannon in 1948, to quantum mechanics by John von Neumann in 1955, to ergodic theory by Andrey Kolmogorov and Sinai in 1958, and to topological dynamics by Adler, Konheim and McAndrew in 1965.

Relative entropy of states on von Neumann algebras was introduced in

  • Huzihiro Araki, Relative entropy of states of von Neumann algebras, Publ. RIMS Kyoto Univ. 11 (1976) 809-833

A survey of entropy in operator algebras is in

  • Erling Størmer, Entropy in operator algebras (pdf)

See also

  • A. P. Balachandran, T. R. Govindarajan, Amilcar R. de Queiroz, A. F. Reyes-Lega, Algebraic Approach to Entanglement and Entropy (arXiv:1301.1300)

A large collection of references on quantum entropy is in

  • Christopher Fuchs, References for Research in Quantum Distinguishability and State Disturbance (pdf)

A discussion of entropy with an eye towards the presheaf topos over the site of finite measure spaces is in

  • Mikhail Gromov, In a Search for a Structure, Part I: On Entropy (2012) (pdf)

  • William Lawvere, State categories, closed categories, and the existence of semi-continuous entropy functions, IMA reprint 86, pdf

Axiomatic characterizations

After the concept of entropy proved enormously useful in practice, many people have tried to find a more abstract foundation for the concept (and its variants) by characterizing it as the unique measure satisfying some list of plausible-sounding axioms.

A characterization of relative entropy on finite-dimensional $C^*$-algebras is given in

  • D. Petz, Characterization of the relative entropy of states of matrix algebras, Acta Math. Hung. 59 (3-4) (1992) (pdf)

A simple characterization of von Neumann entropy of density matrices (mixed quantum states) is discussed in

  • Bernhard Baumgartner, Characterizing Entropy in Statistical Physics and in Quantum Information Theory, arXiv:1206.5727

Entropy-like quantities appear in the study of many PDEs, with entropy estimates. For an intro see

  • L. C. Evans, A survey of entropy methods for partial differential equations, pdf; (and longer course text:) Entropy and partial differential equations, pdf