A characterization of entropy in terms of information loss


Idea

This is a draft of a paper by John Baez, Tobias Fritz and Tom Leinster. The finished version can be found here:

  • John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, Entropy 13 (2011), 1945–1957; also available on the arXiv.

Abstract

There are numerous characterizations of Shannon entropy and Tsallis entropy as measures of information obeying certain properties. Using work by Faddeev and Furuichi, we derive a very simple characterization. Instead of focusing on the entropy of a probability measure on a finite set, this characterization focuses on the ‘information loss’, or change in entropy, associated with a measure-preserving function. We show that Shannon entropy gives the only concept of information loss that is functorial, convex-linear and continuous. This characterization naturally generalizes to Tsallis entropy as well.

Introduction

The Shannon entropy (S) of a probability measure p on a finite set X is given by:

H(p) = - \sum_{i \in X} p_i \, \ln(p_i) .

There are many theorems that seek to characterize Shannon entropy starting from plausible assumptions; see for example the book by Aczél and Daróczy (AD). Here we give a new and very simple characterization theorem. The main novelty is that we do not focus directly on the entropy of a single probability measure, but rather, on the change in entropy associated with a measure-preserving function. The entropy of a single probability measure can be recovered as the change in entropy of the unique measure-preserving function to the one-point space.

A measure-preserving function can map several points to the same point, but not vice versa, so this change in entropy is always a decrease. Since the second law of thermodynamics says that entropy always increases, this may seem counterintuitive. It may seem less so if we think of the function as some kind of data processing which does not introduce any additional randomness. Then the entropy can only decrease, and we can talk about the ‘information loss’ associated to the function.

Some examples may help to clarify this point. Consider the only possible map f: {a,b} → {c}. Suppose p is the probability measure on {a,b} such that each point has measure 1/2, while q is the unique probability measure on the set {c}. Then H(p) = ln 2, while H(q) = 0. The information loss associated to the map f is defined to be H(p) - H(q), which in this case equals ln 2. In other words, the measure-preserving map f loses one bit of information.

On the other hand, suppose p' is the probability measure on {a,b} such that a has measure 1 and b has measure 0. Then H(p') = 0, so with respect to this probability measure the map f has information loss H(p') - H(q) = 0. It may seem odd to say that f loses no information: after all, it maps a and b to the same point. However, because the point b has probability zero with respect to p', knowing that f(x) = c lets us conclude that x = a with probability one.
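
These two computations are easy to reproduce numerically. Here is a minimal Python sketch; the helper names shannon_entropy and information_loss are ours, chosen for illustration:

```python
import math

def shannon_entropy(p):
    # Shannon entropy of a finite probability distribution,
    # with the convention 0 ln(0) = 0.
    return -sum(x * math.log(x) for x in p if x > 0)

def information_loss(p, q):
    # Information lost by a measure-preserving map f: p -> q,
    # defined as H(p) - H(q).
    return shannon_entropy(p) - shannon_entropy(q)

# The map f: {a,b} -> {c} with the uniform measure p on {a,b}:
print(information_loss([0.5, 0.5], [1.0]))   # ln 2 ~ 0.6931: one bit lost

# The same map with p' concentrated on the point a:
print(information_loss([1.0, 0.0], [1.0]))   # 0.0: no information lost
```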

The shift in emphasis from probability measures to measure-preserving functions suggests that it will be useful to adopt the language of category theory (M), where one has objects and morphisms between them. We will do this, although almost no category theory is required to read this paper.

Shannon entropy has a very simple characterization in terms of information loss. To state it, we consider a category where a morphism f: p → q is a measure-preserving function between finite sets equipped with probability measures. We assume F is a function that assigns to any such morphism a number F(f) ∈ [0,∞), which we call its information loss. We also assume that F obeys three axioms. If we call a morphism a ‘process’, we can state these roughly in words as follows:

  1. Functoriality. Given a process consisting of two stages, the amount of information lost in the whole process is the sum of the amounts lost at each stage:

    F(f \circ g) = F(f) + F(g) .
  2. Convex linearity. If we flip a probability-λ coin to decide whether to do one process or another, the information lost is λ times the information lost by the first process plus (1 - λ) times the information lost by the second:

    F(\lambda f \oplus (1 - \lambda) g) = \lambda F(f) + (1 - \lambda) F(g).
  3. Continuity. If we change a process slightly, the information lost changes only slightly: F(f) is a continuous function of f.

(For full details see Section 2.) Given these assumptions, we conclude that there exists a constant c ≥ 0 such that for any f: p → q, we have

F(f) = c(H(p) - H(q)) .

The charm of this result is that none of the hypotheses hint at any special role for the function -p ln p, but it emerges in the conclusion. The key here is a result of Faddeev (F) described in Section 3.

For many scientific purposes, probability measures are not enough. Our result extends to general measures on finite sets, as follows. Any measure on a finite set can be expressed as λp for some scalar λ and probability measure p, and we define H(λp) = λH(p). In this more general setting, we are no longer confined to taking convex linear combinations of measures. Accordingly, the convex linearity condition in our main theorem is replaced by two conditions: additivity (F(f ⊕ g) = F(f) + F(g)) and homogeneity (F(λf) = λF(f)). As before, the conclusion is that, up to a multiplicative constant, F assigns to each morphism f: p → q the information loss H(p) - H(q).

It is natural to wonder what happens when we replace the homogeneity axiom F(λf) = λF(f) by a more general homogeneity condition:

F(\lambda f) = \lambda^\alpha \, F(f)

for some number α > 0. In this case we find that F(f) is proportional to H_α(p) - H_α(q), where H_α is the so-called Tsallis entropy of order α.

The main result

We work with finite sets equipped with probability measures. All measures on a finite set X will be assumed nonnegative and defined on the σ-algebra of all subsets of X.

Definition

Let FinProb be the category whose objects are finite sets equipped with probability measures and whose morphisms are measure-preserving functions.

Since any measure on a finite set is determined by its values on singletons, we will think of an object of FinProb as a pair (X, p) consisting of a finite set X together with an X-tuple of numbers p_i ∈ [0,1] satisfying ∑ p_i = 1. A morphism f: (X,p) → (Y,q) in FinProb is a function f: X → Y such that

q_j = \sum_{i \in f^{-1}(j)} p_i .

We will usually write an object (X, p) as p for short, and write a morphism f: (X,p) → (Y,q) as simply f: p → q.
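
Concretely, the condition on q says that q is the pushforward of p along f, which is easy to check in code. A small Python sketch; the encoding of X, Y and f as dictionaries is ours:

```python
def is_measure_preserving(f, p, q, tol=1e-9):
    # Check that q_j equals the sum of p_i over the preimage f^{-1}(j).
    # f: dict from points of X to points of Y; p, q: dicts of measures.
    pushforward = {j: 0.0 for j in q}
    for i, mass in p.items():
        pushforward[f[i]] += mass
    return all(abs(pushforward[j] - qj) < tol for j, qj in q.items())

# The map f: {a,b} -> {c} from the introduction:
print(is_measure_preserving({'a': 'c', 'b': 'c'},
                            {'a': 0.5, 'b': 0.5},
                            {'c': 1.0}))       # True
```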

There is a way to take ‘convex linear combinations’ of objects and morphisms in FinProb. Let (X, p) and (Y, q) be finite sets equipped with probability measures, and let λ ∈ [0, 1]. Then there is a probability measure

\lambda p \oplus (1 - \lambda) q

on the disjoint union of the sets X and Y, whose value at a point k is given by

(\lambda p \oplus (1 - \lambda) q)_k = \begin{cases} \lambda p_k & \text{if } k \in X \\ (1 - \lambda) q_k & \text{if } k \in Y. \end{cases}

Given morphisms f: p → p' and g: q → q', there is a unique morphism

\lambda f \oplus (1 - \lambda) g \colon \lambda p \oplus (1 - \lambda) q \to \lambda p' \oplus (1 - \lambda) q'

that restricts to f on the measure space p and g on the measure space q.

The same notation can be extended, in the obvious way, to convex combinations of more than two objects or morphisms. For example, given objects p(1), …, p(n) of FinProb and nonnegative scalars λ_1, …, λ_n summing to 1, there is a new object λ_1 p(1) ⊕ ⋯ ⊕ λ_n p(n).
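
In code, the convex combination is just a reweighted disjoint union. A sketch; the tagging of points by 'X' and 'Y' to keep the union disjoint is our own device:

```python
def convex_combination(lam, p, q):
    # Build lambda*p (+) (1-lambda)*q on the disjoint union of the
    # underlying sets, tagging points so that X and Y stay disjoint.
    out = {('X', i): lam * pi for i, pi in p.items()}
    out.update({('Y', j): (1 - lam) * qj for j, qj in q.items()})
    return out

r = convex_combination(0.25, {'a': 0.5, 'b': 0.5}, {'c': 1.0})
print(r)                 # masses 0.125, 0.125, 0.75
print(sum(r.values()))   # 1.0: again a probability measure
```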

Recall that the Shannon entropy of a probability measure p on a finite set X is

H(p) = -\sum_{i \in X} p_i \, \ln(p_i) \in [0, \infty),

with the convention that 0 ln(0) = 0.

Theorem

Suppose F is any map sending morphisms in FinProb to numbers in [0,∞) and obeying these three axioms:

  1. Functoriality:
    (1) F(f \circ g) = F(f) + F(g)

    whenever f, g are composable morphisms.

  2. Convex linearity:
    (2) F(\lambda f \oplus (1 - \lambda)g) = \lambda F(f) + (1 - \lambda) F(g)

    for all morphisms f, g and scalars λ ∈ [0, 1].

  3. Continuity: F is continuous.

Then there exists a constant c ≥ 0 such that for any morphism f: p → q in FinProb,

F(f) = c(H(p) - H(q))

where H(p) is the Shannon entropy of p. Conversely, for any constant c ≥ 0, this formula determines a map F obeying conditions 1–3.

We need to explain condition 3. A sequence of morphisms f_n: (X_n, p(n)) → (Y_n, q(n)) in FinProb converges to a morphism f: (X, p) → (Y, q) if

  • for all sufficiently large n, we have X_n = X, Y_n = Y, and f_n(i) = f(i) for all i ∈ X;
  • p(n) → p and q(n) → q pointwise.

We define F to be continuous if F(f_n) → F(f) whenever f_n is a sequence of morphisms converging to a morphism f.

The proof of this theorem is given in a later section. First we show how to deduce a characterization of Shannon entropy for general measures on finite sets.

Definition

Let FinMeas be the category whose objects are finite sets equipped with measures and whose morphisms are measure-preserving functions.

There is more room for maneuver in FinMeas than in FinProb: we can take arbitrary nonnegative linear combinations of objects and morphisms, not just convex combinations. Any nonnegative linear combination can be built up from direct sums and multiplication by nonnegative scalars, defined as follows.

  • For ‘direct sums’, first note that the disjoint union of two finite sets equipped with measures is another thing of the same sort. We write the disjoint union of p, q ∈ FinMeas as p ⊕ q. Then, given morphisms f: p → p' and g: q → q', there is a unique morphism f ⊕ g: p ⊕ q → p' ⊕ q' that restricts to f on the measure space p and g on the measure space q.

  • For ‘scalar multiplication’, first note that we can multiply a measure by a nonnegative real number and get a new measure. So, given an object p ∈ FinMeas and a number λ ≥ 0, we obtain an object λp ∈ FinMeas with the same underlying set and with (λp)_i = λp_i. Then, given a morphism f: p → q, there is a unique morphism λf: λp → λq that has the same underlying function as f.

This is consistent with our earlier notation for convex linear combinations.

We wish to give some conditions guaranteeing that a map sending morphisms in FinMeas to nonnegative real numbers comes from a multiple of Shannon entropy. To do this we need to define the Shannon entropy of a finite set X equipped with a measure p, not necessarily a probability measure. Define the total mass of (X, p) to be

\|p\| = \sum_{i \in X} p_i .

If this is nonzero, then p is of the form ‖p‖ p̄ for a unique probability measure p̄. In this case we define the Shannon entropy of p to be ‖p‖ H(p̄). If the total mass of p is zero, we define its Shannon entropy to be zero.
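
Numerically, this definition agrees with the closed form ‖p‖ ln ‖p‖ - ∑ p_i ln p_i that appears as equation (6) below. A sketch, with an example measure of our choosing:

```python
import math

def entropy_general(p):
    # Shannon entropy of a not-necessarily-normalized measure p:
    # ||p|| * H(p/||p||), and 0 for the zero measure. Equivalently,
    # ||p|| ln ||p|| - sum_i p_i ln p_i  (equation (6)).
    mass = sum(p)
    if mass == 0:
        return 0.0
    return mass * math.log(mass) - sum(x * math.log(x) for x in p if x > 0)

p = [0.3, 0.9, 0.3]                    # total mass 1.5
mass = sum(p)
pbar = [x / mass for x in p]           # the normalized probability measure
H_pbar = -sum(x * math.log(x) for x in pbar)
print(entropy_general(p))              # ~ 1.4254
print(mass * H_pbar)                   # same value: ||p|| H(pbar)
```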

We can define continuity for a map sending morphisms in FinMeas to numbers in [0,∞) just as we did for FinProb, and show:

Corollary

Suppose F is any map sending morphisms in FinMeas to numbers in [0,∞) and obeying these four axioms:

  1. Functoriality:
    F(f \circ g) = F(f) + F(g)

    whenever f, g are composable morphisms.

  2. Additivity:
    (3) F(f \oplus g) = F(f) + F(g)

    for all morphisms f, g.

  3. Homogeneity:
    (4) F(\lambda f) = \lambda F(f)

    for all morphisms f and all λ ∈ [0,∞).

  4. Continuity: F is continuous.

Then there exists a constant c ≥ 0 such that for any morphism f: p → q in FinMeas,

F(f) = c(H(p) - H(q))

where H(p) is the Shannon entropy of p. Conversely, for any constant c ≥ 0, this formula determines a map F obeying conditions 1–4.

Proof

Take a map F obeying the axioms listed here. Then F restricts to a map on morphisms of FinProb obeying the axioms of the main theorem. Hence there exists a constant c ≥ 0 such that F(f) = c(H(p) - H(q)) whenever f: p → q is a morphism between probability measures. Now take an arbitrary morphism f: p → q in FinMeas. Since f is measure-preserving, ‖p‖ = ‖q‖ = λ, say. If λ ≠ 0 then p = λp̄, q = λq̄ and f = λf̄ for some morphism f̄: p̄ → q̄ in FinProb; then by homogeneity,

F(f) = \lambda F(\bar{f}) = \lambda c (H(\bar{p}) - H(\bar{q})) = c(H(p) - H(q)).

If λ = 0 then f = 0f, so F(f) = 0 by homogeneity. So F(f) = c(H(p) - H(q)) in either case. The converse statement follows from the converse in the main theorem.

Why Shannon entropy works

To solidify our intuitions, we first check that F(f) = c(H(p) - H(q)) really does determine a map obeying all the conditions of the main theorem. Since all these conditions are linear in F, it suffices to consider the case where c = 1. It is clear that F is continuous, and equation (1) is also immediate whenever g: m → p and f: p → q are morphisms in FinProb:

F(f \circ g) = H(m) - H(q) = H(p) - H(q) + H(m) - H(p) = F(f) + F(g).

The work is to prove equation (2).

We begin by establishing a useful formula for F(f) = H(p) - H(q), where as usual f is a morphism p → q in FinProb. Since f is measure-preserving, we have

q_j = \sum_{i \in f^{-1}(j)} p_i.

So

\begin{aligned} \sum_j q_j \ln q_j &= \sum_j \sum_{i \in f^{-1}(j)} p_i \ln q_j \\ &= \sum_j \sum_{i \in f^{-1}(j)} p_i \ln q_{f(i)} \\ &= \sum_i p_i \ln q_{f(i)} \end{aligned}

where in the last step we note that summing over all points i that map to j and then summing over all j is the same as summing over all i. So,

\begin{aligned} F(f) &= - \sum_i p_i \ln p_i + \sum_j q_j \ln q_j \\ &= \sum_i \bigl( -p_i \ln p_i + p_i \ln q_{f(i)} \bigr) \end{aligned}

and thus

(5) F(f) = \sum_{i \in X} p_i \ln \frac{q_{f(i)}}{p_i}

where the quantity in the sum is defined to be zero when p_i = 0. If we think of p and q as the distributions of random variables x ∈ X and y ∈ Y with y = f(x), then F(f) is exactly the conditional entropy of x given y. So, what we are calling ‘information loss’ is a special case of conditional entropy.
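
The two descriptions of F(f) are easy to compare numerically; here is a small sketch (the example map and distributions are ours):

```python
import math

def H(p):
    return -sum(x * math.log(x) for x in p.values() if x > 0)

def info_loss_direct(p, q):
    # H(p) - H(q) for a measure-preserving f: p -> q.
    return H(p) - H(q)

def info_loss_conditional(f, p, q):
    # Formula (5): sum_i p_i ln(q_{f(i)} / p_i), terms with p_i = 0 omitted.
    return sum(pi * math.log(q[f[i]] / pi) for i, pi in p.items() if pi > 0)

# A map {1,2,3} -> {a,b} collapsing the points 1 and 2:
f = {1: 'a', 2: 'a', 3: 'b'}
p = {1: 0.2, 2: 0.3, 3: 0.5}
q = {'a': 0.5, 'b': 0.5}
print(info_loss_direct(p, q))          # ~ 0.3365
print(info_loss_conditional(f, p, q))  # same value
```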

This formulation makes it easy to check equation (2):

F(\lambda f \oplus (1 - \lambda)g) = \lambda F(f) + (1 - \lambda) F(g).

In the proof of the corollary above (on FinMeas), the fact that F(f) = c(H(p) - H(q)) satisfies the four axioms was deduced from the analogous fact for FinProb. It can also be checked directly. For this it is helpful to note that

(6) H(p) = \|p\| \ln \|p\| - \sum_i p_i \ln(p_i).

It can then be shown that equation (5) holds for every morphism f in FinMeas. The additivity and homogeneity axioms follow easily.
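
For instance, additivity and homogeneity can be spot-checked numerically using the closed form (6); a sketch, with example measures of our choosing:

```python
import math

def H(p):
    # Shannon entropy of a finite measure p, via formula (6).
    mass = sum(p)
    if mass == 0:
        return 0.0
    return mass * math.log(mass) - sum(x * math.log(x) for x in p if x > 0)

def F(p, q):
    return H(p) - H(q)

p1, q1 = [0.4, 0.6], [1.0]     # f: p1 -> q1 collapses two points
p2, q2 = [0.5, 1.0], [1.5]     # g: p2 -> q2 likewise

# Additivity: F(f (+) g) = F(f) + F(g)
print(F(p1 + p2, q1 + q2), F(p1, q1) + F(p2, q2))   # equal, ~ 1.6278

# Homogeneity: F(lambda f) = lambda F(f)
lam = 2.5
print(F([lam * x for x in p1], [lam * x for x in q1]), lam * F(p1, q1))
```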

Faddeev’s theorem

To prove the hard part of the main theorem, we use a characterization of entropy given by Faddeev (F) and nicely summarized at the beginning of a paper by Rényi (R). In order to state this result, it is convenient to write a probability measure on the set {1, …, n} as an n-tuple p = (p_1, …, p_n). With only mild cosmetic changes, Faddeev’s original result states:

Theorem

(Faddeev) Suppose I is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:

  1. I is invariant under bijections.

  2. I is continuous.

  3. For any probability measure p on a set of the form {1, …, n}, and any number 0 ≤ t ≤ 1,

    (7) I((t p_1, (1-t)p_1, p_2, \dots, p_n)) = p_1 I((t, 1-t)) + I((p_1, \dots, p_n)) .

Then I is a constant nonnegative multiple of Shannon entropy.

In item 1 we are using the fact that given a bijection f: X → X' between finite sets and a probability measure p on X, there is a unique probability measure on X' such that f is measure-preserving; we demand that I takes on the same value on both these probability measures. In item 2, we use the standard topology on the simplex

\Delta^{n-1} = \left\{(p_1,\ldots,p_n) \in \mathbb{R}^n \:|\: p_i \geq 0, \: \sum_i p_i = 1 \right\}

to put a topology on the set of probability distributions on any n-element set.

The most interesting and mysterious condition in Faddeev’s theorem is item 3. This is a special case of a general law appearing in the work of Shannon (S) and Faddeev (F). Namely, suppose p is a probability measure on the set {1, …, n}. Suppose also that for each i ∈ {1, …, n}, we have a probability measure q(i) on a finite set X_i. Then p_1 q(1) ⊕ ⋯ ⊕ p_n q(n) is again a probability measure space, and the Shannon entropy of this space is given by:

H\left(p_1 q(1) \oplus \cdots \oplus p_n q(n)\right) = H(p) + \sum_{i=1}^n p_i H(q(i)) .

To see this, write q_{ij} for the value of q(i): X_i → [0,∞) at the point j ∈ X_i:

\begin{aligned} H(p_1 q(1) \oplus \cdots \oplus p_n q(n)) &= -\sum_{i=1}^n \sum_{j \in X_i} p_i q_{i j} \ln(p_i q_{i j}) \\ &= -\sum_{i=1}^n \sum_{j \in X_i} p_i \bigl( q_{i j} \ln(p_i) + q_{i j} \ln(q_{i j}) \bigr) \\ &= \sum_{i=1}^n p_i \bigl( -\ln(p_i) + H(q(i)) \bigr) \\ &= H(p) + \sum_{i=1}^n p_i H(q(i)), \end{aligned}

where the third step uses ∑_{j ∈ X_i} q_{ij} = 1.
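
This law is easy to verify numerically; a short sketch with arbitrary example data of our choosing:

```python
import math

def H(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def weighted_direct_sum(p, qs):
    # The probability measure p_1 q(1) (+) ... (+) p_n q(n)
    # on the disjoint union of the underlying sets.
    return [pi * qij for pi, q in zip(p, qs) for qij in q]

p = [0.5, 0.3, 0.2]
qs = [[0.5, 0.5], [1.0], [0.25, 0.25, 0.5]]
lhs = H(weighted_direct_sum(p, qs))
rhs = H(p) + sum(pi * H(q) for pi, q in zip(p, qs))
print(lhs, rhs)   # both ~ 1.5842
```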

Moreover, this formula is almost equivalent to condition 3 in Faddeev’s theorem, allowing us to reformulate Faddeev’s theorem as follows:

Theorem

Suppose I is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:

  1. I is invariant under bijections.

  2. I is continuous.

  3. I((1)) = 0, where (1) is our name for the unique probability measure on the set {1}.

  4. For any probability measure p on the set {1, …, n} and probability measures q(1), …, q(n) on finite sets, we have

    I(p_1 q(1) \oplus \cdots \oplus p_n q(n)) = I(p) + \sum_{i=1}^n p_i I(q(i)) .

Then I is a constant nonnegative multiple of Shannon entropy. Conversely, any constant nonnegative multiple of Shannon entropy satisfies 1–4.

Proof

We just need to check that conditions 3 and 4 imply Faddeev’s equation (7). Take p = (p_1, …, p_n), q(1) = (t, 1 - t) and q(i) = (1) for i ≥ 2: then condition 4 gives

I((t p_1, (1 - t)p_1, p_2, \ldots, p_n)) = I((p_1, \ldots, p_n)) + p_1 I((t, 1 - t)) + \sum_{i = 2}^n p_i I((1))

which by condition 3 gives Faddeev’s equation.

It may seem miraculous how the formula

I(p_1, \dots, p_n) = - c \sum_i p_i \ln p_i

emerges from the assumptions in either Faddeev’s original theorem or its equivalent reformulation above. We can demystify this by describing a key step in Faddeev’s argument, as simplified by Rényi (R). Suppose I is a function satisfying the assumptions of Faddeev’s result. Let

f(n) = I\left(\frac{1}{n}, \dots, \frac{1}{n}\right)

be the function I applied to the uniform probability measure on an n-element set. Since we can write a set with nm elements as a disjoint union of m different n-element sets, condition 4 of the reformulated theorem implies that

f(n m) = f(n) + f(m).

Rényi shows that the only solutions of this equation obeying

\lim_{n \to \infty} (f(n+1) - f(n)) = 0

are

f(n) = c \ln n .

This is how the logarithm function enters. Using condition 3 of Faddeev’s theorem, or equivalently conditions 3 and 4 of the reformulation, the value of I can be deduced for all probability measures p such that each p_i is rational. The result for arbitrary probability measures follows by continuity.
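
To make the last two steps concrete, here is a small sketch (ours, not from Faddeev or Rényi) that checks f(nm) = f(n) + f(m) for f(n) = ln n, and recovers I(p) for a rational distribution p = (a_1/N, …, a_n/N) by applying condition 4 to uniform measures:

```python
import math

f = lambda n: math.log(n)   # the solution f(n) = c ln n, with c = 1

# f(nm) = f(n) + f(m):
print(abs(f(6 * 35) - (f(6) + f(35))) < 1e-12)   # True

def I_rational(a):
    # Split the uniform measure on N = sum(a) points into blocks of
    # sizes a_1, ..., a_n; condition 4 then gives
    # f(N) = I(p) + sum_i (a_i/N) f(a_i), hence:
    N = sum(a)
    return f(N) - sum((ai / N) * f(ai) for ai in a)

a = [1, 2, 3]                                  # p = (1/6, 2/6, 3/6)
p = [ai / sum(a) for ai in a]
print(I_rational(a))                           # ~ 1.0114
print(-sum(x * math.log(x) for x in p))        # Shannon entropy: same value
```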

Proof of the main result

Now we complete the proof of the main theorem. Assume that F obeys conditions 1–3 in the statement of that theorem.

Recall that (1) denotes the set {1} equipped with its unique probability measure. For each object p ∈ FinProb, there is a unique morphism

!_p \colon p \to (1).

We can think of this as the map that crushes p down to a point and loses all the information that p had. So, we define the ‘entropy’ of the measure p by

I(p) = F(!_p) .

Given any morphism f: p → q in FinProb, we have

!_p = !_q \circ f.

So, by our assumption that F is functorial,

F(!_p) = F(!_q) + F(f),

or in other words:

(8) F(f) = I(p) - I(q) .

To conclude the proof, it suffices to show that I is a multiple of Shannon entropy.

We do this by using the reformulated Faddeev theorem. Functoriality implies that when a morphism f is invertible, F(f) = 0. Together with (8), this gives condition 1 of that theorem. Since !_{(1)} is invertible, it also gives condition 3. Condition 2 is immediate. The real work is checking condition 4.

Take a probability measure p on {1, …, n}, and probability measures q(1), …, q(n) on finite sets X_1, …, X_n respectively. Then we obtain a probability measure

\bigoplus_i p_i q(i)

on the disjoint union of X_1, …, X_n. Now, we can decompose p as a direct sum:

(9) p \cong \bigoplus_i p_i ((1)).

Define a morphism

f = \bigoplus_i p_i !_{q(i)} \colon \bigoplus_i p_i q(i) \to \bigoplus_i p_i ((1)).

Then by convex linearity and the definition of I,

F(f) = \sum_i p_i F(!_{q(i)}) = \sum_i p_i I(q(i)).

But also

F(f) = I\Bigl(\bigoplus_i p_i q(i)\Bigr) - I\Bigl(\bigoplus_i p_i ((1))\Bigr) = I\Bigl(\bigoplus_i p_i q(i)\Bigr) - I(p)

by (8) and (9). Comparing these two expressions for F(f) gives condition 4 of the reformulated Faddeev theorem, completing the proof of the main theorem.

A characterization of Tsallis entropy

Since Shannon defined his entropy in 1948, it has been generalized in many ways. Our main theorem can easily be extended to characterize one family of generalizations, the so-called ‘Tsallis entropies’. For any positive real number α, the Tsallis entropy of order α of a probability measure p on a finite set X is defined by:

H_\alpha(p) = \begin{cases} \frac{1}{\alpha - 1} \Bigl( 1 - \sum_{i \in X} p_i^\alpha \Bigr) & \text{if } \alpha \neq 1 \\ -\sum_{i \in X} p_i \ln p_i & \text{if } \alpha = 1. \end{cases}

The peculiarly different definition when α = 1 is explained by the fact that the limit lim_{α → 1} H_α(p) exists and equals the Shannon entropy H(p).
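
Both cases of the definition, and the limit, are easy to see numerically; a sketch with an example distribution of our choosing:

```python
import math

def tsallis(p, alpha):
    # Tsallis entropy of order alpha of a probability distribution,
    # with the alpha = 1 case given by Shannon entropy.
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1 - sum(x ** alpha for x in p)) / (alpha - 1)

p = [0.1, 0.2, 0.7]
print(tsallis(p, 1))             # Shannon entropy ~ 0.8018
for alpha in [1.1, 1.01, 1.001]:
    print(tsallis(p, alpha))     # approaches the Shannon value as alpha -> 1
```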

Although these entropies are most often named after Tsallis (T), they and related quantities had been studied by others long before his 1988 paper. For example, Havrda and Charvát (HC) had already introduced a similar formula, adapted to base 2 logarithms, in a 1967 paper in information theory, and in 1982, Patil and Taillie (PT) had used H_α itself as a measure of biological diversity.

The characterization of Tsallis entropy is exactly the same as that of Shannon entropy except in one respect: in the convex linearity condition, the degree of homogeneity changes from 1 to α.

Theorem

Let α ∈ (0,∞). Suppose F is any map sending morphisms in FinProb to numbers in [0,∞) and obeying these three axioms:

  1. Functoriality:
    F(f \circ g) = F(f) + F(g)

    whenever f, g are composable morphisms.

  2. Compatibility with convex combinations:
    F(\lambda f \oplus (1-\lambda) g) = \lambda^{\alpha} F(f) + (1-\lambda)^{\alpha} F(g)

    for all morphisms f, g and all λ ∈ [0,1].

  3. Continuity: F is continuous.

Then there exists a constant c ≥ 0 such that for any morphism f: p → q in FinProb,

F(f) = c(H_{\alpha}(p) - H_{\alpha}(q))

where H_α(p) is the Tsallis entropy of order α of p. Conversely, for any constant c ≥ 0, this formula determines a map F obeying conditions 1–3.

Proof

The proof is exactly the same as that of the main theorem, except that instead of using Faddeev’s theorem, we use Theorem V.2 of Furuichi (Fu). Furuichi’s theorem is itself the same as Faddeev’s, except that condition 3 of Faddeev’s theorem is replaced by

I((t p_1, (1-t)p_1, p_2, \dots, p_n)) = p_1^\alpha I((t, 1-t)) + I((p_1, \dots, p_n))

and Shannon entropy is replaced by Tsallis entropy of order α.

As in the case of Shannon entropy, this result can be extended to arbitrary measures on finite sets. For this we need to define the Tsallis entropies of an arbitrary measure on a finite set. We do so by requiring that

H_\alpha(\lambda p) = \lambda^\alpha H_\alpha(p)

for all λ ∈ [0, ∞) and all p ∈ FinMeas. When α = 1 this is the same as the Shannon entropy, and when α ≠ 1, it can be rewritten explicitly as

H_\alpha(p) = \frac{1}{\alpha - 1} \Bigl( \Bigl( \sum_{i \in X} p_i \Bigr)^\alpha - \sum_{i \in X} p_i^\alpha \Bigr)

(which is analogous to (6)). The following result is the same as the corollary for Shannon entropy except that, again, the degree of homogeneity changes from 1 to α.
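
The homogeneity property H_α(λp) = λ^α H_α(p) can be spot-checked against this closed form; a sketch, with example data of our choosing:

```python
def tsallis_general(p, alpha):
    # Tsallis entropy of order alpha != 1 of an arbitrary finite measure,
    # via the closed form analogous to (6).
    return (sum(p) ** alpha - sum(x ** alpha for x in p)) / (alpha - 1)

p = [0.3, 0.9, 0.3]
alpha, lam = 2.0, 3.0
print(tsallis_general([lam * x for x in p], alpha))   # 11.34
print(lam ** alpha * tsallis_general(p, alpha))       # same: 3^2 * 1.26
```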

Corollary

Let α ∈ (0, ∞). Suppose F is any map sending morphisms in FinMeas to numbers in [0,∞), and obeying these four properties:

  1. Functoriality:

    F(f \circ g) = F(f) + F(g)

    whenever f, g are composable morphisms.

  2. Additivity:

    F(f \oplus g) = F(f) + F(g)

    for all morphisms f, g.

  3. Homogeneity of degree α:

    F(\lambda f) = \lambda^\alpha F(f)

    for all morphisms f and all λ ∈ [0,∞).

  4. Continuity: F is continuous.

Then there exists a constant c ≥ 0 such that for any morphism f: p → q in FinMeas,

F(f) = c(H_\alpha(p) - H_\alpha(q))

where H_α is the Tsallis entropy of order α. Conversely, for any constant c ≥ 0, this formula determines a map F obeying conditions 1–4.

Proof

This follows from the theorem above in just the same way that the corollary for Shannon entropy follows from the main theorem.

Acknowledgements

We thank the denizens of the n-Category Café, especially David Corfield, Steve Lack, Mark Meckes and Josh Shadlen, for encouragement and helpful suggestions. Leinster is supported by an EPSRC Advanced Research Fellowship.

References

  • J. Aczél and Z. Daróczy, On Measures of Information and Their Characterizations, Academic Press, New York, 1975.
  • D. K. Faddeev, On the concept of entropy of a finite probabilistic scheme, Uspehi Mat. Nauk (N.S.) 11 (1956), no. 1(67), 227–231. (In Russian.) German translation: Zum Begriff der Entropie eines endlichen Wahrscheinlichkeitsschemas, in Arbeiten zur Informationstheorie I, Deutscher Verlag der Wissenschaften, Berlin, 1957, pp. 85–90.
  • S. Furuichi, On uniqueness theorems for Tsallis entropy and Tsallis relative entropy, IEEE Transactions on Information Theory 51 (2005), 3638–3645.
  • J. Havrda and F. Charvát, Quantification method of classification processes: concept of structural α-entropy, Kybernetika 3 (1967), 30–35.
  • S. Mac Lane, Categories for the Working Mathematician, Graduate Texts in Mathematics 5, Springer, 1971.
  • G. P. Patil and C. Taillie, Diversity as a concept and its measurement, Journal of the American Statistical Association 77 (1982), 548–561.
  • A. Rényi, On measures of entropy and information, in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, University of California Press, Berkeley, 1961, pp. 547–561.
  • C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948), 379–423, 623–656.
  • C. Tsallis, Possible generalization of Boltzmann–Gibbs statistics, Journal of Statistical Physics 52 (1988), 479–487.
