John Baez Convex spaces and an operadic approach to entropy

Redirected from "Convex spaces and Shannon entropy".

Idea

This is some material that never got incorporated into a paper on entropy by John Baez, Tobias Fritz and Tom Leinster. The finished version can be found here:

  • John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, arXiv or free online at Entropy 13 (2011), 1945-1957.

The paper was developed in a series of blog conversations, in this order:

and it was written up on the nLab here:

For a lot more material that never got incorporated into this paper, see:

Fadeev’s Theorem

Rényi presents a nice characterization of Shannon entropy due to Fadeev. With only some minor cosmetic changes we can state it as follows.

We may think of a probability measure on a finite set XX as a function p:X[0,1]p : X \to [0,1] such that iXp(i)=1\sum_{i \in X} p(i) = 1. The Shannon entropy of pp is

S(p)= iXp(i)lnp(i) S(p) = -\sum_{i \in X} p(i) \, \ln p(i)

It is convenient to write a probability measure on the set {1,,n}\{1, \dots, n\} as an nn-tuple p=(p 1,,p n)p = (p_1, \dots, p_n).

Theorem

(Fadeev) Suppose FF is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:

  1. FF is invariant under bijections.

  2. FF is continuous.

  3. For any probability measure pp on a set of the form {1,,n}\{1, \dots, n\}, and any number 0t10 \le t \le 1,

F(tp 1,(1t)p 1,p 2,,p n)=p 1F(t,1t)+F(p 1,,p n)F(t p_1, (1-t)p_1, p_2, \dots, p_n) = p_1 F(t, 1-t) + F(p_1, \dots, p_n)

Then FF is a constant nonnegative multiple of Shannon entropy.

In item 1 we’re using the fact that a bijection f:XXf: X \to X' between finite sets together with a probability distribution on XX gives a probability distribution on XX'; we want these to have the same entropy. In item 2 we’re using the standard topology on n\mathbb{R}^n to put a topology on the set of probability distributions on any nn-element set.

The most interesting and mysterious condition in Fadeev’s theorem is item 3. In a sense, this is what our work seeks to explain. The first step is to realize that this condition can be restated in an equivalent but conceptually clearer way. The key idea is that a probability measure on a set XX can be used to combine probability measures on an XX-indexed family of sets into a probability measure on their disjoint union. Item 3 says that Shannon entropy transforms in a simple way under this operation.

Suppose pp is a probability measure on the set {1,,n}\{1,\dots,n \}. Suppose also that for each 1in1 \le i \le n, Y iY_i is a finite set equipped with a probability measure q iq_i. Then there is a probability measure p(q 1,,q n)p \circ (q_1, \dots, q_n) on the disjoint union i=1 nY i\bigsqcup_{i=1}^n Y_i, defined as follows: if jY ij \in Y_i, then

p(q 1,,q n)=p(i)q i(j). p \circ (q_1, \dots, q_n) = p(i) q_i(j).

There is a nice formula for the Shannon entropy of this new probability measure. To express it neatly, the following notation comes in handy. Suppose pp is a probability measure on the set {1,,n}\{1,\dots,n \} and (a 1,,a n)(a_1, \dots, a_n) is an nn-tuple of real numbers. Then we write p(a 1,,a n)p (a_1, \dots, a_n) for the weighted arithmetic mean:

p(a 1,,a n)= i=1 np ia i p (a_1, \dots, a_n) = \sum_{i=1}^n p_i a_i

Now, we have:

Lemma

Suppose pp is a probability measure on the set {1,,n}\{1,\dots,n \}, and suppose that for each 1in1 \le i \le n, Y iY_i is a finite set equipped with a probability measure q iq_i. Then Shannon entropy obeys the law:

(1)S(p(q 1,,q n))=S(p)+p(S(q 1),,S(q n)) S(p\circ(q_1, \ldots, q_n)) = S(p) + p(S(q_1), \ldots, S(q_n))
Proof

The proof is a calculation. We write q ijq_{ij} for the value of q:Y i[0,)q: Y_i \to [0,\infty) at the point jY ij \in Y_i:

S(p(q 1,,q n)) = i=1 n jY ip iq ijln(p iq ij) = i=1 n jY ip i(q ijln(p(i)+q ijln(q ij)) = i=1 np(i)(ln(p i)+S(q i)) = S(p)+P(S(q 1),,S(q n)) \begin{aligned} S(p\circ(q_1, \ldots, q_n)) &=& \sum_{i=1}^n \sum_{j \in Y_i} p_i q_{ij} \ln(p_i q_{ij}) \\ &=& \sum_{i=1}^n \sum_{j \in Y_i} p_i \left( q_{ij} \ln(p(i) + q_{ij} \ln(q_{ij}) \right) \\ &=& \sum_{i=1}^n p(i) \left( \ln(p_i) + S(q_i) \right) \\ &=& S(p) + P(S(q_1), \dots, S(q_n)) \end{aligned}

Moreover, the above law is almost equivalent to item 3 in Fadeev’s theorem:

Lemma

Suppose FF is a map sending any probability measure on any finite set to a nonnegative real number. Suppose also that FF is invariant under bijections. Then the following are equivalent:

(A) For any probability measure pp on any set of the form {1,,n}\{1, \dots, n\}, and any number 0t10 \le t \le 1,

F(tp 1,(1t)p 1,p 2,,p n)=p 1F(t,1t)+F(p 1,,p n)F(t p_1, (1-t)p_1, p_2, \dots, p_n) = p_1 F(t, 1-t) + F(p_1, \dots, p_n)

and

(B1) If (1)(1) denotes the unique probability measure on the set {1}\{1\}, then F((1))=0 F((1)) = 0 .

(B2) If pp is a probability measure on the set {1,,n}\{1,\dots,n \}, and for each 1in1 \le i \le n, Y iY_i is a finite set equipped with a probability measure q iq_i, then

F(p(q 1,,q n))=F(p)+p(F(q 1),,F(q n)) F(p\circ(q_1, \ldots, q_n)) = F(p) + p(F(q_1), \ldots, F(q_n))
Proof

Assume (A). Taking n=1n = 1, p 1=1p_1 = 1 and t=1t = 1 gives

S(1,0)=S((1))+1S(1,0) S(1, 0) = S((1)) + 1S(1, 0)

from which (B1) follows immediately. Then a simple induction gives (B2).

Conversely, assume (B1) and (B2). Take p=(p 1,,p n)p = (p_1, \ldots, p_n), q 1=(t,1t)q_1 = (t, 1 - t) and q i=(1)q_i = (1) for i2i \geq 2: then (B1) gives

S(tp 1,(1t)p 1,p 2,,p n)=S(p 1,,p n)+p 1S(t,1t)+ i=2 np iS((1)) S(t p_1, (1 - t)p_1, p_2, \ldots, p_n) = S(p_1, \ldots, p_n) + p_1 S(t, 1 - t) + \sum_{i = 2}^n p_i S((1))

which by (B2) gives (A).

We henceforth call (B1,B2) the operadic property of a bijection-invariant function from probability measures on finite sets to nonnegative real numbers.

Summarizing our discussion so far, we have:

Corollary

Suppose FF is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:

  1. FF is invariant under bijections.

  2. FF is continuous.

  3. FF has the operadic property.

Then FF is a constant nonnegative multiple of Shannon entropy.

Convex spaces

A convex set is a subset XVX \subseteq V of a real vector space VV that is closed under the operations

c p(x,y)=cx+(1c)y c_p(x,y) = c x + (1-c) y

for all 0p10 \le p \le 1. We can abstract some properties of convex sets using the concept of a ‘convex space’:

Definition

A convex space is a set XX equipped with an operation c p:X×XXc_p: X \times X \to X for each number 0p10 \le p \le 1 obeying the following laws:

  • c 1(x,y)=xc_1(x,y) = x,
  • c p(x,x)=xc_p(x,x) = x,
  • c p(x,y)=c 1p(y,x)c_p(x,y) = c_{1-p}(y,x),
  • c p(c q(x,y),z)=c pq(x,c r(y,z))c_p(c_q(x,y),z) = c_{p q}(x,c_r(y,z)) for (1pq)r=(1q)p(1 - p q)r = (1 - q)p.

These operations generate an algebraic theory whose set of nn-ary operations correspond to probability measures on the set {1,,n}\{1, \dots, n\}. This set is so important that it needs a name. Of course it is often known as the (n1)(n-1)-simplex and denoted Δ n1\Delta^{n-1}, but we want a name that involves the number nn and makes us think of probability:

Definition

Let P n\mathbf{P}_n be the set of probability distributions on {1,,n}\{1, \ldots, n\}.

We can think of P n\mathbf{P}_n as the set of nn-ary operations in an operad. The idea here is that we can compose an operation pP np \in \mathbf{P}_n with a list of operations q iP k iq_i \in \mathbf{P}_{k_i} for 1in1 \le i \le n to get an operation

p(q 1,,q n)P k 1++k n p \circ (q_1, \dots, q_n) \in P_{k_1 + \cdots + k_n}

This is defined as earlier.

OLD STUFF, NOT YET INCORPORATED INTO THE PAPER YET:

Definition

The monad for convex sets is a monad on SetSet sending any set XX to the set of finitely-supported probability distributions on XX.

For example, this monad sends {1,,n}\{1, \ldots, n\} to a set P n\mathbf{P}_n, which can be identified with the (n1)(n - 1)-simplex. This monad is a finitary monad, so can be thought about in a few different ways.

A finitary monad is the same thing as a finitary algebraic theory. This one can be presented by a family (* t) t[0,1](*_t)_{t \in [0, 1]} of binary operations, subject to some equations. They’re probably exactly those equations that hold in a convex subset of n\mathbb{R}^n if we put x* ty=(1t)x+tyx *_t y = (1 - t)x + t y.

(I suspect there’s a theorem here: that for n1n \ge 1, n\mathbb{R}^n “satisfies no laws”. This means there are no equations between the various operations (x,y)(1t)x+ty(x, y) \mapsto (1 - t)x + t y beyond those forced by its being an algebra for this theory.)

A finitary algebraic theory can also be thought of as a kind of souped-up operad. In a symmetric operad PP, one has for each bijection σ:{1,,n}{1,,n}\sigma: \{1, \ldots, n\} \to \{1, \ldots, n\} an induced map σ *:P nP n\sigma_*: P_n \to P_n. In a finitary algebraic theory, one has the same thing for arbitrary functions between finite sets, not just bijections. In other words, a finitary algebraic theory amounts to a non-symmetric operad PP together with, for each function θ:{1,,m}{1,,n}\theta: \{1, \ldots, m\} \to \{1, \ldots, n\} between finite sets, an induced map θ *:P mP n\theta_*: P_m \to P_n, satisfying suitable axioms.

Definition

The underlying symmetric operad for the convex monad is called the operad for convex algebras, and denote P\mathbf{P}. An algebra of P\mathbf{P} is called a convex algebra.

The space of nn-ary operations for this operad is P n\mathbf{P}_n, the space of probability distributions on {1,,n}\{1, \ldots, n\}. The composition of operations is supposed to be obvious, but we should probably write down a formula for it. The maps “θ *:P nP m\theta_*: \mathbf{P}_n \to \mathbf{P}_m” can be defined by pushforward of measures. An algebra for the algebraic theory of convex algebras is an algebra XX for the operad with the further property that the square

P m×X n 1×θ * P m×X m θ *×1 P n×X n X \begin{matrix} \mathbf{P}_m \times X^n &\stackrel{1 \times \theta^*}{\to} &\mathbf{P}_m \times X^m \\ \theta_*\times 1 \downarrow & &\downarrow \\ \mathbf{P}_n \times X^n &\to &X \end{matrix}

commutes for all θ:{1,,m}{1,,n}\theta: \{1, \ldots, m\} \to \{1, \ldots, n\}.

Note that P\mathbf{P} is naturally a topological operad, where the topology on P n\mathbf{P}_n is the usual topology on the (n1)(n-1)-simplex. In our applications, we focus on algebras of P\mathbf{P} in topological categories with finite products. We call these convex algebras in \mathcal{E}. Here are some examples:

  • Any convex subset of n\mathbb{R}^n is naturally a convex algebra in TopTop. In particular, [0,)[0,\infty) is such.

  • Moreover, if we regard the topological monoid ([0,),+,0)([0,\infty), +, 0) as a one-object topological category, then it is a convex algebra in Cat(Top)Cat(\Top).

  • FinMeas\Fin\Meas is also a convex algebra in Cat(Top)Cat(\Top).

  • FinProb\Fin\Prob is also a convex algebra in Cat(Top)Cat(\Top).

References

  • D. K. Fadeev, Zum Begriff der Entropie eines endlichen Wahrscheinlichkeitsschemas, Arbeiten zur Informationstheorie I, Berlin, Deutscher Verlag der Wissenschaften, 1957, pp. 85-90.
  • Handbook of Analysis and its Foundations, Section 12.7 (short and to the point).

  • Romanowska, Smith, Orłowska; Abstract barycentric algebras; pdf. This generalises from [0,1][0,1] to an arbitrary LΠL \Pi-algebra (LL for ‘Łukasiewicz’, Π\Pi for ‘product’, so think of [0,1][0,1] as a space of fuzzy truth values).

  • Romanowska and Smith, Modal Theory: An Algebraic Approach to Order, Geometry, and Convexity; Res. Exp. Math. 9; Heldermann-Verlag, Berlin, 1985.

  • Marshall Harvey Stone, Postulates for the barycentric calculus. Ann. Mat. Pura. Appl. (4), 29:25–30, 1949.

  • Tobias Fritz, Convex spaces I: definition and examples arXiv:0903.5522.

  • Bart Jacobs, Duality for convexity, arXiv:0911.3834.

  • Joe Flood, Semiconvex geometry, J. Austral. Math. Soc. Ser. A 30 (1980/81), 496-–510.

  • T. Swirszcz, Monadic functors and categories of convex sets , Preprint No. 70, Proc. Inst. Math. Pol. Acad. Sci., Warsaw.

  • T. Swirszcz, Monadic functors and convexity, Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys. 22 (1974), 39–42.

  • Stanley P. Gudder, Convexity and mixtures, SIAM Review 19 (1977), 221–240.

  • Stanley P. Gudder, A general theory of convexity, Milan Journal of Mathematics, 49 (1979), 89–96.

Many other references, and a discussion of how convex spaces have been repeatedly rediscovered, can be found at the nn-Category Café post Convex Spaces.

Last revised on May 24, 2014 at 03:59:01. See the history of this page for a list of all contributions to it.