John Baez Convex spaces and an operadic approach to entropy

Idea

This is some material that never got incorporated into a paper on entropy by John Baez, Tobias Fritz and Tom Leinster. The finished version can be found here:

John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, arXiv or free online at Entropy 13 (2011), 1945-1957.

The paper was developed in a series of blog conversations, in this order:

John Baez, Category-theoretic characterizations of entropy.
John Baez, Category-theoretic characterizations of entropy II.
Tom Leinster, Entropies vs. means.
Tom Leinster, An operadic introduction to entropy.
John Baez, A characterization of entropy.

and it was written up on the nLab here:

A characterization of entropy in terms of information loss.

For a lot more material that never got incorporated into this paper, see:

Entropy as a functor.

Fadeev’s Theorem

Rényi presents a nice characterization of Shannon entropy due to Fadeev. With only some minor cosmetic changes we can state it as follows.

We may think of a probability measure on a finite set $X$ as a function $p : X \to [0,1]$ such that $\sum_{i \in X} p(i) = 1$ . The Shannon entropy of $p$ is

S(p) = -\sum_{i \in X} p(i) \, \ln p(i)

It is convenient to write a probability measure on the set $\{1, \dots, n\}$ as an $n$ -tuple $p = (p_1, \dots, p_n)$ .

Theorem

(Fadeev) Suppose $F$ is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:

$F$ is invariant under bijections.
$F$ is continuous.
For any probability measure $p$ on a set of the form $\{1, \dots, n\}$ , and any number $0 \le t \le 1$ ,

F(t p_1, (1-t)p_1, p_2, \dots, p_n) = p_1 F(t, 1-t) + F(p_1, \dots, p_n)

Then $F$ is a constant nonnegative multiple of Shannon entropy.

In item 1 we’re using the fact that a bijection $f: X \to X'$ between finite sets together with a probability distribution on $X$ gives a probability distribution on $X'$ ; we want these to have the same entropy. In item 2 we’re using the standard topology on $\mathbb{R}^n$ to put a topology on the set of probability distributions on any $n$ -element set.

The most interesting and mysterious condition in Fadeev’s theorem is item 3. In a sense, this is what our work seeks to explain. The first step is to realize that this condition can be restated in an equivalent but conceptually clearer way. The key idea is that a probability measure on a set $X$ can be used to combine probability measures on an $X$ -indexed family of sets into a probability measure on their disjoint union. Item 3 says that Shannon entropy transforms in a simple way under this operation.

Suppose $p$ is a probability measure on the set $\{1,\dots,n \}$ . Suppose also that for each $1 \le i \le n$ , $Y_i$ is a finite set equipped with a probability measure $q_i$ . Then there is a probability measure $p \circ (q_1, \dots, q_n)$ on the disjoint union $\bigsqcup_{i=1}^n Y_i$ , defined as follows: if $j \in Y_i$ , then

p \circ (q_1, \dots, q_n) = p(i) q_i(j).

There is a nice formula for the Shannon entropy of this new probability measure. To express it neatly, the following notation comes in handy. Suppose $p$ is a probability measure on the set $\{1,\dots,n \}$ and $(a_1, \dots, a_n)$ is an $n$ -tuple of real numbers. Then we write $p (a_1, \dots, a_n)$ for the weighted arithmetic mean:

p (a_1, \dots, a_n) = \sum_{i=1}^n p_i a_i

Now, we have:

Lemma

Suppose $p$ is a probability measure on the set $\{1,\dots,n \}$ , and suppose that for each $1 \le i \le n$ , $Y_i$ is a finite set equipped with a probability measure $q_i$ . Then Shannon entropy obeys the law:

(1)

S(p\circ(q_1, \ldots, q_n)) = S(p) + p(S(q_1), \ldots, S(q_n))

Proof

The proof is a calculation. We write $q_{ij}$ for the value of $q: Y_i \to [0,\infty)$ at the point $j \in Y_i$ :

\begin{aligned} S(p\circ(q_1, \ldots, q_n)) &=& \sum_{i=1}^n \sum_{j \in Y_i} p_i q_{ij} \ln(p_i q_{ij}) \\ &=& \sum_{i=1}^n \sum_{j \in Y_i} p_i \left( q_{ij} \ln(p(i) + q_{ij} \ln(q_{ij}) \right) \\ &=& \sum_{i=1}^n p(i) \left( \ln(p_i) + S(q_i) \right) \\ &=& S(p) + P(S(q_1), \dots, S(q_n)) \end{aligned}

Moreover, the above law is almost equivalent to item 3 in Fadeev’s theorem:

Lemma

Suppose $F$ is a map sending any probability measure on any finite set to a nonnegative real number. Suppose also that $F$ is invariant under bijections. Then the following are equivalent:

(A) For any probability measure $p$ on any set of the form $\{1, \dots, n\}$ , and any number $0 \le t \le 1$ ,

F(t p_1, (1-t)p_1, p_2, \dots, p_n) = p_1 F(t, 1-t) + F(p_1, \dots, p_n)

and

(B1) If $(1)$ denotes the unique probability measure on the set $\{1\}$ , then $F((1)) = 0$ .

(B2) If $p$ is a probability measure on the set $\{1,\dots,n \}$ , and for each $1 \le i \le n$ , $Y_i$ is a finite set equipped with a probability measure $q_i$ , then

F(p\circ(q_1, \ldots, q_n)) = F(p) + p(F(q_1), \ldots, F(q_n))

Proof

Assume (A). Taking $n = 1$ , $p_1 = 1$ and $t = 1$ gives

S(1, 0) = S((1)) + 1S(1, 0)

from which (B1) follows immediately. Then a simple induction gives (B2).

Conversely, assume (B1) and (B2). Take $p = (p_1, \ldots, p_n)$ , $q_1 = (t, 1 - t)$ and $q_i = (1)$ for $i \geq 2$ : then (B1) gives

S(t p_1, (1 - t)p_1, p_2, \ldots, p_n) = S(p_1, \ldots, p_n) + p_1 S(t, 1 - t) + \sum_{i = 2}^n p_i S((1))

which by (B2) gives (A).

We henceforth call (B1,B2) the operadic property of a bijection-invariant function from probability measures on finite sets to nonnegative real numbers.

Summarizing our discussion so far, we have:

Corollary

Suppose $F$ is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:

$F$ is invariant under bijections.
$F$ is continuous.
$F$ has the operadic property.

Then $F$ is a constant nonnegative multiple of Shannon entropy.

Convex spaces

A convex set is a subset $X \subseteq V$ of a real vector space $V$ that is closed under the operations

c_p(x,y) = c x + (1-c) y

for all $0 \le p \le 1$ . We can abstract some properties of convex sets using the concept of a ‘convex space’:

Definition

A convex space is a set $X$ equipped with an operation $c_p: X \times X \to X$ for each number $0 \le p \le 1$ obeying the following laws:

$c_1(x,y) = x$ ,
$c_p(x,x) = x$ ,
$c_p(x,y) = c_{1-p}(y,x)$ ,
$c_p(c_q(x,y),z) = c_{p q}(x,c_r(y,z))$ for $(1 - p q)r = (1 - q)p$ .

These operations generate an algebraic theory whose set of $n$ -ary operations correspond to probability measures on the set $\{1, \dots, n\}$ . This set is so important that it needs a name. Of course it is often known as the $(n-1)$ -simplex and denoted $\Delta^{n-1}$ , but we want a name that involves the number $n$ and makes us think of probability:

Definition

Let $\mathbf{P}_n$ be the set of probability distributions on $\{1, \ldots, n\}$ .

We can think of $\mathbf{P}_n$ as the set of $n$ -ary operations in an operad. The idea here is that we can compose an operation $p \in \mathbf{P}_n$ with a list of operations $q_i \in \mathbf{P}_{k_i}$ for $1 \le i \le n$ to get an operation

p \circ (q_1, \dots, q_n) \in P_{k_1 + \cdots + k_n}

This is defined as earlier.

OLD STUFF, NOT YET INCORPORATED INTO THE PAPER YET:

Definition

The monad for convex sets is a monad on $Set$ sending any set $X$ to the set of finitely-supported probability distributions on $X$ .

For example, this monad sends $\{1, \ldots, n\}$ to a set $\mathbf{P}_n$ , which can be identified with the $(n - 1)$ -simplex. This monad is a finitary monad, so can be thought about in a few different ways.

A finitary monad is the same thing as a finitary algebraic theory. This one can be presented by a family $(*_t)_{t \in [0, 1]}$ of binary operations, subject to some equations. They’re probably exactly those equations that hold in a convex subset of $\mathbb{R}^n$ if we put $x *_t y = (1 - t)x + t y$ .

(I suspect there’s a theorem here: that for $n \ge 1$ , $\mathbb{R}^n$ “satisfies no laws”. This means there are no equations between the various operations $(x, y) \mapsto (1 - t)x + t y$ beyond those forced by its being an algebra for this theory.)

A finitary algebraic theory can also be thought of as a kind of souped-up operad. In a symmetric operad $P$ , one has for each bijection $\sigma: \{1, \ldots, n\} \to \{1, \ldots, n\}$ an induced map $\sigma_*: P_n \to P_n$ . In a finitary algebraic theory, one has the same thing for arbitrary functions between finite sets, not just bijections. In other words, a finitary algebraic theory amounts to a non-symmetric operad $P$ together with, for each function $\theta: \{1, \ldots, m\} \to \{1, \ldots, n\}$ between finite sets, an induced map $\theta_*: P_m \to P_n$ , satisfying suitable axioms.

Definition

The underlying symmetric operad for the convex monad is called the operad for convex algebras, and denote $\mathbf{P}$ . An algebra of $\mathbf{P}$ is called a convex algebra.

The space of $n$ -ary operations for this operad is $\mathbf{P}_n$ , the space of probability distributions on $\{1, \ldots, n\}$ . The composition of operations is supposed to be obvious, but we should probably write down a formula for it. The maps “ $\theta_*: \mathbf{P}_n \to \mathbf{P}_m$ ” can be defined by pushforward of measures. An algebra for the algebraic theory of convex algebras is an algebra $X$ for the operad with the further property that the square

\begin{matrix} \mathbf{P}_m \times X^n &\stackrel{1 \times \theta^*}{\to} &\mathbf{P}_m \times X^m \\ \theta_*\times 1 \downarrow & &\downarrow \\ \mathbf{P}_n \times X^n &\to &X \end{matrix}

commutes for all $\theta: \{1, \ldots, m\} \to \{1, \ldots, n\}$ .

Note that $\mathbf{P}$ is naturally a topological operad, where the topology on $\mathbf{P}_n$ is the usual topology on the $(n-1)$ -simplex. In our applications, we focus on algebras of $\mathbf{P}$ in topological categories with finite products. We call these convex algebras in $\mathcal{E}$ . Here are some examples:

Any convex subset of $\mathbb{R}^n$ is naturally a convex algebra in $Top$ . In particular, $[0,\infty)$ is such.
Moreover, if we regard the topological monoid $([0,\infty), +, 0)$ as a one-object topological category, then it is a convex algebra in $Cat(\Top)$ .
$\Fin\Meas$ is also a convex algebra in $Cat(\Top)$ .
$\Fin\Prob$ is also a convex algebra in $Cat(\Top)$ .

References

D. K. Fadeev, Zum Begriff der Entropie eines endlichen Wahrscheinlichkeitsschemas, Arbeiten zur Informationstheorie I, Berlin, Deutscher Verlag der Wissenschaften, 1957, pp. 85-90.

A. Rényi, Measures of information and entropy, in Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability 1960, pp. 547–561.

Handbook of Analysis and its Foundations, Section 12.7 (short and to the point).
Romanowska, Smith, Orłowska; Abstract barycentric algebras; pdf. This generalises from $[0,1]$ to an arbitrary $L \Pi$ -algebra ( $L$ for ‘Łukasiewicz’, $\Pi$ for ‘product’, so think of $[0,1]$ as a space of fuzzy truth values).
Romanowska and Smith, Modal Theory: An Algebraic Approach to Order, Geometry, and Convexity; Res. Exp. Math. 9; Heldermann-Verlag, Berlin, 1985.
Marshall Harvey Stone, Postulates for the barycentric calculus. Ann. Mat. Pura. Appl. (4), 29:25–30, 1949.
Tobias Fritz, Convex spaces I: definition and examples arXiv:0903.5522.
Bart Jacobs, Duality for convexity, arXiv:0911.3834.
Joe Flood, Semiconvex geometry, J. Austral. Math. Soc. Ser. A 30 (1980/81), 496-–510.
T. Swirszcz, Monadic functors and categories of convex sets , Preprint No. 70, Proc. Inst. Math. Pol. Acad. Sci., Warsaw.
T. Swirszcz, Monadic functors and convexity, Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys. 22 (1974), 39–42.
Stanley P. Gudder, Convexity and mixtures, SIAM Review 19 (1977), 221–240.
Stanley P. Gudder, A general theory of convexity, Milan Journal of Mathematics, 49 (1979), 89–96.

Many other references, and a discussion of how convex spaces have been repeatedly rediscovered, can be found at the $n$ -Category Café post Convex Spaces.

Last revised on May 24, 2014 at 03:59:01. See the history of this page for a list of all contributions to it.