This is some material that never got incorporated into a paper on entropy by John Baez, Tobias Fritz and Tom Leinster. The finished version can be found here:
The paper was developed in a series of blog conversations, in this order:
John Baez, Category-theoretic characterizations of entropy.
John Baez, Category-theoretic characterizations of entropy II.
Tom Leinster, Entropies vs. means.
Tom Leinster, An operadic introduction to entropy.
John Baez, A characterization of entropy.
and it was written up on the nLab here:
For a lot more material that never got incorporated into this paper, see:
Rényi presents a nice characterization of Shannon entropy due to Fadeev. With only some minor cosmetic changes we can state it as follows.
We may think of a probability measure on a finite set as a function such that . The Shannon entropy of is
It is convenient to write a probability measure on the set as an -tuple .
(Fadeev) Suppose is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:
is invariant under bijections.
is continuous.
For any probability measure on a set of the form , and any number ,
Then is a constant nonnegative multiple of Shannon entropy.
In item 1 we’re using the fact that a bijection between finite sets together with a probability distribution on gives a probability distribution on ; we want these to have the same entropy. In item 2 we’re using the standard topology on to put a topology on the set of probability distributions on any -element set.
The most interesting and mysterious condition in Fadeev’s theorem is item 3. In a sense, this is what our work seeks to explain. The first step is to realize that this condition can be restated in an equivalent but conceptually clearer way. The key idea is that a probability measure on a set can be used to combine probability measures on an -indexed family of sets into a probability measure on their disjoint union. Item 3 says that Shannon entropy transforms in a simple way under this operation.
Suppose is a probability measure on the set . Suppose also that for each , is a finite set equipped with a probability measure . Then there is a probability measure on the disjoint union , defined as follows: if , then
There is a nice formula for the Shannon entropy of this new probability measure. To express it neatly, the following notation comes in handy. Suppose is a probability measure on the set and is an -tuple of real numbers. Then we write for the weighted arithmetic mean:
Now, we have:
Suppose is a probability measure on the set , and suppose that for each , is a finite set equipped with a probability measure . Then Shannon entropy obeys the law:
The proof is a calculation. We write for the value of at the point :
Moreover, the above law is almost equivalent to item 3 in Fadeev’s theorem:
Suppose is a map sending any probability measure on any finite set to a nonnegative real number. Suppose also that is invariant under bijections. Then the following are equivalent:
(A) For any probability measure on any set of the form , and any number ,
and
(B1) If denotes the unique probability measure on the set , then .
(B2) If is a probability measure on the set , and for each , is a finite set equipped with a probability measure , then
Assume (A). Taking , and gives
from which (B1) follows immediately. Then a simple induction gives (B2).
Conversely, assume (B1) and (B2). Take , and for : then (B1) gives
which by (B2) gives (A).
We henceforth call (B1,B2) the operadic property of a bijection-invariant function from probability measures on finite sets to nonnegative real numbers.
Summarizing our discussion so far, we have:
Suppose is a map sending any probability measure on any finite set to a nonnegative real number. Suppose that:
is invariant under bijections.
is continuous.
has the operadic property.
Then is a constant nonnegative multiple of Shannon entropy.
A convex set is a subset of a real vector space that is closed under the operations
for all . We can abstract some properties of convex sets using the concept of a ‘convex space’:
A convex space is a set equipped with an operation for each number obeying the following laws:
These operations generate an algebraic theory whose set of -ary operations correspond to probability measures on the set . This set is so important that it needs a name. Of course it is often known as the -simplex and denoted , but we want a name that involves the number and makes us think of probability:
Let be the set of probability distributions on .
We can think of as the set of -ary operations in an operad. The idea here is that we can compose an operation with a list of operations for to get an operation
This is defined as earlier.
OLD STUFF, NOT YET INCORPORATED INTO THE PAPER YET:
The monad for convex sets is a monad on sending any set to the set of finitely-supported probability distributions on .
For example, this monad sends to a set , which can be identified with the -simplex. This monad is a finitary monad, so can be thought about in a few different ways.
A finitary monad is the same thing as a finitary algebraic theory. This one can be presented by a family of binary operations, subject to some equations. They’re probably exactly those equations that hold in a convex subset of if we put .
(I suspect there’s a theorem here: that for , “satisfies no laws”. This means there are no equations between the various operations beyond those forced by its being an algebra for this theory.)
A finitary algebraic theory can also be thought of as a kind of souped-up operad. In a symmetric operad , one has for each bijection an induced map . In a finitary algebraic theory, one has the same thing for arbitrary functions between finite sets, not just bijections. In other words, a finitary algebraic theory amounts to a non-symmetric operad together with, for each function between finite sets, an induced map , satisfying suitable axioms.
The underlying symmetric operad for the convex monad is called the operad for convex algebras, and denote . An algebra of is called a convex algebra.
The space of -ary operations for this operad is , the space of probability distributions on . The composition of operations is supposed to be obvious, but we should probably write down a formula for it. The maps “” can be defined by pushforward of measures. An algebra for the algebraic theory of convex algebras is an algebra for the operad with the further property that the square
commutes for all .
Note that is naturally a topological operad, where the topology on is the usual topology on the -simplex. In our applications, we focus on algebras of in topological categories with finite products. We call these convex algebras in . Here are some examples:
Any convex subset of is naturally a convex algebra in . In particular, is such.
Moreover, if we regard the topological monoid as a one-object topological category, then it is a convex algebra in .
is also a convex algebra in .
is also a convex algebra in .
Handbook of Analysis and its Foundations, Section 12.7 (short and to the point).
Romanowska, Smith, Orłowska; Abstract barycentric algebras; pdf. This generalises from to an arbitrary -algebra ( for ‘Łukasiewicz’, for ‘product’, so think of as a space of fuzzy truth values).
Romanowska and Smith, Modal Theory: An Algebraic Approach to Order, Geometry, and Convexity; Res. Exp. Math. 9; Heldermann-Verlag, Berlin, 1985.
Marshall Harvey Stone, Postulates for the barycentric calculus. Ann. Mat. Pura. Appl. (4), 29:25–30, 1949.
Tobias Fritz, Convex spaces I: definition and examples arXiv:0903.5522.
Bart Jacobs, Duality for convexity, arXiv:0911.3834.
Joe Flood, Semiconvex geometry, J. Austral. Math. Soc. Ser. A 30 (1980/81), 496-–510.
T. Swirszcz, Monadic functors and categories of convex sets , Preprint No. 70, Proc. Inst. Math. Pol. Acad. Sci., Warsaw.
T. Swirszcz, Monadic functors and convexity, Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys. 22 (1974), 39–42.
Stanley P. Gudder, Convexity and mixtures, SIAM Review 19 (1977), 221–240.
Stanley P. Gudder, A general theory of convexity, Milan Journal of Mathematics, 49 (1979), 89–96.
Many other references, and a discussion of how convex spaces have been repeatedly rediscovered, can be found at the -Category Café post Convex Spaces.
Last revised on May 24, 2014 at 03:59:01. See the history of this page for a list of all contributions to it.