The following concerns one of the basic results about $L^p$ spaces, called Minkowski’s inequality.
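Suppose $1 \leq p \leq \infty$, and suppose $X$ is a measure space with measure $\mu$. Then the function $\|\cdot\|_p \colon L^p(X, \mu) \to \mathbb{R}$ defined by
$$\|f\|_p = \left( \int_X |f|^p \, d\mu \right)^{1/p}$$
(with the essential supremum of $|f|$ taking the place of this expression when $p = \infty$) defines a norm.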
One must verify three things:
Separation axiom: $\|f\|_p = 0$ implies $f = 0$.
Scaling axiom: $\|t f\|_p = |t| \, \|f\|_p$.
Triangle inequality: $\|f + g\|_p \leq \|f\|_p + \|g\|_p$.
The first two properties are obvious, so it remains to prove the last, which is also called Minkowski’s inequality.
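As a quick sanity check before any proof, the inequality can be tested numerically in the simplest setting: finite sequences of complex numbers, i.e. $L^p$ of a finite set with counting measure. The helper names `p_norm` and `minkowski_gap` below are illustrative choices, not part of the original text.

```python
import random

def p_norm(f, p):
    """p-norm of a finite sequence f (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

def minkowski_gap(f, g, p):
    """Return ||f||_p + ||g||_p - ||f + g||_p; Minkowski says this is >= 0."""
    s = [x + y for x, y in zip(f, g)]
    return p_norm(f, p) + p_norm(g, p) - p_norm(s, p)

rng = random.Random(0)  # fixed seed so the experiment is reproducible

def random_vector(n):
    """Random complex vector with real and imaginary parts in [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

# Sample the gap for several exponents p >= 1.
gaps = [
    minkowski_gap(random_vector(5), random_vector(5), p)
    for p in (1, 1.5, 2, 3, 7)
    for _ in range(200)
]
print(min(gaps) >= -1e-12)
```

The restriction $p \geq 1$ matters: with $p = 1/2$, the vectors $(1, 0)$ and $(0, 1)$ give `minkowski_gap([1, 0], [0, 1], 0.5)` equal to $1 + 1 - 4 = -2$, so the triangle inequality fails there.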
Most (all?) of the proofs of Minkowski’s inequality that I’ve seen in textbooks involve a clever application of Hölder’s inequality, which is hardly the first thing one would think to do on seeing Minkowski’s inequality for the first time.
In the first part of this page, I would like to try a more natural and perspicuous proof. Then, I would like to revisit the classical proof and build a little conceptual story which explains where it’s coming from.
The plan of the proof is to boil down the triangle inequality for $\|\cdot\|_p$ to two things: the scaling axiom, and convexity of the function $x \mapsto |x|^p$ (as a function from complex numbers to real numbers).
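We start with some generalities first. Let $V$ be a complex vector space equipped with a function $\|\cdot\| \colon V \to [0, \infty)$ that satisfies the scaling axiom: $\|t v\| = |t| \, \|v\|$ for all complex scalars $t$, and the separation axiom: $\|v\| = 0$ implies $v = 0$. As usual, we define the unit ball in $V$ to be
$$\{v \in V : \|v\| \leq 1\}.$$
Lemma 1. Given that the scaling and separation axioms hold, the following conditions are equivalent:
1. The triangle inequality is satisfied.
2. The unit ball is convex.
3. If $\|u\| = \|v\| = 1$, then $\|t u + (1-t) v\| \leq 1$ for all $t \in [0, 1]$.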
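Condition 1. implies condition 2. easily: if $u$ and $v$ are in the unit ball and $0 \leq t \leq 1$, we have
$$\|t u + (1-t) v\| \leq \|t u\| + \|(1-t) v\| = t \|u\| + (1-t) \|v\| \leq t + (1-t) = 1.$$
Now 2. implies 3. trivially, so it remains to prove that 3. implies 1. Suppose $\|v\|, \|v'\| > 0$ (the triangle inequality is trivial if $v$ or $v'$ is zero). Let $u = v / \|v\|$ and $u' = v' / \|v'\|$ be the associated unit vectors. Then
$$\frac{v + v'}{\|v\| + \|v'\|} = \frac{\|v\|}{\|v\| + \|v'\|} \cdot \frac{v}{\|v\|} + \frac{\|v'\|}{\|v\| + \|v'\|} \cdot \frac{v'}{\|v'\|} = t u + (1-t) u'$$
where $t = \|v\| / (\|v\| + \|v'\|)$. If condition 3. holds, then
$$\|t u + (1-t) u'\| \leq 1,$$
but by the scaling axiom, this is the same as saying
$$\frac{\|v + v'\|}{\|v\| + \|v'\|} \leq 1,$$
which is the triangle inequality.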
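The key step in the argument that 3. implies 1. is the rewriting of the rescaled sum as a convex combination of unit vectors. A minimal sketch of that bookkeeping, using the Euclidean norm on $\mathbb{R}^2$ as a stand-in for the abstract function (an assumption made purely for illustration), might look like this:

```python
import math

def norm(v):
    """Euclidean norm on R^2, standing in for the abstract norm-like function."""
    return math.hypot(v[0], v[1])

def convex_decomposition(v, w):
    """Express (v + w) / (|v| + |w|) as t*u + (1 - t)*u2 with u, u2 unit vectors.

    Both v and w are assumed to be nonzero.
    """
    nv, nw = norm(v), norm(w)
    t = nv / (nv + nw)
    u = (v[0] / nv, v[1] / nv)    # unit vector associated with v
    u2 = (w[0] / nw, w[1] / nw)   # unit vector associated with w
    combo = (t * u[0] + (1 - t) * u2[0], t * u[1] + (1 - t) * u2[1])
    scaled_sum = ((v[0] + w[0]) / (nv + nw), (v[1] + w[1]) / (nv + nw))
    return t, combo, scaled_sum

t, combo, scaled_sum = convex_decomposition((3.0, 4.0), (-1.0, 2.0))
print(t, combo, scaled_sum)
```

The two returned vectors agree up to rounding, and the norm of `combo` is at most 1, which after multiplying through by the sum of the norms is exactly the triangle inequality for this pair.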
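Consider now $L^p$ with its $p$-norm $\|f\| = \|f\|_p$. The triangle inequality clearly holds for $p = 1$ and $p = \infty$, so we concentrate on the case where $1 < p < \infty$. By Lemma 1, the triangle inequality is equivalent to the following condition:
If $\|u\|_p^p = 1$ and $\|v\|_p^p = 1$, then $\|t u + (1-t) v\|_p^p \leq 1$ whenever $0 \leq t \leq 1$.
This condition allows us to remove the cumbersome exponent $1/p$ in the definition of the $p$-norm.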
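Lemma 2. The function $\phi \colon \mathbb{C} \to \mathbb{R}$ taking $x$ to $|x|^p$ is convex.
First, $x \mapsto |x|$ defines a convex function $\mathbb{C} \to \mathbb{R}$; this follows easily from the triangle inequality and the scaling axiom for $|\cdot|$.
Second, for $p \geq 1$, the function $[0, \infty) \to \mathbb{R}$, $s \mapsto s^p$, is both monotone increasing and convex; this follows from the first and second derivative tests (the first and second derivatives of $s^p$ are nonnegative).
The conclusion follows because in general if $f$ is a convex function and $g$ is monotone increasing and convex, then $g \circ f$ is convex. The proof is one line: for $0 \leq t \leq 1$,
$$g(f(t x + (1-t) y)) \leq g(t f(x) + (1-t) f(y)) \leq t \, g(f(x)) + (1-t) \, g(f(y)),$$
using convexity of $f$ together with monotonicity of $g$ for the first inequality and convexity of $g$ for the second. Apply this to $f(x) = |x|$ and $g(s) = s^p$ for $s \geq 0$.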
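The convexity claim of Lemma 2 is easy to probe numerically. The sketch below samples random complex numbers and checks that the convexity defect of $x \mapsto |x|^p$ is nonnegative; the function names `phi` and `convexity_defect` are hypothetical, chosen for this illustration only.

```python
import random

def phi(x, p):
    """phi(x) = |x|^p for a complex number x."""
    return abs(x) ** p

def convexity_defect(x, y, t, p):
    """t*phi(x) + (1-t)*phi(y) - phi(t*x + (1-t)*y); >= 0 when phi is convex."""
    return t * phi(x, p) + (1 - t) * phi(y, p) - phi(t * x + (1 - t) * y, p)

rng = random.Random(1)  # fixed seed for reproducibility

def random_complex():
    """Random complex number with real and imaginary parts in [-2, 2]."""
    return complex(rng.uniform(-2, 2), rng.uniform(-2, 2))

# Smallest defect seen over many random samples and several exponents p >= 1.
worst = min(
    convexity_defect(random_complex(), random_complex(), rng.random(), p)
    for p in (1, 1.5, 2, 4)
    for _ in range(500)
)
print(worst >= -1e-9)
```

For an exponent below 1 the defect can be negative: `convexity_defect(0, 1, 0.5, 0.5)` equals $0.5 - \sqrt{0.5} < 0$, which is consistent with the hypothesis $p \geq 1$ in the second step of the proof.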
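Let $u$ and $v$ be unit vectors in $L^p$. By condition 3. of Lemma 1, it suffices to show that $\|t u + (1-t) v\|_p^p \leq 1$ for all $t \in [0, 1]$. But
$$\int_X |t u + (1-t) v|^p \, d\mu \leq \int_X \left( t |u|^p + (1-t) |v|^p \right) d\mu = t \int_X |u|^p \, d\mu + (1-t) \int_X |v|^p \, d\mu$$
by Lemma 2. Using $\int_X |u|^p \, d\mu = 1 = \int_X |v|^p \, d\mu$, we are done.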
It could be that analysts who persist in giving what, in my opinion, is a somewhat unmemorable derivation of Minkowski’s inequality from Hölder’s inequality, might have something conceptually deeper to say, but they never seem to say it.
My own guess is that it has something to do with a certain relation between local convexity and duals.
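To support this, recall that Hölder’s inequality
$$\left| \int_X f g \, d\mu \right| \leq \|f\|_p \, \|g\|_q$$
(for $f \in L^p$, $g \in L^q$, where $\frac{1}{p} + \frac{1}{q} = 1$) is principally a statement about dual spaces: it asserts that the pairing
$$L^p \times L^q \to \mathbb{C}, \qquad (f, g) \mapsto \langle f, g \rangle = \int_X f g \, d\mu$$
induces a pair of bounded linear maps $\lambda \colon L^p \to (L^q)^*$ and $\rho \colon L^q \to (L^p)^*$, defined by
$$\lambda(f)(g) = \langle f, g \rangle = \rho(g)(f).$$
Indeed, it explicitly asserts that the norm of $\lambda(f)$ is bounded above by $\|f\|_p$, and that the norm of $\rho(g)$ is bounded above by $\|g\|_q$. Therefore, $\lambda$ and $\rho$ are linear contractions.
Hölder’s inequality is in fact sharp: the norm of $\lambda(f)$ equals $\|f\|_p$, and the norm of $\rho(g)$ equals $\|g\|_q$. That is to say: the canonical maps
$$\lambda \colon L^p \to (L^q)^*, \qquad \rho \colon L^q \to (L^p)^*$$
are actually isometric embeddings.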
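Both halves of this statement, the inequality and its sharpness, can be checked on finite complex sequences. In the sketch below the pairing is the plain bilinear sum of products, and `extremal_partner` (a hypothetical helper name) builds a $g$ with $f g = |f|^p$ pointwise, which is one standard way to attain equality for $1 < p < \infty$.

```python
import random

def p_norm(f, p):
    """p-norm of a finite sequence (counting measure)."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

def pairing(f, g):
    """Bilinear pairing <f, g> = sum of f*g, with no complex conjugation."""
    return sum(x * y for x, y in zip(f, g))

def extremal_partner(f, p):
    """Return g with f*g = |f|^p pointwise, so that |g| = |f|^(p-1)."""
    return [x.conjugate() * abs(x) ** (p - 2) if x != 0 else 0j for x in f]

rng = random.Random(2)  # fixed seed for reproducibility
p = 3.0
q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1

f = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]
g = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]

# Hoelder's inequality for an arbitrary pair (f, g).
holder_ok = abs(pairing(f, g)) <= p_norm(f, p) * p_norm(g, q) + 1e-12

# Sharpness: for the extremal partner, the two sides coincide.
g_star = extremal_partner(f, p)
lhs = abs(pairing(f, g_star))
rhs = p_norm(f, p) * p_norm(g_star, q)
print(holder_ok, abs(lhs - rhs) < 1e-9)
```

Equality holds for `g_star` because the pairing equals $\|f\|_p^p$ while $\|g^*\|_q = \|f\|_p^{p/q}$, and $1 + p/q = p$.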
As we will see, this implies that the unit ball in $L^p$ is the intersection of half-spaces
$$\left\{ f \in L^p : \mathrm{Re} \int_X f g \, d\mu \leq 1 \right\}$$
where $g$ ranges over the unit ball in $L^q$. But an intersection of half-spaces is convex!
So, the hidden conceptual point is that the relation $\mathrm{Re} \int_X f g \, d\mu \leq 1$ defines a Galois connection between subsets of $L^p$ and subsets of $L^q$, one for which the unit balls are dual to each other, and therefore closed under the Galois connection. But in this case, since the relation respects convex combinations in each of the variables $f$ and $g$, sets that are closed under the Galois connection are closed with respect to taking convex combinations. And that convexity of the unit ball is equivalent to Minkowski’s norm inequality.
In other words, I propose that the derivation of Minkowski’s inequality from Hölder’s inequality should be seen as a nicely cleaned-up but disguised version of a conceptual argument that leads from an isometric embedding into a dual space to the assertion that the embedded space is locally convex.
To put all this into a general context, recall the following notions. Given a topological vector space $V$ (let’s say over $\mathbb{C}$, but everything goes through for $\mathbb{R}$ as well), an open neighborhood $U$ of the origin is
balanced if $|t| \leq 1$ implies $t U \subseteq U$, and
bounded if for any open neighborhood $U'$ of the origin, $U \subseteq t U'$ for some $t > 0$.
It is a fact that every TVS has a neighborhood basis consisting of balanced neighborhoods; bounded ones exist precisely when the space is locally bounded, which covers the normed spaces of interest here. For each balanced bounded neighborhood $U$ there is a gauge function $\|\cdot\|_U \colon V \to [0, \infty)$ defined by
$$\|v\|_U = \inf \{ t > 0 : v \in t U \}.$$
We will mostly be interested in the uniform topology which is generated from $\|\cdot\|_U$, that is, the smallest TVS topology making the gauge function continuous. The condition of being balanced implies the scaling condition
$$\|t v\|_U = |t| \, \|v\|_U$$
(proof to be inserted later). The separation condition for $\|\cdot\|_U$ is usually not satisfied and anyway will not concern us.
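To make the gauge function concrete, here is a minimal sketch that approximates $\inf\{t > 0 : v \in tU\}$ by bisection, given only a membership test for $U$. The function `gauge`, its parameters `hi` and `iters`, and the choice of $U$ as the open unit ball of the $p$-norm on $\mathbb{R}^2$ are all assumptions made for illustration.

```python
def p_norm(v, p):
    """p-norm on R^n, used only to build an example set U and to compare."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def gauge(in_U, v, hi=1e6, iters=200):
    """Approximate inf{t > 0 : v in t*U} by bisection.

    Assumes U is balanced, so that v in t*U implies v in s*U for all s >= t,
    and assumes v lies in hi*U.
    """
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid > 0 and in_U([x / mid for x in v]):
            hi = mid   # v is in mid*U, so the infimum is at most mid
        else:
            lo = mid   # v is not in mid*U, so the infimum is at least mid
    return hi

p = 1.5

def in_ball(v):
    """Membership test for the open unit ball of the p-norm."""
    return sum(abs(x) ** p for x in v) < 1

v = [3.0, -4.0]
g_v = gauge(in_ball, v)
g_scaled = gauge(in_ball, [-2.0 * x for x in v])
print(g_v, p_norm(v, p), g_scaled)
```

For this choice of $U$ the gauge recovers the $p$-norm, and the value for $-2v$ is twice the value for $v$, in line with the scaling condition.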
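Now suppose given two TVS, $V$ and $W$, which come equipped with gauge functions $\|\cdot\|_V$ and $\|\cdot\|_W$ respectively, and for which there is a bilinear pairing
$$\langle -, - \rangle \colon V \times W \to \mathbb{C}$$
satisfying a Hölder-type inequality
$$|\langle v, w \rangle| \leq \|v\|_V \, \|w\|_W$$
which is sharp in the sense that given $v \in V$ (resp., $w \in W$), there exists $w \in W$ with $\|w\|_W = 1$ (resp., $v \in V$ with $\|v\|_V = 1$) for which the Hölder inequality is an equality.
Notice that the Hölder inequality implies that for each $w \in W$, the functional $\langle -, w \rangle \colon V \to \mathbb{C}$ is uniformly continuous with respect to the $\|\cdot\|_V$-topology, with Lipschitz constant $\|w\|_W$. Similarly if we interchange the roles of $V$ and $W$.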
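As always, we define the unit ball (in appropriate gauge) by
$$B_V = \{ v \in V : \|v\|_V \leq 1 \}, \qquad B_W = \{ w \in W : \|w\|_W \leq 1 \}.$$
Lemma. The unit ball $B_V$ is convex if and only if $\|\cdot\|_V$ satisfies the Minkowski or triangle inequality: $\|v + v'\|_V \leq \|v\|_V + \|v'\|_V$.
Theorem. Under the sharp Hölder inequality, the unit balls in $V$ and $W$ are convex.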
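By the Hölder inequality, we have the inclusion
$$B_V \subseteq \bigcap_{\|w\|_W \leq 1} H_w$$
where $H_w$ denotes the affine half-space
$$H_w = \{ v \in V : \mathrm{Re} \langle v, w \rangle \leq 1 \}.$$
We also have the reverse inclusion because the Hölder inequality is sharp. In more detail: suppose $v \notin B_V$, i.e., that $\|v\|_V > 1$. We may choose $w$ (after multiplying by a scalar of modulus 1 if necessary) such that $\|w\|_W = 1$ and $\langle v, w \rangle = \|v\|_V$. But then $\langle v, w \rangle > 1$, so $v$ does not belong to the intersection of the affine half-spaces above.
Thus the inclusion is an equality. Therefore, the unit ball $B_V$, being an intersection of affine half-spaces, is convex.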
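The description of the unit ball as an intersection of half-spaces amounts to saying that the supremum of the pairing against unit vectors of the dual gauge recovers the gauge itself. A rough numerical illustration in $\mathbb{R}^2$, with the $p$-norm and $q$-norm as the two gauges and the dot product as the pairing (all of these being choices made for this sketch), is:

```python
import math

def p_norm(v, p):
    """p-norm on R^2."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def unit_q_sphere(q, n):
    """n points on the unit sphere of the q-norm in R^2, one per direction."""
    pts = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        d = (math.cos(theta), math.sin(theta))
        s = p_norm(d, q)
        pts.append((d[0] / s, d[1] / s))
    return pts

def support_value(v, ws):
    """max over sampled w of <v, w>; v is in every sampled H_w iff this is <= 1."""
    return max(v[0] * w[0] + v[1] * w[1] for w in ws)

p = 3.0
q = p / (p - 1)  # conjugate exponent
ws = unit_q_sphere(q, 20000)

v = (0.7, -1.2)
sup_pairing = support_value(v, ws)
norm_v = p_norm(v, p)
print(sup_pairing, norm_v)

# A point strictly inside the unit p-ball lies in all sampled half-spaces.
inside = (0.9 * v[0] / norm_v, 0.9 * v[1] / norm_v)
inside_ok = support_value(inside, ws) <= 1.0
```

The sampled supremum never exceeds the $p$-norm (that is the Hölder inequality) and approaches it as the sampling gets finer (that is sharpness), so a vector lies in all the half-spaces exactly when its $p$-norm is at most 1, up to discretization error.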
Armed with this conceptual explanation of the relation between duality and convexity, let’s revisit the classical derivation of Minkowski’s inequality from Hölder’s inequality.
First, we should verify that the Hölder inequality between $L^p$ and $L^q$, where $1 < p, q < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, is sharp in the sense described above.
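A special case: suppose $f$ is a function taking values in $[0, \infty)$ such that $\|f\|_p = 1$. Put $g = f^{p-1}$. Then $g^q = f^{(p-1)q} = f^p$, so $\|g\|_q = 1$, and also
$$\int_X f g \, d\mu = \int_X f^p \, d\mu = 1 = \|f\|_p \, \|g\|_q,$$
whence sharpness follows.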
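For the more general case, removing the assumption that $f$ is valued in $[0, \infty)$, let $g$ be the function such that $g(x) = |f(x)|^p / f(x)$ whenever $f(x)$ is nonzero, and $g(x) = 0$ whenever $f(x) = 0$. Then $|g| = |f|^{p-1}$ and $f g = |f|^p$, so $\|g\|_q = 1$ and the sharpness of the Hölder inequality goes through as in the special case above.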
The Minkowski inequality, or equivalently the convexity of the unit ball in $L^p$, therefore follows by applying the theorem above.
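Now the usual proof of Hölder $\Rightarrow$ Minkowski runs something like this: let $f, g \in L^p$. We know $f + g \in L^p$ (proof: $|f + g|^p \leq 2^{p-1} (|f|^p + |g|^p)$, by convexity of the function $x \mapsto |x|^p$ again!). By rescaling, we may assume without loss of generality that $\|f + g\|_p = 1$ (the case $\|f + g\|_p = 0$ being trivial). Then
$$1 = \int_X |f + g|^p \, d\mu \leq \int_X |f| \, |f + g|^{p-1} \, d\mu + \int_X |g| \, |f + g|^{p-1} \, d\mu \leq \|f\|_p \, \|h\|_q + \|g\|_p \, \|h\|_q$$
where $h = |f + g|^{p-1}$. Since $\|h\|_q^q = \int_X |f + g|^{(p-1)q} \, d\mu = \int_X |f + g|^p \, d\mu = 1$, we have $\|f + g\|_p = 1 \leq \|f\|_p + \|g\|_p$, as desired.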
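The chain of inequalities in this classical argument can be traced step by step on a concrete example. The sketch below uses real finite sequences with counting measure; the variable names `step0`, `step1`, `step2` are just labels for the three quantities in the chain and are not from the original.

```python
import random

def p_norm(f, p):
    """p-norm of a finite sequence (counting measure)."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

rng = random.Random(3)  # fixed seed for reproducibility
p = 2.5
q = p / (p - 1)  # conjugate exponent

f = [rng.uniform(-1, 1) for _ in range(8)]
g = [rng.uniform(-1, 1) for _ in range(8)]

# Rescale so that ||f + g||_p = 1 (assumes f + g is not identically zero).
scale = p_norm([x + y for x, y in zip(f, g)], p)
f = [x / scale for x in f]
g = [y / scale for y in g]
s = [x + y for x, y in zip(f, g)]

# h = |f + g|^(p-1); its q-norm is 1 because (p - 1) * q = p.
h = [abs(x) ** (p - 1) for x in s]

step0 = sum(abs(x) ** p for x in s)                          # integral of |f+g|^p
step1 = (sum(abs(x) * y for x, y in zip(f, h))
         + sum(abs(x) * y for x, y in zip(g, h)))            # pointwise triangle inequality
step2 = p_norm(f, p) * p_norm(h, q) + p_norm(g, p) * p_norm(h, q)  # Hoelder, twice

print(step0, step1, step2, p_norm(h, q))
```

Here `step0` is 1 up to rounding, `step0 <= step1 <= step2`, and since the $q$-norm of `h` is 1, `step2` is just $\|f\|_p + \|g\|_p$.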