The following concerns one of the basic results about $L^p$ spaces.
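Suppose $1 \leq p \leq \infty$, and suppose $X$ is a measure space with measure $\mu$. Then the function ${|(-)|}_p: L^p(X, \mu) \to \mathbb{R}$ defined by

$${|f|}_p = \left(\int_X {|f|}^p \, d\mu\right)^{1/p}$$

(for $p = \infty$, by the essential supremum of ${|f|}$) is a norm.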
One must verify three things:
Separation axiom: ${|f|}_p = 0$ implies $f = 0$.
Scaling axiom: ${|t f|}_p = {|t|} \cdot {|f|}_p$.
Triangle inequality: ${|f + g|}_p \leq {|f|}_p + {|g|}_p$.
The first two properties are obvious, so it remains to prove the last, which is also called Minkowski’s inequality.
Most (all?) of the proofs of Minkowski’s inequality that I’ve seen in textbooks involve a clever application of Hölder’s inequality, which is certainly not the first thing one would think of.
In the first part of this page, I would like to try a more natural and perspicuous proof. Then, I would like to revisit the classical proof and build a little conceptual story which explains where it’s coming from.
The plan of the proof is to boil down the triangle inequality for ${|f|}_p$ to two things: the scaling axiom, and convexity of the function $x \mapsto {|x|}^p$ (as a function from complex numbers to real numbers).
We start with some generalities first. Let $V$ be a complex vector space equipped with a function ${\|(-)\|}: V \to [0, \infty]$ that satisfies the scaling axiom: ${\|\alpha v\|} = {|\alpha|} \cdot {\|v\|}$ for all complex scalars $\alpha$, and the separation axiom: ${\|v\|} = 0$ implies $v = 0$. As usual, we define the unit ball in $V$ to be $\{v \in V: {\|v\|} \leq 1\}.$
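Given that the scaling and separation axioms hold, the following conditions are equivalent:

1. The triangle inequality ${\|v + v'\|} \leq {\|v\|} + {\|v'\|}$ holds.
2. The unit ball is convex.
3. If ${\|u\|} = {\|u'\|} = 1$, then ${\|t u + (1-t) u'\|} \leq 1$ for all $t \in [0, 1]$.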
Condition 1. implies condition 2. easily: if $v$ and $v'$ are in the unit ball and $0 \leq t \leq 1$, we have

$${\|t v + (1-t) v'\|} \leq {\|t v\|} + {\|(1-t) v'\|} = t {\|v\|} + (1-t) {\|v'\|} \leq 1.$$
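Now 2. implies 3. trivially, so it remains to prove that 3. implies 1. Suppose ${\|v\|}, {\|v'\|} \in (0, \infty)$. Let $u = \frac{v}{{\|v\|}}$ and $u' = \frac{v'}{{\|v'\|}}$ be the associated unit vectors. Then

$$\frac{v + v'}{{\|v\|} + {\|v'\|}} = \frac{{\|v\|}}{{\|v\|} + {\|v'\|}} \cdot \frac{v}{{\|v\|}} + \frac{{\|v'\|}}{{\|v\|} + {\|v'\|}} \cdot \frac{v'}{{\|v'\|}} = t u + (1-t) u'$$

where $t = \frac{{\|v\|}}{{\|v\|} + {\|v'\|}}$. If condition 3. holds, then

$${\|t u + (1-t) u'\|} \leq 1,$$

but by the scaling axiom, this is the same as saying

$$\frac{{\|v + v'\|}}{{\|v\|} + {\|v'\|}} \leq 1,$$

which is the triangle inequality.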
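The step from condition 3. to condition 1. rests on an identity that can be checked numerically. The following is only a sketch: it assumes the $p$-norm on finite complex sequences (that is, $L^p$ of a finite set with counting measure), and the names `p_norm` and `normalized_sum` are hypothetical, introduced here for illustration.

```python
import random


def p_norm(v, p):
    """p-norm of a finite complex sequence (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def normalized_sum(v, w, p):
    """Return (lhs, t, combo), where lhs = (v + w) / (|v| + |w|) and
    combo = t*u + (1-t)*u' for the unit vectors u = v/|v|, u' = w/|w|."""
    nv, nw = p_norm(v, p), p_norm(w, p)
    t = nv / (nv + nw)
    u = [x / nv for x in v]
    u2 = [y / nw for y in w]
    lhs = [(x + y) / (nv + nw) for x, y in zip(v, w)]
    combo = [t * a + (1.0 - t) * b for a, b in zip(u, u2)]
    return lhs, t, combo


# Illustrative data (assumptions): random vectors in C^5 and p = 3.
rng = random.Random(0)
p = 3.0
v = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
w = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
lhs, t, combo = normalized_sum(v, w, p)

# Largest coordinate-wise discrepancy between the two sides of the identity.
max_gap = max(abs(a - b) for a, b in zip(lhs, combo))
```

Here `max_gap` should be at the level of rounding error, and `p_norm(combo, p)` should not exceed 1, which after multiplying through by ${\|v\|} + {\|v'\|}$ is the triangle inequality for this pair.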
Consider now $L^p$ with its $p$-norm ${\|f\|} = {|f|}_p$. The triangle inequality clearly holds for $p=1$ and $p = \infty$, so we concentrate on the case where $1 \lt p \lt \infty$. By lemma 1, the triangle inequality is equivalent to

Condition 4: if ${|u|}_p^p = 1$ and ${|v|}_p^p = 1$, then ${|t u + (1-t)v|}_p^p \leq 1$ for all $t \in [0, 1]$.
This will allow us to remove the cumbersome exponent $1/p$ in the definition of $p$-norm.
Given $p \gt 1$ and $\alpha, \beta$ two complex numbers, define

$$\gamma(t) = {|\alpha + \beta t|}^p$$

for real $t$. Then $\gamma''(t)$ is nonnegative.
Suppose the line parametrized by the function $t \mapsto \alpha + \beta t$ does not pass through the origin (the easy case in which it does is left to the reader).
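We may write $\gamma(t) = [(\alpha + \beta t)(\widebar{\alpha} + \widebar{\beta} t)]^{p/2}$. After a short calculation we have

$$\gamma'(t) = p {|\alpha + \beta t|}^{p-2} \left(\mathrm{Re}(\alpha \widebar{\beta}) + {|\beta|}^2 t\right)$$

and thence

$$\gamma''(t) = p (p-2) {|\alpha + \beta t|}^{p-4} \left(\mathrm{Re}(\alpha \widebar{\beta}) + {|\beta|}^2 t\right)^2 + p {|\alpha + \beta t|}^{p-2} {|\beta|}^2.$$

After factoring out $p {|\alpha + \beta t|}^{p-4}$, one is left with

$$(p-2)\left(\mathrm{Re}(\alpha \widebar{\beta}) + {|\beta|}^2 t\right)^2 + {|\beta|}^2 {|\alpha + \beta t|}^2 \geq {|\beta|}^2 {|\alpha + \beta t|}^2 - \left(\mathrm{Re}(\alpha \widebar{\beta}) + {|\beta|}^2 t\right)^2 = {|\alpha|}^2 {|\beta|}^2 - \mathrm{Re}(\alpha \widebar{\beta})^2 = \mathrm{Im}(\alpha \widebar{\beta})^2$$

where the inequality uses $1 \lt p$. The last expression is again nonnegative.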
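As a sanity check on the second-derivative computation, one can compare a closed form for $\gamma''$ with a finite-difference approximation. This is a sketch under stated assumptions: the closed form used is $\gamma''(t) = p {|\alpha + \beta t|}^{p-4} [(p-2) R^2 + {|\beta|}^2 {|\alpha + \beta t|}^2]$ with $R = \mathrm{Re}(\alpha \widebar{\beta}) + {|\beta|}^2 t$ (my own working of the calculation), and the function names and sample values are hypothetical.

```python
def gamma(t, alpha, beta, p):
    """gamma(t) = |alpha + beta*t|^p."""
    return abs(alpha + beta * t) ** p


def gamma_second(t, alpha, beta, p):
    """Assumed closed form for gamma''(t), valid where alpha + beta*t != 0."""
    z = alpha + beta * t
    r = (alpha * beta.conjugate()).real + abs(beta) ** 2 * t
    return p * abs(z) ** (p - 4) * ((p - 2) * r ** 2 + abs(beta) ** 2 * abs(z) ** 2)


def second_difference(f, t, h=1e-4):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


# Illustrative values (assumptions): this line t -> alpha + beta*t misses the origin,
# and p is taken between 1 and 2, where the sign of gamma'' is least obvious.
alpha, beta, p = 1 + 2j, 0.5 - 1j, 1.5
for t in (-1.0, 0.0, 0.3, 2.0):
    exact = gamma_second(t, alpha, beta, p)
    approx = second_difference(lambda s: gamma(s, alpha, beta, p), t)
    print(t, exact, approx)
```

The two printed columns should agree to several digits and both should be nonnegative; a numerical check like this is of course no substitute for the algebra.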
Define $\phi: \mathbb{C} \to \mathbb{R}$ by $\phi(x) = {|x|}^p$. Then $\phi$ is convex, i.e., for all $x, y$,

$$\phi(t x + (1-t) y) \leq t \phi(x) + (1-t) \phi(y)$$

for all $t \in [0, 1]$.
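Hold fixed two complex numbers $x$ and $y$, and put

$$\delta(t) = {|t x + (1-t) y|}^p - t {|x|}^p - (1-t) {|y|}^p.$$

We have $\delta(0) = 0 = \delta(1)$, and $\delta''(t) \geq 0$ by lemma 2 (applied with $\alpha = y$ and $\beta = x - y$; the terms linear in $t$ do not contribute to the second derivative). A function with nonnegative second derivative is convex, hence lies on or below the chord joining its endpoints, so $\delta(t) \leq 0$ for all $t \in [0, 1]$, which completes the proof.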
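Let $u$ and $v$ be unit vectors in $L^p$. By condition 4, it suffices to show that ${|t u + (1-t)v|}_p \leq 1$ for all $t \in [0, 1]$. But

$$\int_X {|t u + (1-t) v|}^p \, d\mu \leq \int_X \left(t {|u|}^p + (1-t) {|v|}^p\right) d\mu = t \int_X {|u|}^p \, d\mu + (1-t) \int_X {|v|}^p \, d\mu$$

by lemma 3. Using $\int {|u|}^p = 1 = \int {|v|}^p$, we are done.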
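For readers who like to see an inequality hold on examples, here is a minimal sketch that tests Minkowski's inequality on random data. It assumes counting measure on an 8-point set, so that functions are just finite complex sequences; `p_norm` and `minkowski_gap` are hypothetical names.

```python
import random


def p_norm(v, p):
    """p-norm of a finite complex sequence (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def minkowski_gap(f, g, p):
    """|f|_p + |g|_p - |f + g|_p, which Minkowski's inequality says is >= 0."""
    total = [a + b for a, b in zip(f, g)]
    return p_norm(f, p) + p_norm(g, p) - p_norm(total, p)


def random_vector(rng, n):
    """A random point of C^n with coordinates in the square [-1, 1]^2."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


rng = random.Random(1)
gaps = []
for _ in range(1000):
    p = rng.uniform(1.0, 6.0)          # exponents in the range covered by the proof
    f, g = random_vector(rng, 8), random_vector(rng, 8)
    gaps.append(minkowski_gap(f, g, p))

smallest_gap = min(gaps)               # should be nonnegative up to rounding
```

The gap collapses to zero when $g$ is a nonnegative multiple of $f$, for example `minkowski_gap(f, [2 * x for x in f], 2.5)`.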
It could be that analysts who persist in giving what, in my opinion, is a somewhat unmemorable derivation of Minkowski’s inequality from Hölder’s inequality, might have something conceptually deeper to say, but they never seem to say it.
My own guess is that it has something to do with a certain relation between local convexity and duals.
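To support this, recall that Hölder’s inequality

$${\left|\int_X f g \, d\mu\right|} \leq {|f|}_p {|g|}_q$$

(for $f \in L^p(X)$, $g \in L^q(X)$, where $\frac1{p} + \frac1{q} = 1$) is principally a statement about dual spaces: it asserts that the pairing

$$\langle -, - \rangle: L^p \times L^q \to \mathbb{C}, \qquad \langle f, g \rangle = \int_X f g \, d\mu$$

induces a pair of bounded linear maps $\lambda: L^p \to (L^q)^\ast$ and $\rho: L^q \to (L^p)^\ast$, defined by

$$\lambda(f) = \langle f, - \rangle, \qquad \rho(g) = \langle -, g \rangle.$$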
Indeed, it explicitly asserts that the norm of $\lambda(f)$ is bounded above by $\vert f \vert_p$, and that the norm of $\rho(g)$ is bounded above by $\vert g \vert_q$. Therefore, $\lambda$ and $\rho$ are linear contractions.
Hölder’s inequality is in fact sharp: the norm of $\lambda(f)$ equals $\vert f \vert_p$, and the norm of $\rho(g)$ equals $\vert g \vert_q$. That is to say: the canonical maps

$$\lambda: L^p \to (L^q)^\ast, \qquad \rho: L^q \to (L^p)^\ast$$

are actually isometric embeddings.
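As we will see, this implies that the unit ball in $L^p$ is the intersection of half-spaces

$$\{f \in L^p : \mathrm{Re} \langle f, g \rangle \leq 1\}$$

where $g$ ranges over the unit ball in $L^q$ (for real-valued functions the real part may be dropped). But an intersection of half-spaces is convex!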
So, the hidden conceptual point is that the relation $\langle f, g \rangle \leq 1$ defines a Galois connection between subsets of $L^p$ and subsets of $L^q$, one for which the unit balls are dual to each other, and therefore closed under the Galois connection. But in this case, since the relation $\langle f, g \rangle \leq 1$ respects convex combinations in each of the variables $f$ and $g$, sets that are closed under the Galois connection are closed with respect to taking convex combinations. And that convexity of the unit ball is equivalent to Minkowski’s norm inequality.
In other words, I propose that the derivation of Minkowski’s inequality from Hölder’s inequality should be seen as a nicely cleaned-up but disguised version of a conceptual argument that leads from an isometric embedding into a dual space to the assertion that the embedded space is locally convex.
To put all this into a general context, recall the following notions. Given a topological vector space $X$ (let’s say over $\mathbb{R}$, but everything goes through for $\mathbb{C}$ as well), an open neighborhood $U$ of the origin is
balanced if $t U \subseteq U$ whenever $|t| \leq 1$, and
bounded if for any open neighborhood $V$ of the origin, $U \subseteq t V$ for some $t$.
It is a fact that every TVS $X$ has a neighborhood basis consisting of balanced neighborhoods; we assume here that $X$ has one that is also bounded. For each balanced bounded neighborhood $U$ there is a gauge function $\rho_U: X \to \mathbb{R}$ defined by

$$\rho_U(x) = \inf \{t \gt 0 : x \in t U\}.$$
We will mostly be interested in the uniform topology which is generated from $U$, that is, the smallest TVS topology making the gauge function $\rho_U$ continuous. The condition of being balanced implies the scaling condition

$$\rho_U(t x) = {|t|} \cdot \rho_U(x)$$

(proof to be inserted later). The separation condition for $\rho_U$ is usually not satisfied and anyway will not concern us.
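To make the gauge function concrete, here is a small sketch that approximates $\rho_U(x) = \inf\{t \gt 0 : x \in t U\}$ by bisection, given only a membership test for $U$. Everything specific here is an assumption made for illustration: the helper names `gauge` and `in_open_ball`, the choice of $U$ as the open unit ball of the $p$-norm on $\mathbb{R}^n$, and the use of bisection (which relies on $U$ being balanced, so that $x \in t U$ implies $x \in s U$ for $s \geq t$).

```python
def gauge(x, in_U, iters=60):
    """Approximate rho_U(x) = inf{t > 0 : x in t*U} by bisection.

    Assumes U is a balanced neighborhood of the origin, so that membership
    of x in t*U is monotone in t and holds for t large enough.
    """
    hi = 1.0
    while not in_U([c / hi for c in x]):   # grow hi until x lies in hi*U
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_U([c / mid for c in x]):
            hi = mid                       # x is in mid*U, so the infimum is <= mid
        else:
            lo = mid                       # x is not in mid*U, so the infimum is >= mid
    return hi


def in_open_ball(p):
    """Membership test for the open unit ball of the p-norm on R^n (an assumed example of U)."""
    return lambda v: sum(abs(c) ** p for c in v) < 1.0


x = [3.0, -4.0]
rho = gauge(x, in_open_ball(2.0))                             # close to the Euclidean norm of x
rho_scaled = gauge([-2.0 * c for c in x], in_open_ball(2.0))  # close to 2 * rho
```

In this example the gauge recovers the norm whose unit ball we started from, and the comparison of `rho_scaled` with `rho` illustrates the scaling condition for $t = -2$.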
Now suppose given two TVS, $X$ and $Y$, which come equipped with gauge functions $\rho_U$ and $\rho_V$ respectively, and for which there is a bilinear pairing

$$\langle -, - \rangle: X \times Y \to \mathbb{R}$$

satisfying a Hölder-type inequality

$${|\langle x, y \rangle|} \leq \rho_U(x) \rho_V(y)$$

which is sharp in the sense that given $x \in X$ (resp., $y \in Y$), there exists $y \in Y$ with $\rho_V(y) = 1$ (resp., $x \in X$ with $\rho_U(x) = 1$) for which the Hölder inequality is an equality.
Notice that the Hölder inequality implies that for each $x \in X$, the functional $\langle x, -\rangle: Y \to \mathbb{R}$ is uniformly continuous with respect to the $\rho_V$-topology, with Lipschitz constant $\rho_U(x)$. Similarly if we interchange the roles of $X$ and $Y$.
As always, we define the unit ball (in appropriate gauge) by

$$B_U(1) = \{x \in X : \rho_U(x) \leq 1\}.$$
The unit ball is convex if and only if $\rho_U$ satisfies the Minkowski or triangle inequality: $\rho_U(x + x' ) \leq \rho_U(x) + \rho_U(x')$.
Under the sharp Hölder inequality, the unit balls in $X$ and $Y$ are convex.
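By the Hölder inequality, we have the inclusion

$$B_U(1) \subseteq \bigcap_{y \in B_V(1)} H_y$$

where $H_y$ denotes the affine half-space

$$H_y = \{x \in X : \langle x, y \rangle \leq 1\}.$$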
We also have the reverse inclusion because the Hölder inequality is sharp. In more detail: suppose $x \notin B_U(1)$, i.e., that $\rho_U(x) = c \gt 1$. We may choose $y$ such that $\rho_V(y) = 1$ and $\langle x, y \rangle = c \gt 1$ (replacing $y$ by $-y$ if necessary). But then $x \notin H_y$, so $x$ does not belong to the intersection of the affine half-spaces above.
Thus the inclusion is an equality. Therefore, the unit ball $B_U(1)$, being an intersection of affine half-spaces, is convex.
Armed with this conceptual explanation of the relation between duality and convexity, let’s revisit the classical derivation of Minkowski’s inequality from Hölder’s inequality.
First, we should verify that the Hölder inequality between $L^p$ and $L^q$, where $1 \lt p, q \lt \infty$ and $\frac1{p} + \frac1{q} = 1$, is sharp in the sense described above.
A special case: suppose $f$ is a function taking values in $[0, \infty]$ such that $|f|_p = 1$. Put $g = f^{p/q} = f^{p-1}$. Then $|g|^q = |f|^p$, so $|g|_q = 1$, and also

$$\langle f, g \rangle = \int f \cdot f^{p-1} = \int f^p = 1 = |f|_p |g|_q,$$

whence sharpness follows.
For the more general case, removing the assumption that $f$ is valued in $[0, \infty]$, let $g$ be the function such that $f g = |f|^p$ whenever $f$ is nonzero, and $g = 0$ whenever $f = 0$. Then $|g|^q = |f|^{q(p-1)} = |f|^p$, so $|g|_q = 1$ and the sharpness of the Hölder inequality goes through as in the special case above.
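The construction of the extremal $g$ can be tried out on a finite example. The sketch below assumes counting measure on a 4-point set and a complex-valued $f$ with one zero entry; `p_norm` and `dual_element` are hypothetical names, and $g$ is computed as $\widebar{f} {|f|}^{p-2}$ where $f \neq 0$, which is one way of solving $f g = {|f|}^p$.

```python
def p_norm(v, p):
    """p-norm of a finite complex sequence (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def dual_element(f, p):
    """Return g with f*g = |f|^p where f != 0, and g = 0 where f = 0."""
    return [x.conjugate() * abs(x) ** (p - 2) if x != 0 else 0j for x in f]


p = 3.0
q = p / (p - 1.0)                            # conjugate exponent: 1/p + 1/q = 1
raw = [1 + 1j, -2j, 0j, 0.5 + 0j]
scale = p_norm(raw, p)
f = [x / scale for x in raw]                 # normalise so that |f|_p = 1
g = dual_element(f, p)

pairing = sum(a * b for a, b in zip(f, g))   # <f, g> as a finite sum
g_norm = p_norm(g, q)
```

Up to rounding, `pairing` and `g_norm` should both equal 1, so the Hölder inequality is attained for this pair.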
The Minkowski inequality, or equivalently the convexity of the unit ball in $L^p$, therefore follows by applying the theorem above.
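Now the usual proof of Hölder $\Rightarrow$ Minkowski runs something like this: let $f, g \in L^p$. We know $f + g \in L^p$ (proof: $\vert \frac{f+g}{2} \vert^p \leq \frac1{2}\vert f \vert^p + \frac1{2}\vert g \vert^p$, by convexity of the function $x \mapsto x^p$ again!). If $f + g = 0$ there is nothing to prove; otherwise, by rescaling, we may assume without loss of generality that $\vert f+g \vert_p = 1$. Then

$$1 = \int \vert f+g \vert^p = \int \vert f+g \vert \, h \leq \int \vert f \vert \, h + \int \vert g \vert \, h \leq \vert f \vert_p \vert h \vert_q + \vert g \vert_p \vert h \vert_q$$

where $h = \vert f+g \vert^{p-1}$. Since $\vert h \vert_q^q = \int \vert f+g \vert^{(p-1)q} = \int \vert f+g \vert^p = 1$, we have $\vert f+g \vert_p = 1 \leq \vert f \vert_p + \vert g \vert_p$, as desired.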
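The chain of inequalities in the classical argument can be traced term by term on a finite example. This is a sketch only, assuming counting measure on a finite set; `classical_chain` is a hypothetical helper that rescales so that $\vert f+g \vert_p = 1$ and returns the three quantities being compared together with $\vert h \vert_q$.

```python
def p_norm(v, p):
    """p-norm of a finite complex sequence (counting measure on a finite set)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def classical_chain(f, g, p):
    """Return (lhs, middle, rhs, h_norm) for the chain
    lhs = sum |f+g| h  <=  middle = sum |f| h + sum |g| h  <=  rhs = (|f|_p + |g|_p) |h|_q,
    after rescaling so that |f+g|_p = 1. Assumes p > 1 and f + g != 0."""
    q = p / (p - 1.0)
    n = p_norm([a + b for a, b in zip(f, g)], p)
    f = [a / n for a in f]
    g = [b / n for b in g]
    s = [a + b for a, b in zip(f, g)]
    h = [abs(x) ** (p - 1.0) for x in s]         # h = |f+g|^(p-1)
    lhs = sum(abs(x) * y for x, y in zip(s, h))  # equals sum |f+g|^p = 1
    middle = sum(abs(a) * y for a, y in zip(f, h)) + sum(abs(b) * y for b, y in zip(g, h))
    h_norm = p_norm(h, q)
    rhs = (p_norm(f, p) + p_norm(g, p)) * h_norm
    return lhs, middle, rhs, h_norm


lhs, middle, rhs, h_norm = classical_chain([1 + 2j, -1j, 0.5], [0.3, 2 - 1j, -1.5j], 2.5)
```

One expects `lhs` and `h_norm` to equal 1 up to rounding, with `lhs <= middle <= rhs`; the first inequality is the pointwise triangle inequality and the second is Hölder applied twice.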