Todd Trimble


The following concerns one of the basic results about $L^p$ spaces.


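Suppose $1 \leq p \leq \infty$, and suppose $X$ is a measure space with measure $\mu$. Then the function ${|(-)|}_p: L^p(X, \mu) \to \mathbb{R}$ defined by

$${|f|}_p := \left(\int_X {|f|}^p d\mu\right)^{1/p}$$

defines a norm (for $p = \infty$, the formula is read as the essential supremum of ${|f|}$).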
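To make the definition concrete, here is a minimal Python sketch of the $p$-norm on a finite measure space, where the integral becomes a weighted sum. The name `p_norm` and the list-based encoding of $f$ and $\mu$ are illustrative assumptions, not part of the original text, and the sketch only covers $1 \leq p < \infty$.

```python
def p_norm(values, weights, p):
    """p-norm of a function on a finite measure space (assumed model).

    values[i] is the value of f at point i (real or complex),
    weights[i] >= 0 is the measure of point i, and 1 <= p < infinity.
    """
    integral = sum(w * abs(v) ** p for v, w in zip(values, weights))
    return integral ** (1.0 / p)


# Counting measure on two points: the 2-norm of (3, 4).
print(p_norm([3, 4], [1, 1], 2))
```

With counting measure this is the usual $\ell^p$ norm on $\mathbb{C}^n$; other weights model a general finite measure.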

One must verify three things:

  1. Separation axiom: ${|f|}_p = 0$ implies $f = 0$.

  2. Scaling axiom: ${|t f|}_p = {|t|} \, {|f|}_p$.

  3. Triangle inequality: ${|f + g|}_p \leq {|f|}_p + {|g|}_p$.

The first two properties are obvious, so it remains to prove the last, which is also called Minkowski’s inequality.
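Before proving the triangle inequality, one can sanity-check it numerically. The following sketch draws random complex-valued functions on small finite measure spaces and records the smallest value of ${|f|}_p + {|g|}_p - {|f+g|}_p$ it sees. The helper names, the random model, and the range of exponents are assumptions chosen for illustration; a numerical check of this kind is evidence, not a proof.

```python
import random


def p_norm(values, weights, p):
    """p-norm on a finite measure space (assumed model, 1 <= p < infinity)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def minkowski_gap(f, g, weights, p):
    """Return |f|_p + |g|_p - |f + g|_p; the triangle inequality says >= 0."""
    total = [a + b for a, b in zip(f, g)]
    return p_norm(f, weights, p) + p_norm(g, weights, p) - p_norm(total, weights, p)


def smallest_gap(trials=1000, seed=0):
    """Smallest gap seen over randomly drawn f, g, weights and exponents p."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        n = rng.randint(1, 6)
        p = rng.uniform(1.0, 6.0)
        weights = [rng.uniform(0.1, 2.0) for _ in range(n)]
        f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        smallest = min(smallest, minkowski_gap(f, g, weights, p))
    return smallest


# Allow a little slack for floating-point rounding.
print(smallest_gap() >= -1e-9)
```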

Most (all?) of the proofs of Minkowski’s inequality that I’ve seen in textbooks involve a clever application of Hölder’s inequality, which is certainly not the first thing one would think of.

In the first part of this page, I would like to try a more natural and perspicuous proof. Then, I would like to revisit the classical proof and build a little conceptual story which explains where it’s coming from.

A natural proof of Minkowski’s inequality

The plan of the proof is to boil down the triangle inequality for ${|f|}_p$ to two things: the scaling axiom, and convexity of the function $x \mapsto {|x|}^p$ (as a function from complex numbers to real numbers).

We start with some generalities first. Let $V$ be a complex vector space equipped with a function ${\|(-)\|}: V \to [0, \infty]$ that satisfies the scaling axiom: ${\|\alpha v\|} = {|\alpha|} \, {\|v\|}$ for all complex scalars $\alpha$, and the separation axiom: ${\|v\|} = 0$ implies $v = 0$. As usual, we define the unit ball in $V$ to be $\{v \in V: {\|v\|} \leq 1\}$.


Lemma 1

Given that the scaling and separation axioms hold, the following conditions are equivalent:

  1. The triangle inequality is satisfied.
  2. The unit ball is convex.
  3. If ${\|u\|} = {\|v\|} = 1$, then ${\|t u + (1-t)v\|} \leq 1$ for all $t \in [0, 1]$.

Condition 1. implies condition 2. easily: if $u$ and $v$ are in the unit ball and $0 \leq t \leq 1$, we have

$$\begin{array}{rcl}
{\|t u + (1-t)v\|} & \leq & {\|t u\|} + {\|(1-t)v\|} \\
 & = & t {\|u\|} + (1-t) {\|v\|} \\
 & \leq & t + (1-t) = 1.
\end{array}$$

Now 2. implies 3. trivially, so it remains to prove that 3. implies 1. If ${\|v\|}$ or ${\|v'\|}$ is $0$ or $\infty$, the triangle inequality for $v$ and $v'$ is immediate (a vector of norm $0$ is zero by the separation axiom), so suppose ${\|v\|}, {\|v'\|} \in (0, \infty)$. Let $u = \frac{v}{{\|v\|}}$ and $u' = \frac{v'}{{\|v'\|}}$ be the associated unit vectors. Then

$$\begin{array}{rcl}
\frac{v + v'}{{\|v\|}+{\|v'\|}} & = & \left(\frac{{\|v\|}}{{\|v\|}+{\|v'\|}}\right) \frac{v}{{\|v\|}} + \left(\frac{{\|v'\|}}{{\|v\|}+{\|v'\|}}\right)\frac{v'}{{\|v'\|}} \\
 & = & t u + (1-t)u'
\end{array}$$

where $t = \frac{{\|v\|}}{{\|v\|} + {\|v'\|}}$. If condition 3. holds, then

$${\|t u + (1-t)u'\|} \leq 1$$

but by the scaling axiom, this is the same as saying

$$\frac{{\|v + v'\|}}{{\|v\|} + {\|v'\|}} \leq 1$$

which is the triangle inequality.
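The key identity in this argument, that the normalized sum is a convex combination of the two unit vectors, can be checked on a concrete example. The sketch below uses the $p$-norm on $\mathbb{R}^3$ with counting measure; the function names and the sample vectors are hypothetical choices for illustration only.

```python
def norm_p(v, p):
    """p-norm on R^n with counting measure (illustrative choice of norm)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def normalized_sum_as_convex_combination(v, w, p):
    """Return (t, lhs, rhs) with lhs = (v + w)/(|v| + |w|) and rhs = t u + (1 - t) u'.

    Assumes both norms lie strictly between 0 and infinity.
    """
    nv, nw = norm_p(v, p), norm_p(w, p)
    t = nv / (nv + nw)
    u = [x / nv for x in v]          # unit vector in the direction of v
    u_prime = [x / nw for x in w]    # unit vector in the direction of w
    lhs = [(a + b) / (nv + nw) for a, b in zip(v, w)]
    rhs = [t * a + (1 - t) * b for a, b in zip(u, u_prime)]
    return t, lhs, rhs


t, lhs, rhs = normalized_sum_as_convex_combination([3.0, -1.0, 2.0], [0.5, 4.0, -2.0], 3)
print(t, max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The two sides agree up to rounding, so bounding the norm of `rhs` by $1$ is the same as the triangle inequality for the original pair.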

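Consider now $L^p$ with its $p$-norm ${\|f\|} = {|f|}_p$. The triangle inequality clearly holds for $p = 1$ and $p = \infty$, so we concentrate on the case where $1 < p < \infty$. By lemma 1, the triangle inequality is equivalent to

Condition 4. If ${|u|}_p^p = 1 = {|v|}_p^p$, then ${|t u + (1-t)v|}_p^p \leq 1$ for all $t \in [0, 1]$.

This will allow us to remove the cumbersome exponent $1/p$ in the definition of the $p$-norm.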


Lemma 2

Given $p > 1$ and $\alpha, \beta$ two complex numbers, define

$$\gamma(t) = {|\alpha + \beta t|}^p$$

for real $t$. Then $\gamma''(t)$ is nonnegative.


Suppose the line parametrized by the function $t \mapsto \alpha + \beta t$ does not pass through the origin (the easy case where it does is left to the reader).

We may write $\gamma(t) = [(\alpha + \beta t)(\bar{\alpha} + \bar{\beta} t)]^{p/2}$. After a short calculation we have

$$\gamma'(t) = p{|\alpha + \beta t|}^{p-2}(\operatorname{Re}(\alpha \bar{\beta}) + {|\beta|}^2 t)$$

and thence

$$\begin{array}{rcl}
\gamma''(t) & = & p(p-2){|\alpha + \beta t|}^{p-4}(\operatorname{Re}(\alpha \bar{\beta}) + {|\beta|}^2 t)^2 \\
 & & + \; p{|\alpha + \beta t|}^{p-2} {|\beta|}^2.
\end{array}$$

For $p \geq 2$, all factors in sight are plainly nonnegative. For $1 < p < 2$, after factoring out $p{|\alpha + \beta t|}^{p-4}$, one is left with

$$\begin{array}{rcl}
(p-2)(\operatorname{Re}(\alpha \bar{\beta}) + {|\beta|}^2 t)^2 + {|\alpha + \beta t|}^2{|\beta|}^2 & \geq & -(\operatorname{Re}(\alpha \bar{\beta}) + {|\beta|}^2 t)^2 + {|\alpha + \beta t|}^2{|\beta|}^2 \\
 & = & -\operatorname{Re}(\bar{\beta}(\alpha + \beta t))^2 + {|\bar{\beta}(\alpha + \beta t)|}^2 \\
 & = & \operatorname{Im}(\bar{\beta}(\alpha + \beta t))^2
\end{array}$$

which is again nonnegative.
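The closed form for $\gamma''(t)$ can be compared against a finite-difference approximation. In the sketch below, the sample values of $\alpha$, $\beta$ and $p$ are arbitrary assumptions (chosen so that the line $t \mapsto \alpha + \beta t$ misses the origin), and the step size `h` is a hypothetical tuning parameter.

```python
def gamma(alpha, beta, p, t):
    """gamma(t) = |alpha + beta t|^p."""
    return abs(alpha + beta * t) ** p


def gamma_second(alpha, beta, p, t):
    """Closed form for gamma''(t); assumes alpha + beta t is nonzero."""
    z = alpha + beta * t
    r = (alpha * beta.conjugate()).real + abs(beta) ** 2 * t
    return (p * (p - 2) * abs(z) ** (p - 4) * r ** 2
            + p * abs(z) ** (p - 2) * abs(beta) ** 2)


def second_difference(alpha, beta, p, t, h=1e-4):
    """Central finite-difference approximation to gamma''(t)."""
    return (gamma(alpha, beta, p, t + h) - 2 * gamma(alpha, beta, p, t)
            + gamma(alpha, beta, p, t - h)) / h ** 2


alpha, beta, p = 1 + 2j, 0.5 - 1j, 1.5   # a case with 1 < p < 2
for t in (-2.0, 0.0, 0.3, 1.2, 3.0):
    print(t, gamma_second(alpha, beta, p, t), second_difference(alpha, beta, p, t))
```

The two columns should agree to several digits, and the closed form should never be negative.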

Lemma 3

Define $\phi: \mathbb{C} \to \mathbb{R}$ by $\phi(x) = {|x|}^p$. Then $\phi$ is convex, i.e., for all $x, y \in \mathbb{C}$,

$${|t x + (1-t)y|}^p \leq t{|x|}^p + (1-t){|y|}^p$$

for all $t \in [0, 1]$.


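Hold fixed two complex numbers $x$ and $y$, and put

$$\begin{array}{rcl}
\delta(t) & = & {|t x + (1-t)y|}^p - (t{|x|}^p + (1-t){|y|}^p) \\
 & = & {|y + t(x-y)|}^p - (t{|x|}^p + (1-t){|y|}^p).
\end{array}$$

We have $\delta(0) = 0 = \delta(1)$, and $\delta''(t) \geq 0$ by lemma 2, applied with $\alpha = y$ and $\beta = x - y$ (the subtracted term is affine in $t$, so it does not contribute to the second derivative). By the second derivative test, $\delta$ is convex and hence lies below the chord joining its endpoints; this is enough to guarantee that $\delta(t) \leq 0$ for all $t \in [0, 1]$, which completes the proof.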
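Lemma 3 is a pointwise statement about complex numbers, so it is easy to probe numerically. The sketch below samples random $x$, $y$, $t$ and $p$ and records the smallest value of $t{|x|}^p + (1-t){|y|}^p - {|t x + (1-t)y|}^p$; the function names and sampling ranges are assumptions made for illustration.

```python
import random


def convexity_gap(x, y, t, p):
    """t|x|^p + (1-t)|y|^p - |t x + (1-t) y|^p, nonnegative by lemma 3."""
    return t * abs(x) ** p + (1 - t) * abs(y) ** p - abs(t * x + (1 - t) * y) ** p


def smallest_convexity_gap(trials=2000, seed=1):
    """Smallest gap seen over random complex x, y, t in [0, 1) and p in [1, 5]."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        x = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
        y = complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
        t = rng.random()
        p = rng.uniform(1.0, 5.0)
        smallest = min(smallest, convexity_gap(x, y, t, p))
    return smallest


# Allow a little slack for floating-point rounding.
print(smallest_convexity_gap() >= -1e-9)
```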

Proof of theorem

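Let $u$ and $v$ be unit vectors in $L^p$. By condition 4 (the form of condition 3 of lemma 1 with $p$-th powers of the norms), it suffices to show that ${|t u + (1-t)v|}_p^p \leq 1$ for all $t \in [0, 1]$. But

$$\int_X {|t u + (1-t)v|}^p d\mu \leq \int_X \left( t{|u|}^p + (1-t){|v|}^p \right) d\mu$$

by lemma 3. Using ${|u|}_p = 1 = {|v|}_p$, the right-hand side equals $t + (1-t) = 1$, and we are done.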

Local convexity and duality: intuitions

It could be that analysts who persist in giving what, in my opinion, is a somewhat unmemorable derivation of Minkowski’s inequality from Hölder’s inequality, might have something conceptually deeper to say, but they never seem to say it.

My own guess is that it has something to do with a certain relation between local convexity and duals.

To support this, recall that Hölder’s inequality

$${\left| \int_X f \cdot g \, d\mu \right|} \leq {|f|}_p {|g|}_q$$

(for $f \in L^p(X)$, $g \in L^q(X)$, where $\frac{1}{p} + \frac{1}{q} = 1$) is principally a statement about dual spaces: it asserts that the pairing

$$(f, g) \mapsto \langle f, g \rangle := \int_X f \cdot g \, d\mu$$

induces a pair of bounded linear maps $\lambda: L^p \to (L^q)^\ast$ and $\rho: L^q \to (L^p)^\ast$, defined by

$$\lambda(f)(g) = \langle f, g \rangle = \rho(g)(f).$$

Indeed, it explicitly asserts that the norm of $\lambda(f)$ is bounded above by ${|f|}_p$, and that the norm of $\rho(g)$ is bounded above by ${|g|}_q$. Therefore, $\lambda$ and $\rho$ are linear contractions.
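As a quick illustration of the contraction property, the following sketch evaluates ${|f|}_p {|g|}_q - {|\langle f, g \rangle|}$ for random complex-valued functions on a finite measure space. The finite model, the helper names, and the range of exponents are assumptions introduced here, not taken from the text above.

```python
import random


def p_norm(values, weights, p):
    """p-norm on a finite measure space (assumed model, 1 <= p < infinity)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def pairing(f, g, weights):
    """The bilinear pairing <f, g> = integral of f * g (no complex conjugate)."""
    return sum(w * a * b for a, b, w in zip(f, g, weights))


def holder_gap(f, g, weights, p):
    """|f|_p |g|_q - |<f, g>|, nonnegative by Hölder; assumes 1 < p < infinity."""
    q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
    return p_norm(f, weights, p) * p_norm(g, weights, q) - abs(pairing(f, g, weights))


def smallest_holder_gap(trials=1000, seed=2):
    """Smallest gap seen over randomly drawn f, g, weights and exponents."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        n = rng.randint(1, 6)
        p = rng.uniform(1.1, 6.0)
        weights = [rng.uniform(0.1, 2.0) for _ in range(n)]
        f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        smallest = min(smallest, holder_gap(f, g, weights, p))
    return smallest


print(smallest_holder_gap() >= -1e-9)
```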

Hölder’s inequality is in fact sharp: the norm of $\lambda(f)$ equals ${|f|}_p$, and the norm of $\rho(g)$ equals ${|g|}_q$. That is to say: the canonical maps

$$\lambda: L^p \to (L^q)^\ast, \qquad \rho: L^q \to (L^p)^\ast$$

are actually isometric embeddings.

As we will see, this implies that the unit ball in $L^p$ is the intersection of half-spaces

$$H_g = \{f \in L^p: \langle f, g \rangle \leq 1\}$$

where $g$ ranges over the unit ball in $L^q$. But an intersection of half-spaces is convex!

So, the hidden conceptual point is that the relation $\langle f, g \rangle \leq 1$ defines a Galois connection between subsets of $L^p$ and subsets of $L^q$, one for which the unit balls are dual to each other, and therefore closed under the Galois connection. But in this case, since the relation $\langle f, g \rangle \leq 1$ respects convex combinations in each of the variables $f$ and $g$, sets that are closed under the Galois connection are closed with respect to taking convex combinations. And convexity of the unit ball is equivalent to Minkowski’s norm inequality.

In other words, I propose that the derivation of Minkowski’s inequality from Hölder’s inequality should be seen as a nicely cleaned-up but disguised version of a conceptual argument that leads from an isometric embedding into a dual space to the assertion that the embedded space is locally convex.

Local convexity and duality: proofs

To put all this into a general context, recall the following notions. Given a topological vector space $X$ (let’s say over $\mathbb{R}$, but everything goes through for $\mathbb{C}$ as well), an open neighborhood $U$ of the origin is balanced if $t U \subseteq U$ whenever ${|t|} \leq 1$, and bounded if for every neighborhood $V$ of the origin there exists $t > 0$ such that $U \subseteq t V$.

It is a fact that every TVS $X$ has a neighborhood basis consisting of balanced bounded neighborhoods. For each balanced bounded neighborhood $U$ there is a gauge function $\rho_U: X \to \mathbb{R}$ defined by

$$\rho_U(x) = \inf \{t > 0: x \in t U\}$$

We will mostly be interested in the uniform topology which is generated from $U$, that is, the smallest TVS topology making the gauge function $\rho_U$ continuous. The condition of being balanced implies the scaling condition

$$\rho_U(a x) = {|a|}\rho_U(x)$$

(proof to be inserted later). The separation condition for $\rho_U$ is usually not satisfied and anyway will not concern us.
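The gauge function can be approximated numerically when $U$ is given only through a membership test. The sketch below uses bisection, which relies on $U$ being balanced: if $x \in t U$ and $s \geq t$, then $x \in s U$. The function names, the upper bound `hi`, and the choice of $U$ as the open unit ball of the $p$-norm on $\mathbb{R}^2$ are all assumptions made for illustration.

```python
def gauge(in_U, x, hi=1e6, iterations=200):
    """Approximate inf{t > 0 : x in t U} by bisection.

    in_U is a membership test for a balanced neighborhood U of the origin;
    the search assumes that x lies in hi * U.
    """
    lo = 0.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if in_U([c / mid for c in x]):   # x in mid * U  iff  x / mid in U
            hi = mid
        else:
            lo = mid
    return hi


def open_p_ball(p):
    """Membership test for the open unit ball of the p-norm (illustrative U)."""
    return lambda x: sum(abs(c) ** p for c in x) < 1.0


U = open_p_ball(2)
x = [3.0, 4.0]
a = -2.5
# Scaling condition: the gauge of a*x against |a| times the gauge of x.
print(gauge(U, [a * c for c in x]), abs(a) * gauge(U, x))
```

For this choice of $U$ the gauge recovers the $p$-norm itself, and the two printed numbers agree up to rounding, as the scaling condition predicts.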

Now suppose given two TVS, $X$ and $Y$, which come equipped with gauge functions $\rho_U$ and $\rho_V$ respectively, and for which there is a bilinear pairing

$$\langle -, - \rangle: X \times Y \to \mathbb{R}$$

satisfying a Hölder-type inequality

$${|\langle x, y \rangle|} \leq \rho_U(x)\rho_V(y)$$

which is sharp in the sense that given $x \in X$ (resp., $y \in Y$), there exists $y \in Y$ with $\rho_V(y) = 1$ (resp., $x \in X$ with $\rho_U(x) = 1$) for which the Hölder inequality is an equality.

Notice that the Hölder inequality implies that for each $x \in X$, the functional $\langle x, - \rangle: Y \to \mathbb{R}$ is uniformly continuous with respect to the $\rho_V$-topology, with Lipschitz constant $\rho_U(x)$. Similarly if we interchange the roles of $X$ and $Y$.

As always, we define the unit ball (in appropriate gauge) by

$$B_U(1) = \{x \in X: \rho_U(x) \leq 1\}$$

The unit ball is convex if and only if $\rho_U$ satisfies the Minkowski or triangle inequality: $\rho_U(x + x') \leq \rho_U(x) + \rho_U(x')$.


Under the sharp Hölder inequality, the unit balls in $X$ and $Y$ are convex.


By the Hölder inequality, we have the inclusion

$$B_U(1) \subseteq \bigcap_{y: \rho_V(y) = 1} H_y$$

where $H_y$ denotes the affine half-space

$$\{x \in X: \langle x, y \rangle \leq 1\}.$$

We also have the reverse inclusion because the Hölder inequality is sharp. In more detail: suppose $x \notin B_U(1)$, i.e., that $\rho_U(x) = c > 1$. We may choose $y$ such that $\rho_V(y) = 1$ and $\langle x, y \rangle = c > 1$. But then $x \notin H_y$, so $x$ does not belong to the intersection of the affine half-spaces above.

Thus the inclusion is an equality. Therefore, the unit ball $B_U(1)$, being an intersection of affine half-spaces, is convex.
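The description of the unit ball as an intersection of half-spaces says that $\rho_U(x)$ is the supremum of $\langle x, y \rangle$ over $y$ with $\rho_V(y) = 1$. The sketch below approximates that supremum in $\mathbb{R}^2$ with the $p$-norm and its conjugate $q$-norm by sampling directions; the sampling scheme and all names are illustrative assumptions, and the result is only an approximation from below.

```python
import math


def norm_p(v, p):
    """p-norm on R^n with counting measure."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def sup_of_pairings(x, p, samples=20000):
    """Approximate sup{<x, y> : |y|_q = 1} in R^2 by sampling directions y.

    Assumes 1 < p < infinity; q is the conjugate exponent.
    """
    q = p / (p - 1.0)
    best = -float("inf")
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        direction = (math.cos(theta), math.sin(theta))
        scale = norm_p(direction, q)
        y = (direction[0] / scale, direction[1] / scale)   # now |y|_q = 1
        best = max(best, x[0] * y[0] + x[1] * y[1])
    return best


x = (1.0, 2.0)
# The sampled supremum of the pairings against the 3-norm of x.
print(sup_of_pairings(x, 3.0), norm_p(x, 3.0))
```

A point lies in every sampled half-space $H_y$ exactly when this supremum is at most $1$, which matches membership in the unit ball up to the sampling error.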

Reprising the proof of the Minkowski inequality from Hölder’s inequality

Armed with this conceptual explanation of the relation between duality and convexity, let’s revisit the classical derivation of Minkowski’s inequality from Hölder’s inequality.

First, we should verify that the Hölder inequality between $L^p$ and $L^q$, where $1 < p, q < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, is sharp in the sense described above.

A special case: suppose $f$ is a function taking values in $[0, \infty]$ such that ${|f|}_p = 1$. Put $g = f^{p/q} = f^{p-1}$. Then $g^q = f^p$, so ${|g|}_q = 1$, and also

$$\int f \cdot g = \int |f|^{p/p}|f|^{p/q} = \int |f|^{p(1/p + 1/q)} = \int |f|^p = 1 \qquad (1)$$

whence sharpness follows.

For the more general case, removing the assumption that $f$ is valued in $[0, \infty]$, let $g$ be the function such that $f g = {|f|}^p$ whenever $f$ is nonzero, and $g = 0$ whenever $f = 0$. Then ${|g|}^q = {|f|}^{q(p-1)} = {|f|}^p$, so ${|g|}_q = 1$ and the sharpness of the Hölder inequality goes through as in (1) above.
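The construction of the extremal partner $g$ can be carried out explicitly on a finite measure space. In the sketch below, the function names, the sample values, and the weights are hypothetical; the point is only that $g$ defined by $f g = {|f|}^p$ has $q$-norm $1$ and pairs with $f$ to give $1$ once ${|f|}_p = 1$.

```python
def p_norm(values, weights, p):
    """p-norm on a finite measure space (assumed model, 1 <= p < infinity)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def pairing(f, g, weights):
    """The bilinear pairing <f, g> = integral of f * g."""
    return sum(w * a * b for a, b, w in zip(f, g, weights))


def extremal_partner(f, p):
    """Return g with f * g = |f|^p where f is nonzero and g = 0 where f = 0."""
    return [abs(z) ** p / z if z != 0 else 0.0 for z in f]


p = 3.0
q = p / (p - 1.0)                      # conjugate exponent
weights = [1.0, 2.0, 0.5, 1.0]
raw = [1 + 1j, 0j, -2 + 0j, 0.5j]
scale = p_norm(raw, weights, p)
f = [z / scale for z in raw]           # rescaled so that |f|_p = 1
g = extremal_partner(f, p)
print(p_norm(g, weights, q), pairing(f, g, weights))
```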

The Minkowski inequality, or equivalently the convexity of the unit ball in L p, therefore follows by applying the theorem above.

Now the usual proof of Hölder $\Rightarrow$ Minkowski runs something like this: let $f, g \in L^p$. We know $f + g \in L^p$ (proof: ${\left|\frac{f+g}{2}\right|}^p \leq \frac{1}{2}{|f|}^p + \frac{1}{2}{|g|}^p$, by convexity of the function $x \mapsto {|x|}^p$ again!). By rescaling, we may assume without loss of generality that ${|f+g|}_p = 1$ (the case $f + g = 0$ being trivial). Then

$$\begin{array}{rcl}
1 = \int_X {|f+g|}^p d\mu & \leq & \int_X {|f|} \cdot {|f+g|}^{p-1} + \int_X {|g|} \cdot {|f+g|}^{p-1} \\
 & \leq & {|f|}_p {|h|}_q + {|g|}_p {|h|}_q
\end{array}$$

where $h = {|f+g|}^{p-1}$. Since ${|h|}^q = {|f+g|}^{q(p-1)} = {|f+g|}^p$, we have ${|h|}_q = 1$, and therefore ${|f+g|}_p = 1 \leq {|f|}_p + {|g|}_p$, as desired.
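Each link in this chain can be evaluated on a finite measure space. The sketch below rescales a sample pair so that ${|f+g|}_p = 1$ and returns the three quantities in the chain together with ${|h|}_q$; the function names and the sample data are assumptions for illustration, and the code assumes $f + g$ is not the zero function.

```python
def p_norm(values, weights, p):
    """p-norm on a finite measure space (assumed model, 1 <= p < infinity)."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def integral(values, weights):
    """Integral of a function against the finite measure given by weights."""
    return sum(w * v for v, w in zip(values, weights))


def holder_to_minkowski_chain(f, g, weights, p):
    """Return (first, middle, last, h_norm) for the rescaled pair f, g.

    first  = integral of |f+g|^p (equal to 1 after rescaling),
    middle = integral of |f||f+g|^(p-1) plus integral of |g||f+g|^(p-1),
    last   = |f|_p |h|_q + |g|_p |h|_q, with h = |f+g|^(p-1).
    """
    q = p / (p - 1.0)
    scale = p_norm([a + b for a, b in zip(f, g)], weights, p)  # assumed nonzero
    f = [a / scale for a in f]
    g = [b / scale for b in g]
    total = [a + b for a, b in zip(f, g)]
    h = [abs(z) ** (p - 1) for z in total]
    first = integral([abs(z) ** p for z in total], weights)
    middle = (integral([abs(a) * c for a, c in zip(f, h)], weights)
              + integral([abs(b) * c for b, c in zip(g, h)], weights))
    h_norm = p_norm(h, weights, q)
    last = p_norm(f, weights, p) * h_norm + p_norm(g, weights, p) * h_norm
    return first, middle, last, h_norm


print(holder_to_minkowski_chain([1 + 1j, -2.0, 0.5], [0.5j, 1.0, -3.0], [1.0, 0.5, 2.0], 2.5))
```

The first three returned numbers should be nondecreasing, and the fourth should be $1$ up to rounding.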

Revised on February 5, 2013 05:39:03 by Todd Trimble