Todd Trimble p-norms



The following concerns one of the basic results about $L^p$ spaces, called Minkowski’s inequality.


Suppose $1 \leq p \leq \infty$, and suppose $X$ is a measure space with measure $\mu$. Then the function $|(-)|_p: L^p(X, \mu) \to \mathbb{R}$ defined (for $p < \infty$) by

$$|f|_p \coloneqq \left(\int_X |f|^p \, d\mu\right)^{1/p}$$

defines a norm. (For $p = \infty$, $|f|_\infty$ is the essential supremum of $|f|$.)
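To make the definition concrete, here is a minimal sketch for the simplest kind of measure space. It assumes $X$ is a finite set and $\mu$ assigns a nonnegative weight to each point, so the integral becomes a weighted sum. The helper name `p_norm` and this finite setting are our own illustration, not part of the original text. For $p = \infty$ the sketch returns the essential supremum, i.e. the largest $|f(x)|$ over points of positive measure.

```python
import math

def p_norm(values, weights, p):
    """p-norm of f on a finite measure space (illustrative sketch).

    values  -- f(x) for each point x (real or complex numbers)
    weights -- mu({x}) >= 0 for each point x
    p       -- exponent with 1 <= p <= inf (math.inf allowed)
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if p == math.inf:
        # Essential supremum: points of measure zero are ignored.
        return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)
    # (integral of |f|^p dmu)^(1/p), where the integral is a weighted sum.
    total = sum(w * abs(v) ** p for v, w in zip(values, weights))
    return total ** (1.0 / p)

# Counting measure on two points gives the Euclidean length of (3, 4).
print(p_norm([3, 4], [1, 1], 2))                    # 5.0
# The value 100 sits on a point of measure zero, so it is ignored.
print(p_norm([3, -7, 100], [1, 1, 0], math.inf))    # 7
```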

One must verify three things:

  1. Separation axiom: $|f|_p = 0$ implies $f = 0$.

  2. Scaling axiom: $|t f|_p = |t| \cdot |f|_p$.

  3. Triangle inequality: $|f + g|_p \leq |f|_p + |g|_p$.

The first two properties are obvious (for the first, recall that elements of $L^p$ are identified when they agree almost everywhere), so it remains to prove the last, which is also called Minkowski’s inequality.

Most (all?) of the proofs of Minkowski’s inequality that I’ve seen in textbooks involve a clever application of Hölder’s inequality, which is probably not the first thing one would think to do on seeing Minkowski’s inequality for the first time.

In the first part of this page, I would like to try a more natural and perspicuous proof. Then, I would like to revisit the classical proof and build a little conceptual story which explains where it’s coming from.

A natural proof of Minkowski’s inequality

The plan of the proof is to boil down the triangle inequality for $|f|_p$ to two things: the scaling axiom, and convexity of the function $x \mapsto |x|^p$ (as a function from complex numbers to real numbers).

We start with some generalities. Let $V$ be a complex vector space equipped with a function $\|(-)\|: V \to [0, \infty]$ that satisfies the scaling axiom, $\|\alpha v\| = |\alpha| \cdot \|v\|$ for all complex scalars $\alpha$, and the separation axiom, $\|v\| = 0$ implies $v = 0$. As usual, we define the unit ball in $V$ to be $\{v \in V: \|v\| \leq 1\}$.


Lemma 1. Given that the scaling and separation axioms hold, the following conditions are equivalent:

  1. The triangle inequality is satisfied.
  2. The unit ball is convex.
  3. If $\|u\| = \|v\| = 1$, then $\|t u + (1-t)v\| \leq 1$ for all $t \in [0, 1]$.
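The convexity in conditions 2 and 3 cannot be dropped. As an illustration of our own (not from the original text), take $V = \mathbb{R}^2$ with $\|(x, y)\| = (|x|^{1/2} + |y|^{1/2})^2$, which is the formula for $|\cdot|_p$ with $p = 1/2$ and counting measure. This function satisfies the scaling and separation axioms. The sketch below shows that condition 3 and the triangle inequality both fail for it, consistent with Lemma 1, which says they stand or fall together.

```python
def quasi_norm(v, p=0.5):
    """(|x|^p + |y|^p)^(1/p) on R^2; this is a genuine norm only when p >= 1."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

u = (1.0, 0.0)
v = (0.0, 1.0)
t = 0.5
mid = tuple(t * a + (1 - t) * b for a, b in zip(u, v))  # t*u + (1-t)*v

print(quasi_norm(u), quasi_norm(v))  # 1.0 1.0: both are unit vectors
print(quasi_norm(mid))               # about 2.0 > 1: condition 3 fails
s = tuple(a + b for a, b in zip(u, v))
print(quasi_norm(s), quasi_norm(u) + quasi_norm(v))  # 4.0 2.0: triangle inequality fails
```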

Condition 1. implies condition 2. easily: if $u$ and $v$ are in the unit ball and $0 \leq t \leq 1$, we have

$$\begin{aligned} \|t u + (1-t)v\| & \leq \|t u\| + \|(1-t)v\| \\ & = t \|u\| + (1-t) \|v\| \\ & \leq t + (1-t) = 1. \end{aligned}$$

Now 2. implies 3. trivially, so it remains to prove that 3. implies 1. If $\|v\| = 0$ or $\|v'\| = 0$, then by separation one of the vectors is $0$ and the triangle inequality $\|v + v'\| \leq \|v\| + \|v'\|$ is trivial. If either norm is $\infty$, the right-hand side is $\infty$ and there is nothing to prove. So suppose $\|v\|, \|v'\| \in (0, \infty)$. Let $u = \frac{v}{\|v\|}$ and $u' = \frac{v'}{\|v'\|}$ be the associated unit vectors (they have norm $1$ by the scaling axiom). Then

$$\begin{aligned} \frac{v + v'}{\|v\| + \|v'\|} & = \left(\frac{\|v\|}{\|v\| + \|v'\|}\right) \frac{v}{\|v\|} + \left(\frac{\|v'\|}{\|v\| + \|v'\|}\right) \frac{v'}{\|v'\|} \\ & = t u + (1-t) u' \end{aligned}$$

where $t = \frac{\|v\|}{\|v\| + \|v'\|}$. If condition 3. holds, then

$$\|t u + (1-t) u'\| \leq 1$$

but by the scaling axiom, this is the same as saying

$$\frac{\|v + v'\|}{\|v\| + \|v'\|} \leq 1$$

which is the triangle inequality.

Consider now $L^p$ with its $p$-norm $\|f\| = |f|_p$. The triangle inequality clearly holds for $p = 1$ and $p = \infty$, so we concentrate on the case $1 < p < \infty$. By Lemma 1, the triangle inequality is equivalent to the following condition:


Condition 1. If $|u|_p^p = 1$ and $|v|_p^p = 1$, then $|t u + (1-t)v|_p^p \leq 1$ whenever $0 \leq t \leq 1$.

This condition allows us to remove the cumbersome exponent $1/p$ in the definition of the $p$-norm.


Lemma 2. For $p \geq 1$, the function $\phi: \mathbb{C} \to [0, \infty)$, $z \mapsto |z|^p$, is convex.


First, $z \mapsto |z|$ defines a convex function $\mathbb{C} \to [0, \infty)$; this follows easily from the triangle inequality $|z + w| \leq |z| + |w|$ and the scaling property $|t z| = |t| \cdot |z| = t |z|$ for $t \geq 0$.

Second, for $p \geq 1$, the function $(-)^p: [0, \infty) \to [0, \infty)$ is both monotone increasing and convex; this follows from the first and second derivative tests (the first and second derivatives of $t \mapsto t^p$ are nonnegative on $(0, \infty)$, and the function is continuous at $0$).

The conclusion follows because in general, if $D$ is a convex set, $g: D \to [0, \infty)$ is a convex function, and $f: [0, \infty) \to [0, \infty)$ is monotone increasing and convex, then $f \circ g: D \to [0, \infty)$ is convex. The proof is trivial. Apply this to $g(z) = |z|$ and $f(t) = t^p$ for $p \geq 1$.
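For the record, the "trivial" proof is the following two-step chain, for $z, w \in D$ and $t \in [0, 1]$. The first step uses convexity of $g$ together with monotonicity of $f$, and the second uses convexity of $f$.

```latex
\begin{aligned}
f\big(g(t z + (1-t) w)\big)
  &\leq f\big(t\, g(z) + (1-t)\, g(w)\big) \\
  &\leq t\, f\big(g(z)\big) + (1-t)\, f\big(g(w)\big).
\end{aligned}
```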

Proof of Minkowski’s inequality

Let $u$ and $v$ be unit vectors in $L^p$. By Condition 1, it suffices to show that $|t u + (1-t)v|_p^p \leq 1$ for all $t \in [0, 1]$. But

$$\int_X |t u + (1-t)v|^p \, d\mu \leq \int_X \left( t|u|^p + (1-t)|v|^p \right) d\mu$$

by Lemma 2 (applied pointwise). Since $\int_X |u|^p \, d\mu = 1 = \int_X |v|^p \, d\mu$, the right-hand side equals $t + (1-t) = 1$, and we are done.
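Here is a quick numerical sanity check of this argument, offered only as a sketch. On a hypothetical finite measure space with random positive weights, it normalizes two random complex-valued functions $u, v$ to $|u|_p = |v|_p = 1$, then records the largest value of $|tu + (1-t)v|_p^p$ it observes. The setup (number of points, weight range, exponents, random seed) and the helper names are our own choices. Passing such a test is evidence, not a proof.

```python
import random

def pth_power_norm(values, weights, p):
    """|f|_p^p: the sum of mu({x}) * |f(x)|^p over a finite measure space."""
    return sum(w * abs(f) ** p for f, w in zip(values, weights))

def normalize(values, weights, p):
    """Scale a nonzero f so that |f|_p = 1."""
    norm = pth_power_norm(values, weights, p) ** (1.0 / p)
    return [f / norm for f in values]

def random_complex(rng, n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

def worst_mixture(p, n_points=6, trials=2000, seed=0):
    """Largest |t u + (1-t) v|_p^p seen over random unit vectors u, v."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        weights = [rng.uniform(0.1, 2.0) for _ in range(n_points)]
        u = normalize(random_complex(rng, n_points), weights, p)
        v = normalize(random_complex(rng, n_points), weights, p)
        t = rng.random()
        mixture = [t * a + (1 - t) * b for a, b in zip(u, v)]
        worst = max(worst, pth_power_norm(mixture, weights, p))
    return worst

for p in (1.0, 1.5, 2.0, 3.0, 7.0):
    print(p, worst_mixture(p))  # expected: at most 1, up to rounding error
```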

Local convexity and duality: intuitions

It could be that analysts who persist in giving what, in my opinion, is a somewhat unmemorable derivation of Minkowski’s inequality from Hölder’s inequality, might have something conceptually deeper to say, but they never seem to say it.

My own guess is that it has something to do with a certain relation between local convexity and duals.

To support this, recall that Hölder’s inequality

$$\left| \int_X f \cdot g \, d\mu \right| \leq |f|_p |g|_q$$

(for $f \in L^p(X)$, $g \in L^q(X)$, where $\frac{1}{p} + \frac{1}{q} = 1$) is principally a statement about dual spaces: it asserts that the pairing

$$(f, g) \mapsto \langle f, g \rangle \coloneqq \int_X f \cdot g \, d\mu$$

induces a pair of bounded linear maps $\lambda: L^p \to (L^q)^\ast$ and $\rho: L^q \to (L^p)^\ast$, defined by

$$\lambda(f)(g) = \langle f, g \rangle = \rho(g)(f).$$

Indeed, it explicitly asserts that the norm of $\lambda(f)$ is bounded above by $|f|_p$, and that the norm of $\rho(g)$ is bounded above by $|g|_q$. Therefore, $\lambda$ and $\rho$ are linear contractions.
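As an illustration (not a proof), the sketch below tests Hölder’s inequality $|\langle f, g \rangle| \leq |f|_p |g|_q$ on random complex data. It again assumes a finite measure space with positive weights. The pairing has no complex conjugation, matching the formula above, and `holder_gap` is a hypothetical helper name.

```python
import random

def norm(values, weights, p):
    """|f|_p on a finite measure space."""
    return sum(w * abs(f) ** p for f, w in zip(values, weights)) ** (1.0 / p)

def pairing(f, g, weights):
    """<f, g> = integral of f * g dmu (no complex conjugation, as in the text)."""
    return sum(w * a * b for a, b, w in zip(f, g, weights))

def holder_gap(p, rng, n_points=5):
    """|f|_p |g|_q - |<f, g>| for random f, g; Hoelder says this is >= 0."""
    q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1 (requires p > 1)
    weights = [rng.uniform(0.1, 2.0) for _ in range(n_points)]
    f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_points)]
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_points)]
    return norm(f, weights, p) * norm(g, weights, q) - abs(pairing(f, g, weights))

rng = random.Random(1)
gaps = [holder_gap(p, rng) for p in (1.2, 2.0, 3.5) for _ in range(1000)]
print(min(gaps) >= -1e-12)  # True
```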

Hölder’s inequality is in fact sharp: the norm of $\lambda(f)$ equals $|f|_p$, and the norm of $\rho(g)$ equals $|g|_q$. That is to say, the canonical maps

$$\lambda: L^p \to (L^q)^\ast, \qquad \rho: L^q \to (L^p)^\ast$$

are actually isometric embeddings.

As we will see, this implies that the unit ball in $L^p$ is the intersection of the half-spaces

$$H_g = \{f \in L^p: \langle f, g \rangle \leq 1\}$$

where $g$ ranges over the unit ball in $L^q$ (for complex scalars, read $\langle f, g \rangle$ here as its real part). But an intersection of half-spaces is convex!

So the hidden conceptual point is that the relation $\langle f, g \rangle \leq 1$ defines a Galois connection between subsets of $L^p$ and subsets of $L^q$, one for which the unit balls are dual to each other, and therefore closed under the Galois connection. But in this case, since the relation $\langle f, g \rangle \leq 1$ respects convex combinations in each of the variables $f$ and $g$, sets that are closed under the Galois connection are closed under taking convex combinations. And convexity of the unit ball is equivalent to Minkowski’s inequality, by Lemma 1.

In other words, I propose that the derivation of Minkowski’s inequality from Hölder’s inequality should be seen as a nicely cleaned-up but disguised version of a conceptual argument that leads from an isometric embedding into a dual space to the assertion that the embedded space is locally convex.

Local convexity and duality: proofs

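To put all this into a general context, recall the following notions. Given a topological vector space $X$ (let’s say over $\mathbb{R}$, but everything goes through for $\mathbb{C}$ as well), an open neighborhood $U$ of the origin is balanced if $aU \subseteq U$ for every scalar $a$ with $|a| \leq 1$, and bounded if for every neighborhood $W$ of the origin there is $s > 0$ such that $U \subseteq tW$ for all $t > s$.

Every TVS $X$ has a neighborhood basis at the origin consisting of balanced neighborhoods, and every neighborhood $U$ of the origin is absorbing: each $x \in X$ lies in $tU$ for some $t > 0$. If $X$ is locally bounded, as normed spaces are, these neighborhoods can moreover be taken to be bounded. For each balanced neighborhood $U$ there is a gauge function $\rho_U: X \to \mathbb{R}$ defined by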

$$\rho_U(x) = \inf \{t > 0: x \in t U\}$$
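The infimum defining $\rho_U$ can be computed numerically whenever membership in $U$ is easy to test. The sketch below takes $U$ to be the open unit ball of the $3$-norm on $\mathbb{R}^n$ (our choice of example) and finds the infimum by plain bisection on $t$. It recovers the $3$-norm itself as the gauge. The bisection relies on $U$ being balanced, so that $x \in tU$ implies $x \in t'U$ for every $t' > t$, and on $U$ being absorbing, so that some $t$ works. The names `gauge` and `in_unit_ball` are hypothetical.

```python
def in_unit_ball(x, p=3.0):
    """Membership test for the open unit ball U = {x : |x|_p < 1} in R^n."""
    return sum(abs(c) ** p for c in x) < 1.0

def gauge(x, in_U, hi=1.0, tol=1e-12):
    """rho_U(x) = inf {t > 0 : x in t U}, found by bisection.

    Assumes U is balanced (so membership in tU is monotone in t)
    and absorbing (so some t > 0 works).
    """
    def in_tU(t):
        return in_U([c / t for c in x])

    while not in_tU(hi):   # grow hi until x lies in hi * U
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if in_tU(mid):
            hi = mid
        else:
            lo = mid
    return hi

x = [1.0, -2.0, 0.5]
direct = sum(abs(c) ** 3.0 for c in x) ** (1.0 / 3.0)
print(gauge(x, in_unit_ball), direct)  # the two agree to within about 1e-11
```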

We will mostly be interested in the uniform topology which is generated from $U$, that is, the smallest TVS topology making the gauge function $\rho_U$ continuous. The condition of being balanced implies the scaling condition

$$\rho_U(a x) = |a| \rho_U(x)$$

(proof to be inserted later). The separation condition for $\rho_U$ is usually not satisfied and anyway will not concern us.
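For completeness, one short argument for the scaling condition goes as follows. It assumes $U$ is balanced and absorbing, and takes $a \neq 0$ (the case $a = 0$ is immediate from $\rho_U(0) = 0$). Write $a = |a| \omega$ with $|\omega| = 1$. Balancedness gives both $\omega U \subseteq U$ and $\omega^{-1} U \subseteq U$, hence $\omega^{-1} U = U$, which is used in the second line.

```latex
\begin{aligned}
\rho_U(a x) &= \inf \{ t > 0 : |a| \omega x \in t U \}
             = \inf \{ t > 0 : |a| x \in t\, \omega^{-1} U \} \\
            &= \inf \{ t > 0 : x \in \tfrac{t}{|a|} U \}
             = |a| \inf \{ s > 0 : x \in s U \}
             = |a| \, \rho_U(x).
\end{aligned}
```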

Now suppose given two TVS, $X$ and $Y$, which come equipped with gauge functions $\rho_U$ and $\rho_V$ respectively, and for which there is a bilinear pairing

$$\langle -, - \rangle: X \times Y \to \mathbb{R}$$

satisfying a Hölder-type inequality

$$|\langle x, y \rangle| \leq \rho_U(x) \rho_V(y)$$

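which is sharp in the sense that for each $x \in X$ (resp., $y \in Y$) there exists $y \in Y$ with $\rho_V(y) = 1$ (resp., $x \in X$ with $\rho_U(x) = 1$) for which the Hölder inequality is an equality. (The normalization matters: without it the condition would be vacuous, since $y = 0$ always gives equality.)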

Notice that the Hölder inequality implies that for each $x \in X$, the functional $\langle x, - \rangle: Y \to \mathbb{R}$ is uniformly continuous with respect to the $\rho_V$-topology, with Lipschitz constant $\rho_U(x)$. Similarly if we interchange the roles of $X$ and $Y$.

As always, we define the unit ball (in appropriate gauge) by

$$B_U(1) = \{x \in X: \rho_U(x) \leq 1\}$$

The unit ball is convex if and only if $\rho_U$ satisfies the Minkowski or triangle inequality: $\rho_U(x + x') \leq \rho_U(x) + \rho_U(x')$.


Theorem. Under the sharp Hölder inequality, the unit balls in $X$ and $Y$ are convex.


By the Hölder inequality, we have the inclusion

$$B_U(1) \subseteq \bigcap_{y: \rho_V(y) = 1} H_y$$

where $H_y$ denotes the affine half-space

$$H_y = \{x \in X: \langle x, y \rangle \leq 1\}.$$

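We also have the reverse inclusion because the Hölder inequality is sharp. In more detail: suppose $x \notin B_U(1)$, i.e., $\rho_U(x) = c > 1$. By sharpness we may choose $y$ such that $\rho_V(y) = 1$ and $|\langle x, y \rangle| = c$. Replacing $y$ by $-y$ if necessary (which does not change $\rho_V(y)$, by the scaling condition), we get $\langle x, y \rangle = c > 1$. But then $x \notin H_y$, so $x$ does not belong to the intersection of the affine half-spaces above.

Thus the inclusion is an equality. Therefore the unit ball $B_U(1)$, being an intersection of affine half-spaces (each convex, since $\langle -, y \rangle$ is linear), is convex. The same argument, with the roles of $X$ and $Y$ interchanged, applies to the unit ball in $Y$.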

Reprising the proof of the Minkowski inequality from Hölder’s inequality

Armed with this conceptual explanation of the relation between duality and convexity, let’s revisit the classical derivation of Minkowski’s inequality from Hölder’s inequality.

First, we should verify that the Hölder inequality between $L^p$ and $L^q$, where $1 < p, q < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, is sharp in the sense described above.

A special case: suppose $f$ is a function taking values in $[0, \infty]$ such that $|f|_p = 1$. Put $g = f^{p/q} = f^{p-1}$. Then $|g|^q = |f|^p$, so $|g|_q = 1$, and also

$$\int f \cdot g = \int |f|^{p/p} |f|^{p/q} = \int |f|^{p(1/p + 1/q)} = \int |f|^p = 1 \tag{1}$$

whence sharpness follows. (For a general nonzero $f$, apply this to $f / |f|_p$: the resulting $g$ has $|g|_q = 1$ and $\int f \cdot g = |f|_p$.)

For the more general case, removing the assumption that $f$ is valued in $[0, \infty]$, let $g$ be the function such that $f g = |f|^p$ wherever $f$ is nonzero, and $g = 0$ wherever $f = 0$. Then $|g|^q = |f|^{q(p-1)} = |f|^p$, so $|g|_q = 1$, and the sharpness of the Hölder inequality goes through as in (1) above.
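The construction of $g$ can be checked numerically. This sketch assumes a finite measure space with positive weights and a complex-valued $f$ (including a zero value) normalized to $|f|_p = 1$. It sets $g = |f|^p / f$ where $f \neq 0$ and $g = 0$ elsewhere, then confirms that $|g|_q = 1$ and $\int f g \, d\mu = 1$. The name `dual_witness` and the sample data are hypothetical.

```python
def norm(values, weights, p):
    """|f|_p on a finite measure space."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def dual_witness(f, p):
    """g with f * g = |f|^p where f != 0, and g = 0 where f = 0."""
    return [abs(v) ** p / v if v != 0 else 0 for v in f]

p = 3.0
q = p / (p - 1)                       # 1/p + 1/q = 1
weights = [0.5, 1.0, 2.0, 0.25]
raw = [1 + 2j, -0.5, 0, 3j]           # includes a zero value
c = norm(raw, weights, p)
f = [v / c for v in raw]              # now |f|_p = 1
g = dual_witness(f, p)

pairing = sum(w * a * b for a, b, w in zip(f, g, weights))
print(norm(g, weights, q))            # approximately 1
print(pairing)                        # approximately 1 (imaginary part ~0)
```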

The Minkowski inequality, or equivalently the convexity of the unit ball in L pL^p, therefore follows by applying the theorem above.

Now the usual proof of Hölder $\Rightarrow$ Minkowski runs something like this: let $f, g \in L^p$. We know $f + g \in L^p$ (proof: $\left|\frac{f+g}{2}\right|^p \leq \frac{1}{2}|f|^p + \frac{1}{2}|g|^p$, by convexity of the function $x \mapsto |x|^p$ again!). If $|f+g|_p = 0$ there is nothing to prove; otherwise, by rescaling, we may assume without loss of generality that $|f+g|_p = 1$. Then

$$\begin{aligned} 1 = \int_X |f+g|^p \, d\mu & \leq \int_X |f| \cdot |f+g|^{p-1} \, d\mu + \int_X |g| \cdot |f+g|^{p-1} \, d\mu \\ & \leq |f|_p |h|_q + |g|_p |h|_q \end{aligned}$$

where $h = |f+g|^{p-1}$. The first step uses $|f+g| \leq |f| + |g|$ pointwise, and the second is Hölder’s inequality. Since $|h|_q^q = \int_X |f+g|^{(p-1)q} \, d\mu = \int_X |f+g|^p \, d\mu = 1$, we have $|h|_q = 1$, and hence $|f+g|_p = 1 \leq |f|_p + |g|_p$, as desired.
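The chain of inequalities above can also be traced numerically. The sketch below uses a finite measure space and real-valued $f, g$; the random data, exponent, and seed are our own choices. It rescales so that $|f+g|_p = 1$, forms $h = |f+g|^{p-1}$, and checks that $|h|_q = 1$ and that each inequality in the display holds.

```python
import random

def norm(values, weights, p):
    """|f|_p on a finite measure space."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def integral(values, weights):
    """Integral of a function against the point weights."""
    return sum(w * v for v, w in zip(values, weights))

rng = random.Random(2)
p = 2.5
q = p / (p - 1)
n = 8
weights = [rng.uniform(0.1, 1.0) for _ in range(n)]
f = [rng.gauss(0, 1) for _ in range(n)]
g = [rng.gauss(0, 1) for _ in range(n)]

# Rescale f and g by the same factor so that |f + g|_p = 1.
s = norm([a + b for a, b in zip(f, g)], weights, p)
f = [a / s for a in f]
g = [b / s for b in g]
fg = [a + b for a, b in zip(f, g)]
h = [abs(x) ** (p - 1) for x in fg]

lhs = integral([abs(x) ** p for x in fg], weights)  # should be 1
mid = integral([(abs(a) + abs(b)) * hh for a, b, hh in zip(f, g, h)], weights)
rhs = (norm(f, weights, p) + norm(g, weights, p)) * norm(h, weights, q)

print(lhs, mid, rhs)            # lhs <= mid <= rhs, with lhs approximately 1
print(norm(h, weights, q))      # approximately 1
print(norm(f, weights, p) + norm(g, weights, p) >= 1)  # True: Minkowski
```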

Revised on April 4, 2018 at 17:56:24 by Todd Trimble