nLab chain rule






The chain rule is the statement that differentiation $d : Diff \to Diff$ is a functor on Diff:

given two smooth functions between smooth manifolds $f : X \to Y$ and $g : Y \to Z$ we have

$$ d(g \circ f) : T X \stackrel{d f}{\to} T Y \stackrel{d g}{\to} T Z \,. $$

If one thinks of a tangent vector $v \in T_x X$ as an equivalence class of smooth paths, represented by a path $\gamma_v : [-\epsilon,\epsilon] \to X$, for some $\epsilon \gt 0$, with $\gamma_v(0) = x$, then the chain rule is the associativity of the composite

$$ [-\epsilon,\epsilon] \stackrel{\gamma_v}{\to} X \stackrel{f}{\to} Y \stackrel{g}{\to} Z \,. $$

Bracketed as $(g \circ f)\circ \gamma_v$ this represents $d(g \circ f)(v)$. Bracketed as $g \circ (f \circ \gamma_v)$ it represents $d g (d f (v))$.

Alternatively, in a context of synthetic differential geometry, where $D$ is the infinitesimal interval and we may identify $v$ with a map $v : D \to X$, the chain rule is the associativity of

$$ D \stackrel{v}{\to} X \stackrel{f}{\to} Y \stackrel{g}{\to} Z \,. $$
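For $X = Y = Z = \mathbb{R}$, one rough way to make the synthetic picture concrete is to model a map $D \to \mathbb{R}$ by a dual number $a + b\,\epsilon$ with $\epsilon^2 = 0$. The following is a minimal sketch under that assumption; the class `Dual`, the helper `_lift`, and the sample functions `f` and `g` are hypothetical names introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number a + b*eps with eps**2 == 0 (illustrative helper)."""
    a: float  # base point
    b: float  # tangent component

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a dual number with zero tangent part."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(u: Dual) -> Dual:
    """Sine extended to dual numbers."""
    return Dual(math.sin(u.a), math.cos(u.a) * u.b)


def f(u):
    """Sample map f(x) = x^2 + 1."""
    return u * u + 1


def g(u):
    """Sample map g(y) = sin(y)."""
    return sin(u)


def compose(outer, inner):
    return lambda u: outer(inner(u))


x, v = 0.5, 2.0
tangent = Dual(x, v)                 # the tangent vector v at the point x

lhs = compose(g, f)(tangent)         # bracketed as (g . f) . v
rhs = g(f(tangent))                  # bracketed as g . (f . v)

# Classical value g'(f(x)) * f'(x) * v for comparison
expected = math.cos(x**2 + 1) * 2 * x * v
```

Both bracketings are literally the same computation, and the tangent component `lhs.b` agrees with the classical value `expected` up to floating-point rounding.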


Elementary calculus

Let $X = Y = Z = \mathbb{R}$ be the real line. Then the tangent bundle $T X$ is canonically identified with $\mathbb{R} \times \mathbb{R}$.

Given two functions $f, g : \mathbb{R} \to \mathbb{R}$, their derivatives are traditionally regarded again as functions $f', g' : \mathbb{R} \to \mathbb{R}$, though strictly speaking we are to think of them as the maps

$$ d f, d g : \mathbb{R} \times \mathbb{R} = T \mathbb{R} \to T \mathbb{R} = \mathbb{R} \times \mathbb{R} $$

given by

$$ d f : (x,v) \mapsto (f(x), v f'(x)) $$

$$ d g : (x,v) \mapsto (g(x), v g'(x)) \,. $$

The composite

$$ d (g \circ f) : \mathbb{R} \times \mathbb{R} = T \mathbb{R} \to T \mathbb{R} = \mathbb{R} \times \mathbb{R} $$

is therefore the map

$$ d(g \circ f) : (x,v) \mapsto (f(x), v f'(x)) \mapsto (g(f(x)), v f'(x) g'(f(x))) \,. $$

Therefore we have

$$ (g \circ f)'(x) = f'(x) g'(f(x)) \,. $$

This is the form in which the chain rule is usually introduced in elementary calculus.
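The composite of tangent maps on $\mathbb{R} \times \mathbb{R}$ can be checked numerically. The sketch below assumes the sample functions $f(x) = x^3$ and $g(y) = e^y$; these, and the helper name `tangent_map`, are illustrative choices, not part of the text above.

```python
import math


def tangent_map(h, dh):
    """Return d h : (x, v) -> (h(x), v * h'(x)), given h and its derivative dh."""
    return lambda xv: (h(xv[0]), xv[1] * dh(xv[0]))


# Sample functions with hand-written derivatives (assumptions for illustration)
f, df_ = (lambda x: x**3), (lambda x: 3 * x**2)
g, dg_ = math.exp, math.exp

d_f = tangent_map(f, df_)
d_g = tangent_map(g, dg_)


def d_gf(xv):
    """d(g . f), computed as the composite d g . d f."""
    return d_g(d_f(xv))


x, v = 1.2, 1.0
base, deriv = d_gf((x, v))

# Central finite difference of g . f at x, as an independent check
h = 1e-6
fd = (g(f(x + h)) - g(f(x - h))) / (2 * h)
```

With $v = 1$ the second component `deriv` is $f'(x)\,g'(f(x))$, and it should agree with the finite-difference estimate `fd` up to discretization and rounding error.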


While the chain rule is of great theoretical importance, it is completely unnecessary for working out derivatives of elementary functions in ordinary calculus (including the multivariable case). Every result giving the derivative of an elementary function corresponds to a rule (along the lines of the product rule) for applying that function to any expression. For example, instead of learning that

$$ \frac{\mathrm{d}}{\mathrm{d}x} (\sin x) = \cos x $$

and then applying both this fact and the chain rule to find the derivative of an expression of the form $\sin u$, just learn

$$ \frac{\mathrm{d}}{\mathrm{d}x} (\sin u) = \cos u \,\frac{\mathrm{d}u}{\mathrm{d}x} $$

and apply this rule directly; the original fact is the special case in which $u \coloneqq x$. Even better, learn the rule as

$$ \mathrm{d}(\sin u) = \cos u \,\mathrm{d}u ; $$

then it applies without further modification to multivariable calculus (as well as implicit differentiation, related rates, integration by substitution, and other stock features of one-variable calculus).
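As a worked illustration of the differential form of the sine rule in a multivariable setting, take the expression $u = x^2 y$ (a hypothetical example chosen here; it does not come from the text above). Applying the sine rule, the product rule, and the power rule directly, with no separate appeal to the chain rule, one would get:

```latex
\begin{aligned}
\mathrm{d}\bigl(\sin(x^2 y)\bigr)
  &= \cos(x^2 y) \,\mathrm{d}(x^2 y) \\
  &= \cos(x^2 y) \,\bigl( y \,\mathrm{d}(x^2) + x^2 \,\mathrm{d}y \bigr) \\
  &= \cos(x^2 y) \,\bigl( 2 x y \,\mathrm{d}x + x^2 \,\mathrm{d}y \bigr) \,.
\end{aligned}
```

Reading off the coefficients of $\mathrm{d}x$ and $\mathrm{d}y$ gives the two partial derivatives, $2 x y \cos(x^2 y)$ and $x^2 \cos(x^2 y)$.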

The chain rule could still be used in the proof of this ‘sine rule’. Even so, it is quite possible to prove the sine rule directly (much as one proves the product rule directly rather than using the two-variable chain rule and the partial derivatives of the function $(x, y) \mapsto x y$). In any case, the chain rule is not directly needed when working out specific derivatives. As a rule of differentiation, the chain rule is needed only when an unspecified differentiable function $f$ appears, and then may be given in the form

$$ \frac{\mathrm{d}}{\mathrm{d}x} \bigl(f(u)\bigr) = f'(u) \,\frac{\mathrm{d}u}{\mathrm{d}x} $$

or

$$ \mathrm{d}\bigl(f(u)\bigr) = f'(u) \,\mathrm{d}u $$

to match other rules.

Although the chain rule is often written as

$$ \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{\mathrm{d}y}{\mathrm{d}u} \, \frac{\mathrm{d}u}{\mathrm{d}x} , \qquad (1) $$

this is an oversimplification. Upon choosing an independent variable, it is possible (and easy) to give a rigorous definition of the differential $\mathrm{d}u$, and then (1) is a triviality (assuming no division by zero). However, with such a definition of differential, (1) is not the chain rule! The reason is that, when (1) is used as a mnemonic for the chain rule, $\mathrm{d}u/\mathrm{d}x$ uses $x$ as the independent variable, but $\mathrm{d}y/\mathrm{d}u$ uses $u$. Either may be chosen to define differentials (see footnote 1), but one must (a priori) be consistent. Now, it so happens that the choice of independent variable is entirely irrelevant; differentials have the same meaning no matter which independent variable is used. But this fact requires proof; it is the chain rule (or at least a prerequisite for using (1) as the chain rule), and it is not a triviality. In this form, the chain rule is also known as Cauchy’s invariant rule.
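A small worked example may make the invariance concrete. The functions $u = x^2$ and $y = u^3$ are assumptions chosen purely for illustration. Computing $\mathrm{d}y$ with $x$ as the independent variable, and separately with $u$ as the independent variable followed by the substitution $\mathrm{d}u = 2 x \,\mathrm{d}x$, gives the same differential:

```latex
\begin{aligned}
\text{with } x \text{ independent:} \quad
  & y = x^6, \quad \mathrm{d}y = 6 x^5 \,\mathrm{d}x \\
\text{with } u \text{ independent:} \quad
  & \mathrm{d}y = 3 u^2 \,\mathrm{d}u
    = 3 (x^2)^2 \cdot 2 x \,\mathrm{d}x
    = 6 x^5 \,\mathrm{d}x
\end{aligned}
```

The agreement here is an instance of the invariance, not a proof of it. Note also that $\mathrm{d}u = 2 x \,\mathrm{d}x$ vanishes at $x = 0$, so in this example $u$ can serve as an independent variable only away from that point.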

Formal algebra

The chain rule can also be discussed as a piece of formal algebra of power series (over a general commutative ring $A$). We present a conceptual proof based on considerations of SDG.

The statement is that if $q, p \in A[[x]]$ are power series, with $0$ the $0^{th}$ (constant) coefficient of $p$, then $(q \circ p)'(x) = q'(p(x))p'(x)$ under standard definitions.

Let $D = A[y]/(y^2)$ be the representing object for derivations. Let $\delta: A[[x]] \to A[[x]] \otimes_A D \cong A[[x]][y]/(y^2)$ be the unique topological $A$-algebra map (under the $(x)$-adic topologies) that sends $x$ to $x + y$. (If it helps, think $\delta(q) = q(x + y)$.) For $p \in A[[x]]$, define $p'$ via the equation $\delta(p) = p(x) + p'(x)y$.

Let $\pi: A[[x]] \otimes_A D \to A[[x]] \otimes_A D$ be the unique topological algebra map taking $x$ to $p(x)$ and $y$ to $p'(x)y$. Let $- \circ p: A[[x]] \to A[[x]]$ denote the unique topological algebra map that takes $x$ to $p$. Then the diagram

$$
\begin{array}{ccc}
A[[x]] & \stackrel{\delta}{\to} & A[[x]] \otimes_A D \\
\mathllap{- \circ p} \downarrow & & \downarrow \mathrlap{\pi} \\
A[[x]] & \underset{\delta}{\to} & A[[x]] \otimes_A D
\end{array}
$$

commutes in the category of topological algebras, since the two legs agree when evaluated at the generator $x$. But then, evaluating each leg at a power series $q \in A[[x]]$, we have

$$ [\delta(- \circ p)](q) = \delta(q \circ p) = (q \circ p)(x) + (q \circ p)'(x)y $$

and

$$ [\pi \delta](q) = \pi(\delta(q)) = \pi(q(x) + q'(x)y) = q(p(x)) + q'(p(x))(p'(x)y) $$

whence the coefficients of $y$ agree: $(q \circ p)'(x) = q'(p(x))p'(x)$.
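The identity can be spot-checked on truncated power series. The following is a minimal sketch, assuming $A = \mathbb{Z}$ and working modulo $x^N$; the truncation order `N`, the helper names `mul`, `deriv`, `compose`, and the sample coefficient lists `q` and `p` are all hypothetical choices made for illustration. Because truncation loses the top coefficient of a derivative, only the first $N - 1$ coefficients are compared.

```python
N = 8  # truncation order: series are coefficient lists modulo x**N


def mul(a, b):
    """Product of two truncated series, again modulo x**N."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
    return out


def deriv(a):
    """Formal derivative; the top coefficient is unknown after truncation, so pad with 0."""
    return [k * a[k] for k in range(1, N)] + [0]


def compose(q, p):
    """q(p(x)) modulo x**N by Horner's scheme; needs zero constant term in p."""
    assert p[0] == 0, "constant coefficient of p must be 0"
    out = [0] * N
    for c in reversed(q):
        out = mul(out, p)
        out[0] += c
    return out


# Sample series with integer coefficients (assumptions for illustration)
q = [1, 2, 3, 4, 5, 6, 7, 8]
p = [0, 1, -1, 2, 0, 3, 0, 0]

# Compare (q . p)' with (q' . p) * p' on the coefficients that truncation preserves
lhs = deriv(compose(q, p))[: N - 1]
rhs = mul(compose(deriv(q), p), deriv(p))[: N - 1]
```

Integer arithmetic is exact, so `lhs` and `rhs` should be equal as lists, not merely close.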


A synthetic formalization of the chain rule in differential cohesive homotopy type theory is given in

See also

On the Goodwillie chain rule in Goodwillie calculus:

  1. Well, either may be chosen as long as their differentials are nowhere zero, which is exactly what must be true for (1) to make sense. More precisely, given that $x$ works as an independent variable, the Chain Rule tells us that $u$ works just as well so long as $\mathrm{d}u$ (as defined using $x$) is nowhere zero. This may be related to the easy but wrong proof of the Chain Rule that founders on a division by zero in exactly the same place (where $\mathrm{d}u$ would be), although I don’t see a direct connection.

Last revised on May 26, 2023 at 21:21:23.