nLab Gram-Schmidt process

Idea

The Gram–Schmidt process is an algorithm which takes as input an ordered basis of an inner product space and produces as output an ordered orthonormal basis.

In terms of matrices, the Gram–Schmidt process is a procedure for factoring an invertible matrix $M$ in the general linear group $GL_n(\mathbb{R})$ (or $GL_n(\mathbb{C})$) as a product $M = U T$, where $T$ is an upper triangular matrix and $U$ is an orthogonal (or unitary) matrix; as such it is a special case of the more general Iwasawa decomposition for a (connected) semisimple Lie group.
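
In numerical practice this factorization is computed as a QR decomposition; a minimal sketch with NumPy (the random test matrix and the checks are ad hoc illustrations):

```python
# Check the factorization M = U T numerically: numpy.linalg.qr returns an
# orthogonal factor and an upper triangular factor.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))          # a generic (hence invertible) real matrix

U, T = np.linalg.qr(M)

assert np.allclose(U @ T, M)             # M = U T
assert np.allclose(U.T @ U, np.eye(4))   # U is orthogonal
assert np.allclose(T, np.triu(T))        # T is upper triangular
```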

Since the factorization depends smoothly on the parameters, the Gram–Schmidt procedure enables the reduction of the structure group of an inner product vector bundle (e.g., the tangent bundle of a Riemannian manifold or a Kähler manifold) from $GL_n$ to the orthogonal group $O_n$ (or the unitary group $U_n$).

Gram–Schmidt process on Hilbert spaces

In this section, “basis” is understood to signify an ordered independent set whose linear span is dense in a Hilbert space $H$ seen as a metric space. We will describe the Gram–Schmidt process as applied to a $d$-dimensional Hilbert space, for some cardinal $d$, with a basis $v_0, v_1, \ldots$ consisting of $d$ vectors.

The orthonormal basis $u_0, u_1, \ldots$ produced as output is defined recursively by (a) subtracting the orthogonal projection onto the closed subspace generated by all previous vectors and (b) normalizing. We denote the orthogonal projection onto a closed subspace $A$ by $\pi_A \colon H \to A$ and the normalization $v/\|v\|$ of a vector $v \in H$ by $N(v)$. For ordinals $\alpha \lt d$ define

$$u_\alpha \coloneqq N\big(v_\alpha - \pi_{S_\alpha}(v_\alpha)\big)$$

where $S_\alpha$ is the closure of the span of $\{v_\beta : \beta \lt \alpha\}$; the projection $\pi_{S_\alpha}$ is known to exist since $H$ is complete. This can be rewritten more explicitly using transfinite recursion as

$$u_\alpha = N\left(v_\alpha - \sum_{\beta \lt \alpha} \langle v_\alpha, u_\beta\rangle\, u_\beta\right)$$

where the sum on the right is well defined by the Bessel inequality, i.e. only countably many coefficients are non-zero and they are square-summable. A simple (transfinite) inductive argument shows that the $u_\alpha$ are mutually orthogonal unit vectors, and that the span of $\{u_\beta : \beta \lt \alpha\}$ is equal to the span of $\{v_\beta : \beta \lt \alpha\}$ for $\alpha \leq d$. Therefore $u_0, u_1, \ldots$ is an orthonormal basis of $H$.
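
In the finite-dimensional case this recursion is the classical algorithm; here is a minimal Python sketch for finitely many linearly independent vectors with the standard inner product on $\mathbb{R}^n$ (the helper name `gram_schmidt` is ad hoc):

```python
# Direct transcription of  u_a = N( v_a - sum_{b<a} <v_a, u_b> u_b )
# for a finite list of linearly independent vectors in R^n.
import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        # subtract the projection onto the span of the previously produced vectors
        w = v - sum(np.dot(v, u) * u for u in basis)
        basis.append(w / np.linalg.norm(w))          # normalize: N(w)
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
U = np.column_stack(gram_schmidt(vs))
assert np.allclose(U.T @ U, np.eye(3))               # orthonormal output
```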

Remark

(Application to non-bases)

If we apply the Gram–Schmidt process to a well-ordered independent set whose closed linear span $S$ is not all of $H$, we still get an orthonormal basis of the subspace $S$. If we apply the Gram–Schmidt process to a dependent set, then we will eventually run into a vector $v$ whose norm is zero, so we will not be able to take $N(v)$. In that case, however, we can simply remove $v$ from the set and continue; then we will still get an orthonormal basis of the closed linear span. (This conclusion is not generally valid in constructive mathematics, since it relies on excluded middle applied to the statement that $\|v\| \neq 0$. However, it does work for discrete fields, such as the algebraic closure of the rationals, as seen in elementary undergraduate linear algebra.)
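
A small variant of the sketch above implements this convention numerically: residuals of (numerically) zero norm are skipped, so the output is an orthonormal basis of the closed linear span even for a dependent input (the tolerance `1e-12` is an ad hoc choice):

```python
import numpy as np

def gram_schmidt_span(vectors, tol=1e-12):
    """Orthonormal basis of the span of `vectors`; dependent vectors are skipped."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)
        if np.linalg.norm(w) > tol:                  # skip vectors with zero residual
            basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 0.0, 0.0]),
      np.array([2.0, 0.0, 0.0]),                     # dependent on the first vector
      np.array([0.0, 1.0, 0.0])]
assert len(gram_schmidt_span(vs)) == 2               # basis of a 2-dimensional span
```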

Via Gaussian elimination

There is an alternative algorithm, via Gaussian elimination, which for any set of vectors produces an orthogonal basis of its linear span (Pursell-Trimble 91).

In the special case that the original vectors are linearly independent, normalizing the resulting orthogonal basis yields the same orthonormal basis as produced by the Gram–Schmidt process.

Lemma

(LU-decomposition of positive semidefinite matrices)

Every positive semidefinite matrix $M$ has a matrix product decomposition

$$M \;=\; L \cdot U$$

where

  1. $L$ is

    1. a lower triangular matrix

    2. an invertible matrix

  2. $U$ is an upper triangular matrix.

(Pursell-Trimble 91, p. 4)

Proposition

(orthogonal basis of linear span via LU-decomposition)

Let $(a_j)$ be a tuple of vectors of the same length, hence let

$$A = (A_{i j}) \coloneqq \big((a_j)_i\big)$$

be the matrix whose $j$th column is $a_j$.

Since the matrix product $M \coloneqq A^T \cdot A$ of $A$ with its transpose matrix is necessarily a positive semidefinite symmetric matrix, it has an LU-decomposition according to the Lemma above:

$$A^T \cdot A \;=\; L \cdot U \,.$$

Then the column vectors $(\hat q_j)$ of the matrix

$$\widehat Q \;\coloneqq\; A \cdot (L^{-1})^T$$

constitute an orthogonalization of the original tuple of vectors $(a_j)$, in that the non-zero $\hat q_j$ form an orthogonal linear basis of the linear span of the $(a_j)$.

(Pursell-Trimble 91, top of p. 5)
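
As a minimal numerical sketch of the Proposition (in the special case that the $a_j$ are linearly independent, so that $A^T \cdot A$ is positive definite), one may take the Cholesky factorization $A^T A = L \cdot L^T$ as the LU-decomposition of the Lemma, with $U = L^T$; the resulting $\widehat Q = A \cdot (L^{-1})^T$ then even has orthonormal columns:

```python
# Orthogonalization via an LU-type factorization of A^T A (Cholesky special case).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))            # 3 generically independent columns a_j

L = np.linalg.cholesky(A.T @ A)            # lower triangular and invertible
Qhat = np.linalg.solve(L, A.T).T           # Qhat = A (L^{-1})^T

assert np.allclose(Qhat.T @ Qhat, np.eye(3))   # columns are orthonormal
assert np.allclose(Qhat @ L.T, A)              # and span the column space of A
```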

Remark

That the matrix $L$ in the Proposition is lower triangular and invertible (by the Lemma) means that its inverse matrix $L^{-1}$ is also a lower triangular matrix, which reflects a process of row reduction from $A^T \cdot A$ to $U$.

Accordingly, the orthogonalization in the Proposition may be understood as applying Gaussian elimination to

$$\big[\, A^T \cdot A \,\vert\, A^T \,\big] \;\mapsto\; \big[\, \widehat Q^T \,\big]$$

(Pursell-Trimble 91, p. 3)
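
A minimal sketch of this elimination picture, assuming the columns of $A$ are linearly independent so that no zero pivots occur: forward elimination on the augmented matrix turns the left block into an upper triangular matrix and the right block into $\widehat Q^T$.

```python
# Forward elimination on [ A^T A | A^T ]: the row operations reducing A^T A to
# upper triangular form simultaneously turn A^T into Qhat^T.
import numpy as np

def orthogonalize_by_elimination(A):
    n = A.shape[1]
    aug = np.hstack([A.T @ A, A.T]).astype(float)       # the augmented matrix
    for i in range(n):                                   # no pivoting needed:
        for j in range(i + 1, n):                        # A^T A is positive definite
            aug[j] -= (aug[j, i] / aug[i, i]) * aug[i]
    return aug[:, n:].T                                  # right block is Qhat^T

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
Qhat = orthogonalize_by_elimination(A)
G = Qhat.T @ Qhat
assert np.allclose(G, np.diag(np.diag(G)))               # columns pairwise orthogonal
```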

Examples

Legendre polynomials

A classic illustration of Gram–Schmidt is the production of the Legendre polynomials.

Let $H$ be the Hilbert space $H = L^2([-1, 1])$ of square integrable functions on the closed interval, equipped with the standard inner product defined by

$$\langle f, g\rangle = \int_{-1}^1 \overline{f(x)}\, g(x)\, d x$$

By the Stone-Weierstrass theorem, the space of polynomials $\mathbb{C}[x]$ is dense in $H$ under its standard inclusion, and so the polynomials $1, x, x^2, \ldots$ form an ordered basis of $H$.

Applying the Gram–Schmidt process as above, one readily computes the first few orthonormal functions:

$$u_1(x) = N(1) = 1/\sqrt{2}$$
$$u_2(x) = N(x - 0) = \sqrt{3/2}\, x$$
$$u_3(x) = N\big(x^2 - \langle x^2, 1/\sqrt{2}\rangle\, 1/\sqrt{2} - 0\big) = N(x^2 - 1/3) = \tfrac{3}{2}\sqrt{5/2}\,(x^2 - 1/3)$$
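
These expressions can be reproduced symbolically; a small SymPy sketch of the recursion, using the inner product above (the helper names are ad hoc):

```python
# Symbolic Gram-Schmidt on 1, x, x^2 with the L^2([-1,1]) inner product.
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    return sp.integrate(f * g, (x, -1, 1))    # real functions, so no conjugation

def gram_schmidt(fs):
    us = []
    for f in fs:
        w = f - sum(inner(f, u) * u for u in us)
        us.append(sp.simplify(w / sp.sqrt(inner(w, w))))
    return us

print(gram_schmidt([1, x, x**2]))
# matches 1/sqrt(2), sqrt(3/2)*x and (3/2)*sqrt(5/2)*(x**2 - 1/3) up to simplification
```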

The classical Legendre polynomials $P_n(x)$ are scalar multiples of the functions $u_n$, adjusted so that $P_n(1) = 1$; they satisfy the orthogonality relations

$$\langle P_n, P_m\rangle = \frac{2}{2 n + 1}\, \delta_{m, n}$$

where $\delta_{m, n}$ is the Kronecker delta.
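
The orthogonality relation may be checked symbolically for the first few cases using SymPy's built-in `legendre` polynomials:

```python
# Check <P_n, P_m> = 2/(2n+1) delta_{m,n} for n, m = 0, ..., 3.
import sympy as sp

x = sp.symbols('x')
for n in range(4):
    for m in range(4):
        ip = sp.integrate(sp.legendre(n, x) * sp.legendre(m, x), (x, -1, 1))
        expected = sp.Rational(2, 2*n + 1) if n == m else 0
        assert sp.simplify(ip - expected) == 0
```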

Categorified Gram–Schmidt process

Many aspects of the Gram–Schmidt process can be categorified so as to apply to 2-Hilbert spaces, as indicated at Schur's lemma in the section In terms of categorical algebra.

We will illustrate the basic idea with an example that was suggested to us by James Dolan. (See also at permutation representation the sections Examples – Virtual permutation representations.)

Consider the category $S_n Rep$ of representations over the complex numbers of the symmetric group $G \coloneqq S_n$. (As a running example, we consider $S_4$; up to isomorphism, there are five irreducible representations

$$U_{(4)}, \; U_{(3 1)}, \; U_{(2 2)}, \; U_{(2 1 1)}, \; U_{(1 1 1 1)}$$

classified by the five Young diagrams of size 4. To save space, we denote these as $U_1$, $U_2$, $U_3$, $U_4$, $U_5$.)

The irreducible representations $U_i$ of $S_n$ form a 2-orthonormal basis in the sense that any two of them $U_i, U_j$ satisfy the relation

$$hom_G(U_i, U_j) \cong \delta_{i j} \cdot \mathbb{C}$$

(where $hom_G$ denotes the hom vector space in $S_n Rep$, and $n \cdot \mathbb{C}$ indicates a direct sum of $n$ copies of $\mathbb{C}$). In fact, the irreducible representations are uniquely determined up to isomorphism by these relations.

There is however another way of associating representations to partitions or Young diagrams. Namely, consider the subgroup of permutations which take each row of a Young diagram or Young tableau of size $n$ to itself; this forms a parabolic subgroup of $S_n$, conjugate to one of type $P_{(n_1 \ldots n_k)} = S_{n_1} \times \ldots \times S_{n_k}$, where $n_i$ is the length of the $i^{th}$ row of the Young diagram. The group $S_n$ acts transitively on the coset space

$$S_n / P_{(n_1 \ldots n_k)}$$

and these actions give linear permutation representations $\mathbb{C}\big[S_n / P_{(n_1 \ldots n_k)}\big]$ of $S_n$. Equivalently, these are linear representations $V_i$ which are induced from the trivial representation along inclusions of parabolic subgroups.

We claim that

  1. these representations form a $\mathbb{Z}$-linear basis of the representation ring $R(S_n)$,

  2. we may calculate their characters using a categorified Gram–Schmidt process.

We indicate the proof:

Given two such parabolic subgroups $P$, $Q$ in $G = S_n$, the 2-inner product

$$hom_G(\mathbb{C}[G/P], \mathbb{C}[G/Q])$$

may be identified with the free vector space on the set of double cosets $P \backslash G / Q$.

One may count the number of double cosets by hand in a simple case like $G = S_4$. That is, for the 5 representations $V_1, \ldots, V_5$ induced from the 5 parabolic subgroups $P_i$ corresponding to the 5 Young diagrams listed above, the dimensions of the 2-inner products $hom(V_i, V_j)$ are the sizes of the corresponding double coset spaces $P_i \backslash S_4 / P_j$. These numbers form a matrix as follows (following the order of the 5 partitions listed above):

$$\left( \array{ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 & 6 \\ 1 & 3 & 4 & 7 & 12 \\ 1 & 4 & 6 & 12 & 24 } \right)$$

To reiterate: this matrix is the decategorification (a matrix of dimensions) of a matrix of 2-inner products whose $(i j)$-entry is of the form

$$hom_G(V_i, V_j) \cong V_i^* \otimes_G V_j$$

where the $V_i$ are induced from inclusions of parabolic subgroups. The $V_i$ are $\mathbb{N}$-linear combinations of the irreducible representations $U_i$, which form a 2-orthonormal basis, and we may perform a series of elementary row operations which convert this matrix into an upper triangular matrix, which will turn out to be the decategorified form of the 2-matrix with entries

$$hom_G(U_i, V_j) \cong U_i^* \otimes_G V_j$$

where $U_i$ is the irrep corresponding to the $i$th Young diagram (as listed above). The upper triangular matrix is

$$\left( \array{ 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 2 & 3 \\ 0 & 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 1 } \right)$$

and we read off from the columns the following decompositions into irreducible components:

$$V_1 \cong U_1$$
$$V_2 \cong U_1 + U_2$$
$$V_3 \cong U_1 + U_2 + U_3$$
$$V_4 \cong U_1 + 2 U_2 + U_3 + U_4$$
$$V_5 \cong U_1 + 3 U_2 + 2 U_3 + 3 U_4 + U_5$$

The last representation $V_5$ is the regular representation of $S_4$ (because the corresponding parabolic subgroup is trivial). Since we know from general theory that the multiplicity of the irreducible $U_i$ in the regular representation is its dimension, we get as a by-product the dimensions of the $U_i$ from the expression for $V_5$:

$$dim(U_1) = 1, \; dim(U_2) = 3, \; dim(U_3) = 2, \; dim(U_4) = 3, \; dim(U_5) = 1$$

(the first of the $U_i$ is the trivial representation, and the last, $U_5$, is the alternating representation).

The row operations themselves can be assembled as the lower triangular matrix

$$\left( \array{ 1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \\ 1 & -1 & -1 & 1 & 0 \\ -1 & 2 & 1 & -3 & 1 } \right)$$

and from the rows we read off the irreducible representations as “virtual” (i.e., $\mathbb{Z}$-linear) combinations of the parabolically induced representations $V_i$:

$$U_1 \cong V_1$$
$$U_2 \cong -V_1 + V_2$$
$$U_3 \cong -V_2 + V_3$$
$$U_4 \cong V_1 - V_2 - V_3 + V_4$$
$$U_5 \cong -V_1 + 2 V_2 + V_3 - 3 V_4 + V_5$$

which can be considered the result of the categorified Gram–Schmidt process.
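
The decategorified bookkeeping in this example amounts to a few lines of exact linear algebra; the following sketch recovers the upper triangular matrix of multiplicities and the matrix of row operations from the matrix of double coset counts by Gaussian elimination (exact arithmetic via `fractions.Fraction`):

```python
# Row-reduce the matrix (dim hom(V_i, V_j)) of double coset counts for S_4;
# the result is the unit upper triangular matrix (dim hom(U_i, V_j)), and the
# accumulated row operations express the U_i as virtual combinations of the V_j.
from fractions import Fraction

M = [[1, 1, 1,  1,  1],
     [1, 2, 2,  3,  4],
     [1, 2, 3,  4,  6],
     [1, 3, 4,  7, 12],
     [1, 4, 6, 12, 24]]

n = len(M)
N = [[Fraction(x) for x in row] for row in M]                      # becomes upper triangular
E = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # accumulated row operations

for i in range(n):                      # forward elimination; the pivots here are all 1
    for j in range(i + 1, n):
        f = N[j][i] / N[i][i]
        N[j] = [a - f * b for a, b in zip(N[j], N[i])]
        E[j] = [a - f * b for a, b in zip(E[j], E[i])]

print([[int(a) for a in row] for row in N])
# [[1, 1, 1, 1, 1], [0, 1, 1, 2, 3], [0, 0, 1, 1, 2], [0, 0, 0, 1, 3], [0, 0, 0, 0, 1]]
print([[int(a) for a in row] for row in E])
# [[1, 0, 0, 0, 0], [-1, 1, 0, 0, 0], [0, -1, 1, 0, 0], [1, -1, -1, 1, 0], [-1, 2, 1, -3, 1]]
```

The two printed matrices reproduce the upper and lower triangular matrices displayed above.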

It follows from these expressions that the $V_i$ form a $\mathbb{Z}$-linear basis of the representation ring $Rep(S_4)$.

Analogous statements hold for each symmetric group $S_n$.

References

The formulation via Gaussian elimination is due to

  • Lyle Pursell, S. Y. Trimble, Gram-Schmidt orthogonalization by Gauss Elimination, The American Mathematical Monthly Vol. 98, No. 6 (1991), pp. 544-549 (jstor:2324877)
