nLab Kendall tau distance

Contents

Context

Group Theory

Analysis

Idea
Definition
Properties
Related concepts
References

Idea

An adaptation of the Kendall tau rank correlation?, the Kendall tau distance between two permutations is the minimum number of transpositions (swaps) of adjacent elements needed to turn one into the other. This quantity can also be defined as the total number of discordant pairs? between the two permutations.

(This is in contrast to the Cayley distance which counts the minimum number of transpositions of any pairs of elements, not necessarily adjacent.)

This quantity is also referred to as the bubble sort distance, since bubble sorting algorithms repeatedly iterate over a list and swap adjacent elements if they are out of order.

Definition

For $n \in \mathbb{N}$ consider the symmetric group $Sym(n)$ with its structure as a finitely generated group exhibited by the finite set of generators given by the adjacent transposition permutations

Gen \;\coloneqq\; \big\{ \sigma_{i,i+1} \,\vert\, 1 \leq i \lt n \big\} \;\subset\; Sym(n) \,.

Then the Kendall tau distance (Kendall 1938, see also Diaconis 88, p. 112) is the distance on $Sym(n)$ which is the graph distance of the corresponding Cayley graph, hence the function

d_K(-,-) \;\colon\; Sym(n) \times Sym(n) \longrightarrow \mathbb{N} \hookrightarrow \mathbb{R}

which sends any pair of permutations $\sigma_1, \sigma_2 \in Sym(n)$ to the minimum number $d(\sigma_1, \sigma_2) \in \mathbb{N}$ of adjacent transpositions $\sigma_{i_1, i_1 + 1}, \sigma_{i_2, i_2 + 1}, \cdots, \sigma_{i_d, i_d+1}$ such that

\sigma_2 \;=\; \sigma_{i_d, i_d+1} \circ \cdots \circ \sigma_{i_2, i_2 + 1} \circ \sigma_{i_1, i_1 + 1} \circ \sigma_1 \,.

For $\beta \in \mathbb{R}_{\geq 0}$ , the exponential of this distance weighted by $- \beta$ is known as the Mallows kernel:

(1)

(\sigma_1,\sigma_2) \;\mapsto\; \exp \big( - \beta d_K(\sigma_1, \sigma_2) \big)

Properties

Theorem

The Mallows kernel (1) is positive definite for all $\beta \geq 0$ .

(Jiao-Vert 18, Theorem 1)

Related concepts

References

The distance function is due to:

Maurice Kendall, A new measure of rank correlation, Biometrika, Volume 30, Issue 1-2, June 1938, Pages 81–93 (doi:10.1093/biomet/30.1-2.81)

The corresponding exponential kernel is named after

C. L. Mallows, Non-Null Ranking Models. I, Biometrika Vol. 44, No. 1/2 (Jun., 1957), pp. 114-130 (jstor:2333244)

Textbook account:

Persi Diaconis, Chapter 6B of: Group Representations in Probability and Statistics, IMS Lecture Notes Monogr. Ser., 11: 198pp. (1988) (jstor:i397389, ISBN: 0940600145, pdf)