nLab inner product of multisets

Contents

Contents

Definition

A multiset 𝒳=⟨X,ΞΌ X⟩\mathcal{X} = \langle X,\mu_X\rangle consists of a set XX and a function ΞΌ X:Uβ†’β„•\mu_X:U\to\mathbb{N}, where UU is a universal set and ΞΌ X(e)>0\mu_X(e) \gt 0 if and only if e∈Xe\in X.

We can add two multisets 𝒳=⟨X,ΞΌ X⟩\mathcal{X} = \langle X,\mu_X\rangle and 𝒴=⟨Y,ΞΌ Y⟩\mathcal{Y} = \langle Y,\mu_Y\rangle to get

𝒳+𝒴=⟨XβˆͺY,ΞΌ X+ΞΌ Y⟩.\mathcal{X} + \mathcal{Y} = \langle X\cup Y,\mu_X+\mu_Y\rangle.

Note that we can write

k𝒳=⟨X,kΞΌ X⟩k\mathcal{X} = \langle X,k\mu_X\rangle

for kβˆˆβ„•k\in\mathbb{N}.

We can also define an inner product of multisets via

βŸ¨π’³,π’΄βŸ©=βˆ‘ eβˆˆπ’³βˆͺ𝒴μ X(e)ΞΌ Y(e).\langle \mathcal{X},\mathcal{Y}\rangle = \sum_{e\in\mathcal{X}\cup \mathcal{Y}} \mu_X(e) \mu_Y(e).

Note that when 𝒳\mathcal{X} and 𝒴\mathcal{Y} are simply sets, ΞΌ X\mu_X and ΞΌ Y\mu_Y are the characteristic functions and

βŸ¨π’³,π’΄βŸ©=|X∩Y|,\langle \mathcal{X},\mathcal{Y}\rangle = |X\cap Y|,

where |β‹…||\cdot| denotes the cardinality of the set.

Using this inner product, we can define the angle between multisets as

cosΞΈ 𝒳,𝒴=βŸ¨π’³,π’΄βŸ©βŸ¨π’³,π’³βŸ©βŸ¨π’΄,π’΄βŸ©.\cos\theta_{\mathcal{X},\mathcal{Y}} = \frac{\langle \mathcal{X},\mathcal{Y}\rangle}{\sqrt{\langle \mathcal{X},\mathcal{X}\rangle \langle \mathcal{Y},\mathcal{Y}\rangle}}.

In particular, when 𝒳=𝒴\mathcal{X}=\mathcal{Y} we have

cosΞΈ 𝒳,𝒴=1cos\theta_{{\mathcal{X}},{\mathcal{Y}}} = 1

and when X∩Y=βˆ…X\cap Y = \emptyset we have

cosΞΈ 𝒳,𝒴=0.cos\theta_{{\mathcal{X}},{\mathcal{Y}}} = 0.

When 𝒳\mathcal{X} and 𝒴\mathcal{Y} are simply sets, the angle between them is given by

cosΞΈ 𝒳,𝒴=|X∩Y||X||Y|.cos\theta_{{\mathcal{X}},{\mathcal{Y}}} = \frac{|X\cap Y|}{\sqrt{|X||Y|}}.

With this notion of addition, the collection of multisets in UU becomes the β„•\mathbb{N}-module (that is abelian monoid) β„• U\mathbb{N}^U; this inner product makes it an inner product space analogous to the Banach space ℝ n\mathbb{R}^n.

Machine Learning

The inner product of multisets is closely related to the β€œbag of words” kernel in machine learning (see n-Cafe).

Last revised on April 2, 2021 at 02:19:50. See the history of this page for a list of all contributions to it.