## Definition

A multiset $𝒳=⟨X,{\mu }_{X}⟩$ consists of a set $X$ and a function ${\mu }_{X}:U\to ℕ$, where $U$ is a universal set and ${\mu }_{X}\left(e\right)>0$ if and only if $e\in X$.

We can add two multisets $𝒳=⟨X,{\mu }_{X}⟩$ and $𝒴=⟨Y,{\mu }_{Y}⟩$ to get

$𝒳+𝒴=⟨X\cup Y,{\mu }_{X}+{\mu }_{Y}⟩.$

Note that we can write

$k𝒳=⟨X,k{\mu }_{X}⟩$

for $k\in ℕ$.
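As a minimal sketch of these two operations, one can model a multiset as a Python `collections.Counter` that maps each element to its multiplicity. The helper names `add` and `scale` are hypothetical, chosen for illustration; the sketch also assumes that elements with multiplicity zero are simply dropped, which matches the requirement that ${\mu }_{X}\left(e\right)>0$ exactly on the underlying set.

```python
from collections import Counter


def add(x: Counter, y: Counter) -> Counter:
    """Multiset sum: multiplicities add pointwise, supports are united."""
    return x + y


def scale(k: int, x: Counter) -> Counter:
    """Multiple k*x: every multiplicity is multiplied by the natural number k."""
    if k < 0:
        raise ValueError("k must be a natural number")
    # Entries whose multiplicity becomes zero (k == 0) are dropped.
    return Counter({e: k * m for e, m in x.items() if k * m > 0})


x = Counter({"a": 2, "b": 1})
y = Counter({"b": 3, "c": 1})

total = add(x, y)      # multiplicities: a -> 2, b -> 4, c -> 1
tripled = scale(3, x)  # multiplicities: a -> 6, b -> 3
```

In this representation the keys of `total` are the union of the keys of `x` and `y`, and `scale(2, x)` agrees with `add(x, x)`.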

We can also define an inner product of multisets via

$⟨𝒳,𝒴⟩=\sum _{e\in X\cup Y}{\mu }_{X}\left(e\right){\mu }_{Y}\left(e\right).$

Note that when $𝒳$ and $𝒴$ are simply sets, ${\mu }_{X}$ and ${\mu }_{Y}$ are the characteristic functions and

$⟨𝒳,𝒴⟩=\mid X\cap Y\mid ,$

where $\mid \cdot \mid$ denotes the cardinality of the set.
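The inner product can be sketched in the same spirit, again assuming multisets are stored as `collections.Counter` objects (the function name `inner` is hypothetical). Only elements that occur in both multisets contribute a nonzero term, so it is enough to iterate over one of the two supports.

```python
from collections import Counter


def inner(x: Counter, y: Counter) -> int:
    """Sum of mu_X(e) * mu_Y(e) over all elements e."""
    # Iterate over the smaller support; a Counter returns 0 for missing keys.
    if len(x) > len(y):
        x, y = y, x
    return sum(m * y[e] for e, m in x.items())


x = Counter({"a": 2, "b": 1})
y = Counter({"a": 1, "b": 3, "c": 1})
product = inner(x, y)  # 2*1 + 1*3 = 5

# For plain sets every multiplicity is 1, so the inner product
# counts the elements of the intersection.
s, t = {1, 2, 3}, {2, 3, 4}
set_product = inner(Counter(s), Counter(t))  # |{2, 3}| = 2
```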

Using this inner product, we can define the angle between multisets as

$\mathrm{cos}{\theta }_{𝒳,𝒴}=\frac{⟨𝒳,𝒴⟩}{\sqrt{⟨𝒳,𝒳⟩⟨𝒴,𝒴⟩}}.$

In particular, when $𝒳=𝒴$ we have

$\mathrm{cos}{\theta }_{𝒳,𝒴}=1$

and when $X\cap Y=\varnothing$ we have

$\mathrm{cos}{\theta }_{𝒳,𝒴}=0.$

When $𝒳$ and $𝒴$ are simply sets, the angle between them is given by

$\mathrm{cos}{\theta }_{𝒳,𝒴}=\frac{\mid X\cap Y\mid }{\sqrt{\mid X\mid \mid Y\mid }}.$
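The angle formulas above can be checked numerically with a short sketch. It assumes the `Counter` representation of multisets and hypothetical helper names `inner` and `cosine`; it also assumes both multisets are nonempty, since the quotient is undefined otherwise.

```python
import math
from collections import Counter


def inner(x: Counter, y: Counter) -> int:
    """Sum of mu_X(e) * mu_Y(e); a Counter returns 0 for missing keys."""
    return sum(m * y[e] for e, m in x.items())


def cosine(x: Counter, y: Counter) -> float:
    """Cosine of the angle between two nonempty multisets."""
    norm = math.sqrt(inner(x, x) * inner(y, y))
    if norm == 0:
        raise ValueError("the angle is undefined for an empty multiset")
    return inner(x, y) / norm


x = Counter({"a": 2, "b": 1})
same = cosine(x, x)                          # equal multisets
disjoint = cosine(x, Counter({"c": 4}))      # disjoint supports

# Plain sets: |X ∩ Y| / sqrt(|X| |Y|) = 2 / sqrt(3 * 3)
s, t = {1, 2, 3}, {2, 3, 4}
set_cosine = cosine(Counter(s), Counter(t))
```

Because multiplicities are non-negative, the value returned by `cosine` always lies between 0 and 1.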

With this notion of addition, the collection of multisets in $U$ becomes the $ℕ$-module (that is, the abelian monoid) ${ℕ}^{U}$; the inner product defined above makes it an inner product space analogous to the Banach space ${ℝ}^{n}$.

## Machine Learning

The inner product of multisets is closely related to the “bag of words” kernel in machine learning (see n-Cafe).
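To illustrate the connection, here is a minimal sketch in which a text is turned into the multiset of its words and two texts are compared by the multiset inner product. The function names `bag_of_words` and `bow_kernel` are hypothetical, and the lowercase, whitespace-based tokenization is an assumption made for brevity; practical systems typically tokenize and weight words more carefully.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Multiset of words: each word mapped to its number of occurrences."""
    # Assumption: lowercase the text and split on whitespace.
    return Counter(text.lower().split())


def bow_kernel(s: str, t: str) -> int:
    """Inner product of the word multisets of two texts."""
    x, y = bag_of_words(s), bag_of_words(t)
    return sum(m * y[w] for w, m in x.items())


# Shared words: "the" (2 * 2), "sat" (1 * 1), "on" (1 * 1)
k = bow_kernel("the cat sat on the mat", "the dog sat on the log")
```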

Revised on October 28, 2009 17:39:46 by Toby Bartels (173.51.68.54)