categorical compositional distributional semantics


Categorical compositional distributional semantics, also known as DisCoCat for short, uses category theory to combine the benefits of two very different approaches to linguistics: categorial grammar and distributional semantics. Specifically, it uses the fact that pregroups and the category of finite-dimensional real vector spaces are both examples of compact closed categories. It turns out that word meanings can be arranged into a strong monoidal functor, which allows sentence meanings to be computed from word meanings by linear algebra, compositionally on the grammatical structure of the sentence.

DisCoCat was first introduced in Coecke, Sadrzadeh and Clark 2010.

There is a curious (if probably shallow) analogy between DisCoCat and topological quantum field theory, in that both concern strong monoidal functors from a free compact closed category to a category of vector spaces.

Simple example

Consider a very simple fragment of English consisting of nouns and verbs. Write nn for the type of nouns, and ss for the type of sentences. Let 𝒞\mathcal{C} be the free pregroup generated by nn and ss. In 𝒞\mathcal{C}, a noun will be assigned the type nn and a reflexive verb will be assigned the type n rsn ln^{r} s n^{l}, since it will produce a sentence if provided with nouns on the left and right. Now given nouns such as “Alice” and “Bob” and reflexive verbs such as “loves” and “hates”, the grammaticality of a sentence such as “Alice hates Bob” is witnessed by the fact that

nn rsn lns n n^{r} s n^{l} n \leq s

in the free pregroup.

A strong monoidal functor F:𝒞FVect F : \mathcal{C} \to FVect_{\mathbb{R}} is completely determined by where it sends the generators nn and ss. The vector spaces F(n)F (n) and F(s)F (s) are called the noun space and the sentence space, and their choice is a matter of linguistic modelling. A common approach is to pick a large set nn of basis words, say n=1000n = 1000 (or as many as your sparse matrix implementation can handle), and take F(n)= 1000F (n) = \mathbb{R}^{1000}. The choice of F(s)F (s) is not well understood, but F(s)= 2F (s) = \mathbb{R}^{2} is a common choice, with the standard basis vectors interpreted as “true” and “false”.

Given a grammatical parsing such as nn rsn lnsn n^{r} s n^{l} n \leq s, we automatically get a linear map

F(nn rsn ln):F(n)F(n)F(s)F(n)F(n)F(s) F (n n^{r} s n^{l} n) : F (n) \otimes F (n) \otimes F (s) \otimes F (n) \otimes F (n) \to F (s)

given by appropriate tensor contractions.

The final ingredient we need are word vectors. We need to pick vectors [[Alice]],[[Bob]]F(n)[\![ Alice ]\!], [\![ Bob ]\!] \in F (n) and [[loves]]F(n)F(s)F(n)[\![ loves ]\!] \in F (n) \otimes F (s) \otimes F (n). The choice of these vectors is also a matter for linguistics, but a common idea is to count how often Alice (for example) appears in a corpus (for example the full text of Wikipedia) close to each of the chosen basis words.

Finally, we obtain a sentence-vector

F(nn rsn lns)([[Alice]][[loves]][[Bob]])F(s) F (n n^{r} s n^{l} n \leq s) ([\![ Alice ]\!] \otimes [\![ loves ]\!] \otimes [\![ Bob ]\!]) \in F (s)

Relative pronouns and Frobenius algebras

Free pregroups vs free autonomous categories

All earlier papers on DisCoCat use free pregroups for the grammar category, building on earlier work by Joachim Lambek on pregroup grammars. Unfortunately, in Preller 2014, fact 1 it is proven that any strong monoidal functor from a free pregroup to FVectFVect is necessarily trivial, in the sense that it sends every generator to the monoidal unit \mathbb{R}. This is worked around by using a free autonomous category instead of a free pregroup (this is a sort of categorification). Morphisms in the free autonomous category can be viewed as proof relevant grammatical reductions.

This allows a slightly more elegant reformulation of our basic example. Let 𝒞\mathcal{C} be the free autonomous category on the objects n,sn, s and the morphisms Alice:1nAlice : 1 \to n, Bob:1nBob : 1 \to n and loves:1n rsn lloves : 1 \to n^{r} s n^{l}. Then a sentence such as “Alice loves Bob”, together with its grammaticality, is witnessed by a morphism f:1sf : 1 \to s. Now the word-vectors [[Alice]]=F(Alice)[\![ Alice ]\!] = F (Alice), [[Bob]]=F(Bob)[\![ Bob ]\!] = F (Bob) and [[loves]]=F(loves)[\![ loves ]\!] = F (loves) become part of the data defining F:𝒞FVectF : \mathcal{C} \to FVect, and the resulting sentence vector is simply F(f)F (f).


DisCoCat is relatively easy to modify by replacing the semantic category FVectFVect with some other category with the appropriate structure. Examples include


Last revised on July 10, 2019 at 10:43:18. See the history of this page for a list of all contributions to it.