### Logic in vector spaces

Symbolic and distributional representations in AI have complementary virtues. Symbolic representations are useful for reasoning, are easier to interpret and enable abstraction; whereas distributional representations can deal with noise and ambiguity, are easier to implement in neural networks and hence enable learning. So, how can we implement both representations in a common format so that we can make use of the desirable properties of both types of representation?

Grefenstette (2013) shows a simple way of mapping predicate logic (plus some rudimentary quantification) to tensor calculus. In this mapping, the truth values are assigned specific vectors: $\top=[1 \; 0]^\top$ and $\bot=[0 \; 1]^\top$; domain objects are represented by one-hot vectors; unary predicates are assigned rank 2 tensors (i.e. matrices) that lead to the correct truth value when contracted with the vectors corresponding to the domain objects. Similarly, $n$-ary relations are represented by rank $n+1$ tensors that lead to the correct truth values when contracted with $n$-tuples of objects in the domain.

To give an example from the paper, suppose there are two objects in the domain represented by $\mathbf{d}_1$ and $\mathbf{d}_2$. Everybody loves everybody in this domain, except $\mathbf{d}_2$ doesn’t love $\mathbf{d}_1$. Then, the binary “loves” relation is represented by the rank 3 tensor:

$\mathbf{T}^{loves} = \top \otimes \mathbf{d}_1 \otimes \mathbf{d}_1 + \top \otimes \mathbf{d}_1 \otimes \mathbf{d}_2 + \top \otimes \mathbf{d}_2 \otimes \mathbf{d}_2 + \bot \otimes \mathbf{d}_2 \otimes \mathbf{d}_1$

Negation is modeled as the swap matrix:

$\mathbf{T}^\neg = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

and, because they are binary, the other logical operators can be modeled as rank 3 tensor. For simple (non-embedded) quantification, one first turns predicates into vectors (with domain objects satisfying the predicate having a 1 and the other objects having a 0 in the vector) and checks if $\mathbf{X} = min(\mathbf{X},\mathbf{Y})$ for “All Xs are Ys” and checks if $|\mathbf{X}|>0$ for “There exists X”, with $\mathbf{X}$ and $\mathbf{Y}$ representing the vectorized representations of the predicates X and Y, respectively.

One can improve upon the representations of domain objects and predicates in this scheme by learning low-dimensional embeddings for them. This approach is proposed in Rocktaeschel et al. (2014). The idea here is to minimize an objective like the following:

$\displaystyle \min_{[d], [P]\; \forall d \in \mathcal{D}, \; \forall P \in \mathcal{P}} \sum_{\mathcal{F} \in K} || [\mathcal{F}]-\top ||_2$

where $[d]$ and $[P]$ are low-dimensional embeddings of the domain objects and predicates and $\mathcal{F}$ ranges over the facts in a given knowledge base $K$ that we want to capture. One can easily minimize this objective by stochastic gradient descent.

The advantage of such low-dimensional embeddings is that they enable generalization of inferences made about domain objects or predicates to similar objects or predicates. They are also more efficient than the potentially very high-dimensional one-hot type representations.