Logic in vector spaces

Symbolic and distributional representations in AI have complementary virtues. Symbolic representations are useful for reasoning, are easier to interpret and enable abstraction; whereas distributional representations can deal with noise and ambiguity, are easier to implement in neural networks and hence enable learning. So, how can we implement both representations in a common format so that we can make use of the desirable properties of both types of representation?

Grefenstette (2013) shows a simple way of mapping predicate logic (plus some rudimentary quantification) to tensor calculus. In this mapping, the truth values are assigned specific vectors: \top=[1 \; 0]^\top and \bot=[0 \; 1]^\top; domain objects are represented by one-hot vectors; unary predicates are assigned rank 2 tensors (i.e. matrices) that lead to the correct truth value when contracted with the vectors corresponding to the domain objects. Similarly, n-ary relations are represented by rank n+1 tensors that lead to the correct truth values when contracted with n-tuples of objects in the domain.

To give an example from the paper, suppose there are two objects in the domain represented by \mathbf{d}_1 and \mathbf{d}_2. Everybody loves everybody in this domain, except \mathbf{d}_2 doesn’t love \mathbf{d}_1. Then, the binary “loves” relation is represented by the rank 3 tensor:

\mathbf{T}^{loves} = \top \otimes \mathbf{d}_1 \otimes \mathbf{d}_1 + \top \otimes \mathbf{d}_1 \otimes \mathbf{d}_2 + \top \otimes \mathbf{d}_2 \otimes \mathbf{d}_2 + \bot \otimes \mathbf{d}_2 \otimes \mathbf{d}_1

Negation is modeled as the swap matrix:

\mathbf{T}^\neg = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}

and, because they are binary, the other logical operators can be modeled as rank 3 tensor. For simple (non-embedded) quantification, one first turns predicates into vectors (with domain objects satisfying the predicate having a 1 and the other objects having a 0 in the vector) and checks if \mathbf{X} = min(\mathbf{X},\mathbf{Y}) for “All Xs are Ys” and checks if |\mathbf{X}|>0 for “There exists X”, with \mathbf{X} and \mathbf{Y} representing the vectorized representations of the predicates X and Y, respectively.

One can improve upon the representations of domain objects and predicates in this scheme by learning low-dimensional embeddings for them. This approach is proposed in Rocktaeschel et al. (2014). The idea here is to minimize an objective like the following:

\displaystyle \min_{[d], [P]\; \forall d \in \mathcal{D}, \; \forall P \in \mathcal{P}} \sum_{\mathcal{F} \in K} || [\mathcal{F}]-\top ||_2

where [d] and [P] are low-dimensional embeddings of the domain objects and predicates and \mathcal{F} ranges over the facts in a given knowledge base K that we want to capture. One can easily minimize this objective by stochastic gradient descent.

The advantage of such low-dimensional embeddings is that they enable generalization of inferences made about domain objects or predicates to similar objects or predicates. They are also more efficient than the potentially very high-dimensional one-hot type representations.