Logic in vector spaces
Symbolic and distributional representations in AI have complementary virtues. Symbolic representations are useful for reasoning, are easier to interpret and enable abstraction; whereas distributional representations can deal with noise and ambiguity, are easier to implement in neural networks and hence enable learning. So, how can we implement both representations in a common format so that we can make use of the desirable properties of both types of representation?
Grefenstette (2013) shows a simple way of mapping predicate logic (plus some rudimentary quantification) to tensor calculus. In this mapping, the truth values are assigned specific vectors: and ; domain objects are represented by one-hot vectors; unary predicates are assigned rank 2 tensors (i.e. matrices) that lead to the correct truth value when contracted with the vectors corresponding to the domain objects. Similarly, -ary relations are represented by rank tensors that lead to the correct truth values when contracted with -tuples of objects in the domain.
To give an example from the paper, suppose there are two objects in the domain represented by and . Everybody loves everybody in this domain, except doesn’t love . Then, the binary “loves” relation is represented by the rank 3 tensor:
Negation is modeled as the swap matrix:
and, because they are binary, the other logical operators can be modeled as rank 3 tensor. For simple (non-embedded) quantification, one first turns predicates into vectors (with domain objects satisfying the predicate having a 1 and the other objects having a 0 in the vector) and checks if for “All Xs are Ys” and checks if for “There exists X”, with and representing the vectorized representations of the predicates X and Y, respectively.
One can improve upon the representations of domain objects and predicates in this scheme by learning low-dimensional embeddings for them. This approach is proposed in Rocktaeschel et al. (2014). The idea here is to minimize an objective like the following:
where and are low-dimensional embeddings of the domain objects and predicates and ranges over the facts in a given knowledge base that we want to capture. One can easily minimize this objective by stochastic gradient descent.
The advantage of such low-dimensional embeddings is that they enable generalization of inferences made about domain objects or predicates to similar objects or predicates. They are also more efficient than the potentially very high-dimensional one-hot type representations.