### Introduction to hyperbolic geometry and hyperbolic embeddings

This week, I gave a tutorial on hyperbolic geometry and discussed this paper that proposes embedding symbolic data in a hyperbolic space, rather than in Euclidean space. In this post, I will try to summarize my presentation. I closely followed the book Hyberbolic Geometry by James W. Anderson in my presentation, which is an extremely well-written introductory textbook on the subject. I highly recommend this book to anyone who is interested in learning about hyperbolic geometry.

Euclid

Euclid axiomatized plane geometry more than two millennia ago. He came up with five axioms from which all of plane geometry could be derived:

1. Given two points in the plane, a unique line can be drawn passing through those points.
2. Any line segment can be extended indefinitely in either direction.
3. Given any line segment, a circle can be drawn with its center on one of the end-points of the line segment and its radius equal to the length of the line segment.
4. Any two right angles are congruent.
5. Parallel postulate: “If a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than the two right angles.”

There are equivalent formulations of the parallel postulate, perhaps the most famous of which is the Playfair’s axiom: “Given a line and a point not on it, at most one line parallel to the given line can be drawn through the point.”

Generations of mathematicians after Euclid thought the parallel postulate suspiciously complicated and hoped that it could somehow be derived from the first four axioms. However, all efforts to do so failed. We now know that the parallel postulate is independent of the remaining axioms of plane geometry, meaning that the axiomatic system consisting of the first four axioms and the negation of the parallel postulate is a consistent system. In fact, even ancient Greeks were familiar with a perfectly fine model of a plane geometry that violates the parallel postulate: the surface of the sphere, i.e. $S^2$. If points are interpreted as points and lines as great circles on the surface of the sphere, the parallel postulate is violated, because any two great circles on the surface of the sphere have to intersect, hence they are not parallel. However, the problem with this model is that it also violates the second axiom: line segments cannot be extended indefinitely, since great circles are closed curves. So, it was not taken seriously as a plausible model of plane geometry.

At the beginning of the 19th century, people started to come up with model geometries that violate the parallel postulate, while satisfying all the remaining axioms of Euclid. Gauss, Bolyai and Lobachevsky were the first mathematicians to come up with such a model. However, we will be dealing mostly with two models introduced later by Poincaré. The first one is called the upper half-plane model.

The upper half-plane model, $\mathbb{H}$

The upper half-plane is the set of complex numbers with positive imaginary part: $\mathbb{H}=\{ z \in \mathbb{C} | \mathrm{Im}(z)>0\}$. The points in $\mathbb{H}$ are defined to be the usual Euclidean points, and lines are defined to be either half-lines perpendicular to the real axis, or half-circles with centers on the real axis (image source):

Circles in $\mathbb{H}$ are Eucliden circles (but not necessarily with the same center or radius). As an exercise, you can check that when points, lines and circles in $\mathbb{H}$ are interpreted in this way, all axioms of Euclid are satisfied except for the parallel postulate: show that in $\mathbb{H}$ there is an infinite number of parallel lines to a given line passing through a given point not on the line.

This model, by itself, does not come equipped with a metric. We have to define one for it. It turns out that there is a fairly natural way of assigning a metric to $\mathbb{H}$. The idea will be to first find a group of transformations in $\mathbb{H}$ that leave hyperbolic lines invariant, i.e. that map hyperbolic lines to hyperbolic lines in $\mathbb{H}$. We will then require our metric to be invariant under these transformations. It turns out that this requirement uniquely determines the metric (up to a trivial scale factor).

The group $\mathrm{Mob}(\mathbb{H})$

The upper half-plane model is thought of as embedded in the extended complex plane, $\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$, consisting of the complex plane and a single point at infinity.

In $\overline{\mathbb{C}}$, we define a circle to be either a Euclidean circle in $\mathbb{C}$ or a Euclidean line in $\mathbb{C}$ plus the point at infinity, $\infty$. We denote the group of homeomorphisms of $\overline{\mathbb{C}}$ taking circles in $\overline{\mathbb{C}}$ to circles in $\overline{\mathbb{C}}$ by $\mathrm{Homeo}^C(\overline{\mathbb{C}})$. Our first job is to exactly characterize this group. For this purpose, we first define the Möbius transformations.

Definition: A Möbius transformation is a function $m: \overline{\mathbb{C}} \rightarrow \overline{\mathbb{C}}$ of the following form:

$m(z) = \frac{az+b}{cz+d},$  where $a,b,c,d \in \mathbb{C}$ and $ad - bc \neq 0$.

These transformations correspond to compositions of simple translations, rotations, dilations and inversions in the complex plane. Here’s a nice video visualizing how these transformations deform a square in the complex plane:

The set of all Möbius transformations form a group denoted by $\mathrm{Mob}^+$.

Definition: The group generated by $\mathrm{Mob}^+$ and complex conjugation, $c(z)=\bar{z}$, is called the general Möbius group and denoted by $\mathrm{Mob}$.

A fundamental result then tells us that the general Möbius group is precisely the group of homeomorphisms of $\overline{\mathbb{C}}$ taking circles to circles in $\overline{\mathbb{C}}$, i.e. $\mathrm{Homeo}^C(\overline{\mathbb{C}})$:

Theorem: $\mathrm{Mob} = \mathrm{Homeo}^C(\overline{\mathbb{C}})$.

Another important property of $\mathrm{Mob}$ is that elements of $\mathrm{Mob}$ are conformal homeomorphisms, i.e. they preserve angles.

We will concentrate on a particular subgroup of $\mathrm{Mob}$ that preserves the upper half-plane:

Definition: $\mathrm{Mob}(\mathbb{H}) = \{ m \in \mathrm{Mob} | m(\mathbb{H})= \mathbb{H} \}$.

It can be shown that every element of $\mathrm{Mob}(\mathbb{H})$ takes hyperbolic lines to hyperbolic lines in $\mathbb{H}$. In this precise sense, we can think of $\mathrm{Mob}(\mathbb{H})$ as some sort of a generator for the upper half-plane model.

We can express every element of $\mathrm{Mob}(\mathbb{H})$ explicitly either as:

$m(z) = \frac{az+b}{cz+d},$ where $a, b, c, d \in \mathbb{R}$ and $ad-bc=1$

or as:

$m(z) = \frac{a\bar{z}+b}{c\bar{z}+d},$ where $a, b, c, d$ are purely imaginary and, again, $ad-bc=1$.

Length and distance in $\mathbb{H}$

We are now ready to define a line element and a distance metric in $\mathbb{H}$. As mentioned above, we will do this by requiring that whatever line element and distance metric we choose for $\mathbb{H}$ should be invariant under the action of the group $\mathrm{Mob}(\mathbb{H})$. It turns out that this requirement uniquely determines the line element and the distance metric (up to a trivial scalar factor).

Let’s first review a few basic definitions. Let $f: [a,b] \rightarrow \mathbb{R}^2$ define a path in the Euclidean plane. The Euclidean length of $f$ is defined as:

$length(f) \equiv \int_{a}^b \sqrt{x^\prime(t)^2 + y^\prime(t)^2} dt$

or if the Euclidean plane is considered as the complex plane, this can be equivalently expressed as:

$length(f) = \int_{a}^b |f^\prime(t)| dt \equiv \int_f |dz|$

where $|dz|$ is called the Euclidean line element. A more general line element can be defined by scaling the Euclidean line element by a scale factor, $\rho(z)$, that could depend on $z$: i.e. $\rho(z)|dz|$. We define the length of $f$ with respect to this new line element as:

$length_{\rho}(f) \equiv \int_f \rho(z)|dz| = \int_{a}^b \rho(f(t))|f^\prime(t)| dt$

As mentioned above, our strategy will be to determine a $\rho(z)$ for $\mathbb{H}$ by requiring the lengths of all paths in $\mathbb{H}$ to be preserved under the action of the group $\mathrm{Mob}(\mathbb{H})$, i.e. $length_{\rho}(f)=length_{\rho}(\gamma \circ f)$ for all $\gamma \in \mathrm{Mob}(\mathbb{H})$ and all $f$. This requirement results in (up to a constant multiplicative factor):

$\rho(z) = \frac{1}{\mathrm{Im}(z)}$

Therefore, the hyperbolic length of a path in $\mathbb{H}$ can be measured as:

$length_{\mathbb{H}}(f) = \int_f \frac{1}{\mathrm{Im}(z)} |dz| = \int_a^b \frac{1}{\mathrm{Im}(f(t))} |f^\prime(t)|dt$

Example: Consider $f(t)=it$ defined in the interval $[a,b]$. Then, $length_\mathrm{H}(f)=\int_a^b \frac{1}{t}dt = \log(\frac{b}{a})$. We see that  the length blows up as $a\rightarrow 0$.

Now, we can define a distance metric in $\mathbb{H}$ based on our definition of $length_{\mathrm{H}}.$ The idea is to find a $\gamma \in \mathrm{Mob}(\mathbb{H})$ that maps the hyperbolic line passing through $z_1$ and $z_2$ to the imaginary axis so that $\gamma(z_1)=\mu i$ and $\gamma(z_2)=\lambda i$ for some $\mu$ and $\lambda$ (it is easy to show that we can always find such a transformation and that, if there are multiple such transformations, the metric defined below does not depend on which one is chosen). From the example above, we know how to measure the lengths of line segments on the imaginary axis.

When we do this, we find the following formula for measuring the hyperbolic distance between two arbitrary points $z_1, z_2 \in \mathbb{H}$:

$d_{\mathbb{H}}(z_1, z_2) = | \log | \frac{y_2(x_1-c-r)}{y_1(x_2-c-r)} | |$

where $(x_1,y_1)$ and $(x_2,y_2)$ are the coordinates of $z_1$ and $z_2$, respectively, and $c$ and $r$ the center and the radius of the hyperbolic line passing through $z_1$ and $z_2$.

One can show that the group $\mathrm{Mob}(\mathbb{H})$ corresponds exactly to the group of isometries of $\mathbb{H}$: i.e. $\mathrm{Isom}(\mathbb{H}, d_{\mathbb{H}}) = \mathrm{Mob}(\mathbb{H})$.

One can also show that there are two qualitatively distinct types of parallel lines in $\mathbb{H}$: parallel lines that asymptote to a common point at infinity and parallel lines that do not asymptote to a common point at infinity (called ultraparallel lines). For the former ones, it can be shown that $d_{\mathbb{H}}(l_0, l_1)=0$, whereas for the latter case, $d_{\mathbb{H}}(l_0, l_1)>0$. Therefore, ultraparallel lines behave more like parallel lines in Euclidean space.

Poincaré disk model, $\mathbb{D}$

Another model of hyperbolic geometry is the Poincaré disk model. This model is defined inside the unit disk in $\mathbb{C}$: $\mathbb{D}=\{ z \in \mathbb{C}| |z|<1 \}.$ A hyperbolic line in $\mathbb{D}$ is the image under $m^{-1}$ of a hyperbolic line in $\mathbb{H}$, where $m: \mathbb{D} \rightarrow \mathbb{H}$ is an element of $\mathrm{Mob}.$ This implies that hyperbolic lines in $\mathbb{D}$ are Euclidean arcs perpendicular to the unit circle $S^1$ (image source):

Now, we can do the same things with this model that we did with the upper half-plane model, namely find a scaling function $\rho(z)$ that makes the lengths of paths invariant under the group $\mathrm{Mob}(\mathbb{D})$, then define lengths and distances based on this $\rho(z)$. When we do that, we get the following expressions:

$\rho(z)=\frac{2}{1-|z|^2}$

$length_{\mathbb{D}}(f) = \int_f \frac{2}{1-|z|^2}|dz|$

Example: Consider $f: [0,r] \rightarrow \mathbb{D}$, $f(t)=t$. Then, $length(f)=\int_0^r \frac{2}{1-t^2}dt=\log(\frac{1+r}{1-r})$. Again, we see that the length blows up as $r \rightarrow 1$.

Example: It can be shown that the length of a hyperbolic circle in $\mathbb{D}$ with hyperbolic center 0 and hyperbolic radius $s$ is $2\pi \sinh(s)$.

This line element leads to the following distance metric in $\mathbb{D}$, $d_{\mathbb{D}}: \mathbb{D} \times \mathbb{D} \rightarrow \mathbb{R}$:

$d_{\mathbb{D}}(z_1,z_2) = \mathrm{arccosh} ( \frac{2|z_1-z_2|^2}{(1-|z_1|^2)(1-|z_2|^2)}+1 )$

This same formula holds in higher dimensional versions of the Poincaré disk model as well.

Hyperbolic Area

Similarly to the length of a path in $\mathbb{H}$, we define the area of a set $X \subset \mathbb{H}$ by:

$area_{\mathbb{H}}(X) \equiv \int_{X} \frac{1}{\mathrm{Im}(z)^2}dxdy$

Just like length and distance, $area_{\mathbb{H}}$ is also invariant under the action of the group $\mathrm{Mob}(\mathbb{H})$.

In the Poincaré disk model, area is defined as (using Cartesian coordinates):

$area_{\mathbb{D}}(X) = \int_X \frac{4}{(1-|z|^2)^2}dxdy = \int_X \frac{4}{(1-x^2 -y^2)^2}dxdy$

It can be shown that the area of a hyperbolic disk in $\mathbb{D}$ with center 0 and radius $s$ is $4\pi \sinh^2(\frac{s}{2})$. This implies that for a disk with center 0 and radius $s$, the ratio $\frac{length}{area} \rightarrow 1$ as $s \rightarrow \infty$. This is very unlike the Euclidean case, where $\frac{length}{area}\rightarrow 0$ as $s \rightarrow \infty$. Intuitively, lengths become as big as areas for sufficiently large shapes in hyperbolic space (similar results hold for higher dimensional hyperbolic spaces).

Gauss-Bonnet

This is a truly amazing result that says that for a hyperbolic triangle $P$ with interior angles $\alpha, \beta, \gamma$, the area is given by $area_{\mathbb{H}}(P)=\pi - (\alpha+\beta+\gamma)$.

In fact, this result generalizes to hyperbolic polygons: if $P$ is a hyperbolic polygon with interior angles $\alpha_1, \ldots, \alpha_n$, then the area of $P$ is given by:

$area_{\mathbb{H}}(P) = (n-2)\pi - \sum_{k=1}^n \alpha_k$

Another big difference from the Euclidean case is that in the Euclidean plane, there is only one regular $n$-gon (up to translation, rotation and dilation) and its interior angle is $\frac{n-2}{n}\pi$. The situation is completely different in the hyperbolic plane: for all $n\geq 3$ and each $\alpha \in (0,\frac{n-2}{n}\pi)$, there is a compact regular hyperbolic $n$-gon whose interior angle is $\alpha$.

Higher-dimensional hyperbolic spaces

The two-dimensional models already make apparent many strange features of hyperbolic spaces. Things can get even stranger in higher dimensional hyperbolic spaces. However, instead of studying these spaces formally, we can try to get a feel for the strange properties of higher dimensional hyperbolic spaces by exploring the three-dimensional hyperbolic space, $\mathbb{H}^3$. Here‘s a software tool for exploring $\mathbb{H}^3$. You can find some background knowledge and information about how to use this tool here. The tool here models the {4,3,6}-tessellation of $\mathbb{H}^3$ (the notation here means the shape of the cells tiling the space is a cube, whose Schläfli symbol is {4,3}, and there are six of these cells around each edge). This is not the only possible tessellation of $\mathbb{H}^3$; in fact, it is not even the only possible cubic tessellation of $\mathbb{H}^3$ (this is again a drastic difference from the Euclidean 3-space where there is essentially only one cubic tessellation of the space). There’s, in fact, an infinite number of such tessellations (a fact closely related to the result mentioned above that says that there are regular hyperbolic $n$-gons with any interior angle $\alpha$ as long as $\alpha \in (0,\frac{n-2}{n}\pi)$). I strongly encourage you to explore these different tessellations of $\mathbb{H}^3$ on Wikipedia, which has amazing visualizations of these tessellations.

Hyperbolic embeddings

Finally, I would like to briefly discuss this paper by Nickel and Kiela that proposes embedding data (especially symbolic or graph-structured data) in hyperbolic spaces (see this previous post of mine for a brief review of why one might want to use embeddings in continuous spaces to represent symbolic data). The idea is really simple: we first define an objective that encourages semantically or functionally related items to have closer representations (and conversely semantically or functionally unrelated items to have more distant representations) in the embedding space, where the embedding space is a hyperbolic space (for example, the Poincaré ball model, i.e. the higher dimensional version of the Poincaré disk model reviewed above) and distance is measured by hyperbolic distance. For instance, one of the objectives they use in the paper is the following loss function:

$\mathcal{L}(\Theta) = \sum_{(u,v) \in \mathcal{D}} \log \frac{\exp(-d_{\mathbb{D}}(u,v))}{\sum_{v^\prime \in \mathcal{N}(u)} \exp(-d_{\mathbb{D}}(u,v^\prime))}$

where $\mathcal{D}$ denotes the set of semantically or functionally related items that are connected by an edge in the dataset, $\mathcal{N}(u)$ denotes the set of items unrelated to $u$, and $d_{\mathbb{D}}$ denotes the hyperbolic distance between the embeddings of the items in the Poincaré ball model.

We then just do gradient descent on this loss function to optimize the positions of the items in the embedding space. There are two subtleties to take care of while doing gradient descent: first, because we are working with the Poincaré ball model, we have to make sure that the positions stay within the unit ball during optimization. This can be achieved by applying a projection operator after each update:

where $\epsilon > 0$ ensures numerical stability.

Secondly, because we are working in a curved space, taking its intrinsic curvature into account during optimization would be more efficient. This means that we have to use the Riemannian gradient rather than the Euclidean gradient for gradient descent. These two gradients are related to each other by the inverse of the metric, which is itself given by the square of the line element scaling factor $\rho(z)$ described above. This then leads to the following update rule:

Here, $\eta_t$ is the learning rate, $\nabla_E$ is the Euclidean gradient and the term $(1-||\theta_t||^2)^2/4$ is just the inverse of the metric, which in turn is just the square of the scaling factor $\rho(z) = 2/(1-|z|^2)$ derived above for the Poincaré disk model.

The authors show that hyperbolic embeddings lead to better performance in reconstruction and link prediction tasks, usually with a few orders of magnitude smaller number of embedding dimensions than a Euclidean embedding! That’s an impressive improvement. Intuitively, hyperbolic spaces of a given dimension contain a lot “more space” than the corresponding Euclidean space, so we need a lot fewer dimensions to accurately capture the semantic or functional relationships between the items. That said, I have a few concerns about hyperbolic embeddings:

• Although it may be true that hyperbolic spaces are better suited than Euclidean spaces to capture the semantic or functional relationships in tree-structured data, there is, in general, no reason to assume a priori that the underlying space has to be exactly hyperbolic. I think a better approach would be to learn the appropriate metric, rather than assuming it. One can still initialize it close to a hyperbolic metric, but making it flexible would allow us to capture possible deviations from hyperbolicity.
• In the paper, they use relatively simple tasks to demonstrate the advantages of hyperbolic embeddings. In practice, one may want to use these embeddings as inputs to some deep and large model being trained on a more challenging task. One may also want to learn such hyperbolic embeddings jointly with the parameters of the deep model in an end-to-end fashion. I’m not quite sure how well hyperbolic embeddings would behave in such a context. The fact that distances blow up very rapidly in hyperbolic spaces as one approaches the boundary suggests that such end-to-end training of a deep and large model on hyperbolic embeddings may be riddled with serious optimization difficulties. Again, making the metric flexible could force the model to find and use a more “learnable” metric.