The notion of a filtration generalizes many types of data that we find in real applications. In particular, we can think of filtrations as arising from:
Scalar fields. When dealing with scalar fields, a function \(f\) is defined on the vertices of an input complex \(\Sigma\). In such cases, the filtering function on \(\Sigma\) is obtained by assigning to each simplex of \(\Sigma\) the maximum among the values of its vertices. Formally, given a simplex \(\sigma\) of \(\Sigma\), the function value \(f(\sigma)\) is usually defined as \(\max \{f(v) \mid v \text{ is a vertex of } \sigma\}\).
In practical applications, scalar fields are extremely common. Arguably the simplest examples are terrain datasets. A terrain dataset can be regarded as a planar triangulation in which each point is defined by two coordinates; the scalar value associated with each point is called the elevation. Volumetric images are another example of scalar fields. These datasets can be described as point clouds embedded in 3D space, where the regular distribution of the points naturally induces a topology on them. Volumetric images are common in medicine (CT or MRI scanners produce this kind of data) and in scientific simulations used in engineering or forecasting analysis.
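To make the max-extension rule concrete, the following is a minimal sketch (in Python, with illustrative names that are not taken from any specific library) of how a scalar field given on the vertices can be extended to all simplices and sorted into a valid filtration order.

```python
# Minimal sketch: extending a vertex-valued scalar field to every simplex by
# taking the maximum over its vertices, as described above.
# Simplices are modelled as frozensets of vertex identifiers.

def extend_scalar_field(simplices, vertex_values):
    """Assign f(sigma) = max{ f(v) : v is a vertex of sigma } to each simplex."""
    return {sigma: max(vertex_values[v] for v in sigma) for sigma in simplices}

def filtration_order(simplices, vertex_values):
    """Sort simplices by function value, breaking ties by dimension so that
    every face appears before its cofaces (a valid filtration order)."""
    f = extend_scalar_field(simplices, vertex_values)
    return sorted(simplices, key=lambda s: (f[s], len(s)))

if __name__ == "__main__":
    # Toy terrain: a single triangle with elevations attached to its vertices.
    elevations = {0: 0.3, 1: 1.2, 2: 0.7}
    simplices = [frozenset(s) for s in
                 [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]]
    for s in filtration_order(simplices, elevations):
        print(sorted(s))
```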
Point clouds. Point cloud data are extremely common in applications. Defined as a set of points embedded in some (possibly high-dimensional) space, point clouds are among the most classical settings in which persistent homology has been applied. When working with point clouds, the simplicial complex is built directly on the input data by means of a distance function. One of the most famous constructions of this kind is the Vietoris-Rips complex. The Vietoris-Rips complex at scale \(\epsilon\) is the abstract simplicial complex \(\Sigma\) whose simplices are the sets of points that are pairwise within distance \(\epsilon\), i.e., such that \(d(v_i,v_j) < \epsilon\) for every pair of vertices \(v_i,v_j\) of the simplex. In real applications, given the input point cloud, a sequence of growing simplicial complexes is built by increasing the parameter \(\epsilon\); this creates a filtration whose homology changes can be studied using persistent homology (a minimal sketch of this construction is given right after this paragraph). In practice, the techniques used for building such a sequence differ, but all of them are based on an increasing distance value \(d\): a connection between two distinct points is established once their distance falls below the threshold value \(d\). Classical constructions that deserve mention are:
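As anticipated above, the sketch below illustrates the Vietoris-Rips filtration of a small point cloud. It is a naive, illustrative implementation (assuming Euclidean distance and a small maximum dimension, with hypothetical function names): each candidate simplex receives as filtration value its diameter, i.e., the largest pairwise distance among its vertices, so that it enters the complex exactly when \(\epsilon\) exceeds that value.

```python
# Minimal sketch: Vietoris-Rips filtration of a small point cloud.
# The filtration value of a simplex is its diameter (largest pairwise
# distance), so the simplex appears once epsilon exceeds that value.
from itertools import combinations
import math

def rips_filtration(points, max_eps, max_dim=2):
    """Return (simplex, value) pairs sorted by order of appearance."""
    n = len(points)
    dist = {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(n), 2)}
    simplices = []
    for k in range(1, max_dim + 2):            # k vertices -> dimension k - 1
        for sigma in combinations(range(n), k):
            diam = max((dist[p] for p in combinations(sigma, 2)), default=0.0)
            if diam < max_eps:
                simplices.append((sigma, diam))
    # Sorting by (value, dimension) yields a valid filtration order.
    return sorted(simplices, key=lambda sv: (sv[1], len(sv[0])))

if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.1, 1.0)]
    for sigma, value in rips_filtration(cloud, max_eps=2.0):
        print(sigma, round(value, 3))
```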
In our description, we have focused our attention on the first algorithm defined in the literature for computing persistent homology, which we have denoted as the standard algorithm. Its worst-case time complexity is cubic in the number of simplices of the simplicial complex \(\Sigma\), but in practical cases it often behaves almost linearly (a sketch of the underlying matrix reduction is given after the list below). Nevertheless, several approaches improving the described algorithm have been developed in the literature to compute homology and persistent homology. Based on the strategy they adopt, we can subdivide the methods introduced in the literature to retrieve homological information into the following classes:
2. Distributed approaches efficiently retrieve the homological information of a complex through parallel and distributed computations. Some of these approaches are based on a decomposition of the simplicial complex, while others act directly on the boundary matrices. Several methods in the literature adopt such a distributed strategy: by using spectral sequences, by parallelizing a coreduction-based algorithm, by decomposing the boundary matrix into chunks, or by exploiting a distributed memory environment.
3. Recently, a new computational strategy has proven to be very efficient. In this case, persistent homology is efficiently retrieved through the use of annotations, i.e., vectors associated with each simplex of the complex that encode, in a compact way, the homology class of the simplex itself.
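Returning to the standard algorithm recalled at the beginning of this discussion, the following is a minimal sketch of its core step: the left-to-right column reduction of the boundary matrix over \(\mathbb{Z}_2\) coefficients, assuming the simplices are already indexed in filtration order. The code is purely illustrative (names and data layout are our own choices, not a library API) and is not meant to be efficient.

```python
# Minimal sketch of the standard persistence algorithm: column reduction of
# the boundary matrix over Z/2, with simplices indexed in filtration order
# (every face precedes its cofaces). Columns are stored as sets of row indices.

def boundary_columns(simplices):
    """Column j holds the filtration indices of the facets of simplex j."""
    index = {s: j for j, s in enumerate(simplices)}
    cols = []
    for s in simplices:
        facets = {index[s - {v}] for v in s} if len(s) > 1 else set()
        cols.append(facets)
    return cols

def reduce_matrix(cols):
    """Standard left-to-right reduction; returns (birth, death) index pairs."""
    low_to_col = {}                  # lowest row index -> reduced column index
    pairs = []
    for j, col in enumerate(cols):
        while col and max(col) in low_to_col:
            col ^= cols[low_to_col[max(col)]]   # add the earlier column mod 2
        if col:
            low_to_col[max(col)] = j
            pairs.append((max(col), j))         # class born at max(col) dies at j
    return pairs

if __name__ == "__main__":
    # Filtration of a hollow triangle followed by the 2-simplex that fills it.
    simplices = [frozenset(s) for s in
                 [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]
    print(reduce_matrix(boundary_columns(simplices)))   # [(1, 3), (2, 4), (5, 6)]
```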
A different and fundamental family of methods developed in the pursuit of efficiency when computing persistent homology consists of the so-called coarsening or pruning approaches. These methods reduce the size of the input complex without changing its homological information by applying iterative simplifications, and compute persistent homology once no further simplification is possible. Some of these approaches are based on reductions and coreductions, others simplify the simplicial complex via acyclic subspaces or by using edge contractions. Similar approaches are based on the notion of tidy set, a minimal simplicial set obtained by trimming and thinning the original simplicial complex, or on the construction of the discrete Morse complex, a compact model homologically equivalent to the input complex.
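As an elementary example of the kind of simplification these coarsening approaches rely on, the sketch below (an assumed, naive implementation, not any specific published algorithm) repeatedly removes free pairs from a complex: a simplex that is a proper face of exactly one other simplex is removed together with that coface. Each such elementary collapse preserves the homotopy type, and therefore the homology, while shrinking the complex on which persistence is eventually computed.

```python
# Minimal sketch of a reduction-style pruning step: repeatedly remove a free
# pair (tau, sigma), where tau is a proper face of exactly one simplex sigma.
# Each elementary collapse preserves the homotopy type, hence the homology.

def cofaces(complex_, tau):
    """All simplices of the complex that strictly contain tau."""
    return [s for s in complex_ if tau < s]

def collapse(complex_):
    """Greedily remove free pairs until no further collapse is possible."""
    complex_ = set(complex_)
    changed = True
    while changed:
        changed = False
        for tau in sorted(complex_, key=len):
            cf = cofaces(complex_, tau)
            if len(cf) == 1:                 # tau is a free face of cf[0]
                complex_.discard(tau)
                complex_.discard(cf[0])
                changed = True
                break                        # the complex changed: start over
    return complex_

if __name__ == "__main__":
    # A filled triangle is contractible, so it collapses down to a single vertex.
    triangle = {frozenset(s) for s in
                [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]}
    print([sorted(s) for s in collapse(triangle)])
```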