topological data analysis

Topological Data Analysis (TDA) has emerged recently as the part of Computational Topology concerned with applying the methods of that subject to the analysis of data sets, which are often very large. The methods are adapted from algebraic and differential topology and are closely related to those used in Visualisation for spatial reconstruction from scanned data. The context, however, is in principle limited neither to low dimensions, nor to data of spatial origin, nor, initially, to the visualisation of the data. Its aim, rather, is to give *qualitative* information on the data, while allowing for statistical variation, noise, etc.

The applications considered so far have looked, for instance, for qualitative structure in the clusters obtained by some classifier. These methods usually assume that the data comes from sampling a manifold or simplicial complex. To these methods, we would suggest adding a new type of analysis, related to the verification of a mathematical model for what might be a non-linear situation involving feedback and perhaps even some chaotic aspects. The idealised mathematical model would then be likely to predict that the experimental data behaves as if sampled from a (possibly high dimensional) space embedded in some (even higher dimensional) Euclidean space, but this idealised model space need not be a manifold, and might even be fractal in nature. This suggests other questions to ask of the data. Is it reasonable, given the data, to assume that it is sampled from a manifold or polyhedron? If not, can we analyse the space at all?

This raises a deep methodological question.

**What is the space we are ‘looking at’ and how is it presented to us?**

In my papers with Jonathan Gratus, it has been argued that there are two related views of Spatial Representation. The first is representation *of* spaces, but the second, and for the purposes here the more basic one, is representation *by* spaces. In both cases,

*the space is an idealised object obtained as the limiting case of indefinitely refined observations of the context, object or data.*

A theoretical and, hence, mathematical model, if one is available, will give a second idealised object, another ‘space’. As this second idealised space is typically observable only through finite *computational* approximations, it too is obtained as a limit. One eventual aim of TDA in this situation could be to provide a comparison between these two ‘spaces’. In other words, in this analysis, any space gives an idealisation of a context. Whether or not that is a ‘spatial context’, or even what ‘spatial context’ means, might be the subject of a lot of philosophical debate, and we will not explore it further here.

The classical methods of algebraic topology had quite a lot to say about the ‘invariants’ of such limiting spaces, and we will look at some of these methods later. Those classical methods, by themselves, are not algorithmically efficient, and sometimes not even feasible, but they can be adapted to give much more computationally ‘friendly’ tools. We will briefly review the history of these tools to show how, with a limiting space, the information on the (finite) approximations relates to that on the limit. Here some examples show behaviour relevant to our ‘non-linear model’ thought experiment. The critical phenomena in such examples relate closely to the analysis not so much of the approximations to the limit as of the refinement or comparison maps between them. This means that we must examine whether the available tools of the present form of TDA (in particular the various forms of homology) can be pushed beyond their current analysis of objects to help in the analysis of maps, and, in the limit, to give qualitative information on the idealised limiting space.
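To make the shift from objects to maps a little more concrete, here is a minimal sketch in the simplest possible case, that of connected components (degree-zero homology). It assumes, purely for illustration, that a finite approximation at scale `eps` is the graph joining sample points at distance at most `eps`. The function names `components` and `induced_map` and the sample points are hypothetical and do not come from any particular TDA library. Passing from a finer scale to a coarser one gives an inclusion of graphs, and the sketch computes the map this induces on components.

```python
from itertools import combinations
from math import dist


def components(points, eps):
    """Label each point by the smallest index in its connected component
    of the graph joining points at distance <= eps (an assumed, very
    simple choice of finite approximation)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Always keep the smaller index as root, so that the label
                # of a component is its minimal point index.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(points))]


def induced_map(points, eps_small, eps_large):
    """Map on components induced by the inclusion of the eps_small graph
    into the eps_large graph. It is well defined because every fine
    component lies inside a single coarse component."""
    if eps_small > eps_large:
        raise ValueError("eps_small must not exceed eps_large")
    fine = components(points, eps_small)
    coarse = components(points, eps_large)
    return {f: c for f, c in zip(fine, coarse)}


# Hypothetical sample: two well separated pairs of points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(components(points, 1.5))        # [0, 0, 2, 2]
print(induced_map(points, 1.5, 4.5))  # {0: 0, 2: 0}
```

In this toy sample the two components seen at the finer scale are sent to a single component at the coarser scale. That merging is a property of the comparison map, and cannot be read off from either approximation taken alone.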

Revised on April 17, 2012 09:16:48
by Tim Porter