The PhD student will be a member of the Geometrica team in Saclay.
The PhD work and results will contribute to:
- EU project CG-Learning (http://cglearning.eu/),
- associated team COMET between Geometrica, the Geometric Computing group at Stanford and T. Dey's group at Ohio State University (http://www.inria.fr/en/teams/comet),
- a submitted ANR project entitled "TopData: Topological Data Analysis: Statistical Methods and Inference".
Abstract
During the last decade, the wide availability of measurement devices and simulation tools has led to an explosion in the amount of available data in almost all domains of science, industry, the economy and even everyday life. Often these data come as point clouds embedded in Euclidean spaces, or as more general metric spaces that carry topological and geometric structures. Geometric inference and Topological Data Analysis (TDA) are recent fields, attracting increasing interest, that aim to infer and exploit these structures for better understanding, analysis and processing of these data.
The goal of this PhD is to develop new, well-founded statistical and learning methods and algorithms for TDA.
Context
During the last decade, the wide availability of measurement devices and simulation tools has led to an explosion in the amount of available data in almost all domains of science, industry, the economy and even everyday life. Often these data come as point clouds embedded in Euclidean spaces, or as more general metric spaces, as is typically the case for sensor network or social network data. These points usually carry some geometric structure (a manifold or a more general stratified space) which reflects important properties of the "systems" from which the data have been generated.
With the recent explosion in the amount and variety of available data, identifying, extracting and exploiting their underlying geometric structures has become a problem of fundamental importance for data analysis and statistical learning.
Objectives
With the emergence of distance-based approaches and persistent topology, geometric inference and computational topology have recently undergone significant development. New mathematically well-founded theories gave birth to the field of Topological Data Analysis.
So far, the results obtained rely mostly on deterministic assumptions, which are not satisfactory from a statistical viewpoint. As a consequence, the corresponding methods remain exploratory: they do not benefit from a sound probabilistic framework and cannot easily be used in a learning framework. Despite a few notable attempts to overcome this issue, the development of a statistical approach to Topological Data Analysis is still in its infancy. The objective of this PhD is to combine computational topology and geometry with statistical and learning approaches in order to go beyond this exploratory stage and to develop and implement well-founded methods and algorithms for Topological Data Analysis (TDA).
Work program
The work program focuses on the design of various statistical frameworks and models for topological persistence. It will be organized in two main parts:
1. Statistical properties of persistence diagrams: the objective of this part is to provide frameworks and results in which persistence diagrams can be used as well-founded statistics, and to design new TDA and geometric inference algorithms that take advantage of these statistics.
2. Kernel-based learning algorithms for TDA: the objective of this part is to provide a set of statistical and machine learning algorithms that explicitly exploit the topological structure of data encoded in persistence diagrams. The relevance and efficiency of the designed tools will be tested on synthetic and real data sets coming from various application areas.
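For readers unfamiliar with persistence diagrams, the following minimal Python sketch illustrates the simplest case: the 0-dimensional persistence diagram of a point cloud, which records the scales at which connected components merge. It is only an illustration and not part of the project's methods. The function name h0_persistence and the sample points are hypothetical, and the sketch assumes the convention that an edge enters the Vietoris-Rips filtration at a scale equal to its length.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the 0-dimensional persistence diagram of a point cloud.

    Assumption: Vietoris-Rips filtration where an edge appears at a scale
    equal to its Euclidean length. Every component is born at scale 0 and
    dies when it merges into another one; one component never dies.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over the points

    def find(i):
        # Find the representative of i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges by increasing length (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                 # two components merge ...
            diagram.append((0.0, length))   # ... so one of them dies here
    diagram.append((0.0, math.inf))         # the surviving component
    return diagram


# Hypothetical sample: two well-separated clusters in the plane.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
diagram = h0_persistence(points)
```

On this sample, three components die at scale 1 (merges inside the clusters) and one dies at scale 9 (the two clusters merge). The long-lived point (0, 9) is the kind of topological signal whose statistical significance the first part of the work program aims to quantify.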
Extra information
Prerequisites
- A good mathematical background and some knowledge of computational geometry/topology and/or statistical learning.
- Some knowledge of C/C++ or Python would also be welcome.