Parallelism-High Performance Computing-Grid

Domain - extra
Data Linking, Privacy, Scalability
September or October
Online and Secure Data Linking
Thesis advisor
Nathalie Pernelle (LRI, Univ. Paris Sud) and Fatiha Saïs (LRI, Univ. Paris Sud)
- University of Milano-Bicocca – Italy
- INRA Montpellier - France
The increasing amount of data available in electronic form in multiple domains, such as health care, agronomy, and biology, constitutes a goldmine for several research topics. But as with most goldmines, its exploitation is tricky. Besides traditional domain-specific challenges, researchers must address issues such as integrating heterogeneous data from multiple sources, guaranteeing the privacy of individuals, and handling enormous quantities of data. One of the most important data integration problems is the data linking problem, which aims at finding identity links between data items, i.e. determining which data items refer to the same real-world entity.
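To make the notion of an identity link concrete, the following is a minimal sketch, not the approach this project will develop. It assumes each data item is a short textual description and that token-overlap (Jaccard) similarity above a threshold is accepted as evidence that two items refer to the same entity. The function names, the threshold value, and the toy records are all hypothetical and chosen for illustration only.

```python
from __future__ import annotations


def tokens(value: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(value.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link(source_a: dict[str, str], source_b: dict[str, str],
         threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) triples whose similarity reaches the threshold."""
    links = []
    # Naive all-pairs comparison: quadratic in the number of items.
    for id_a, desc_a in source_a.items():
        for id_b, desc_b in source_b.items():
            score = jaccard(tokens(desc_a), tokens(desc_b))
            if score >= threshold:
                links.append((id_a, id_b, score))
    return links


# Hypothetical toy records, invented for illustration only.
source_a = {"a1": "Marie Curie Paris", "a2": "Louis Pasteur Dole"}
source_b = {"b1": "marie curie paris france", "b2": "Claude Bernard Lyon"}
identity_links = link(source_a, source_b)
```

The all-pairs loop compares every item of one source with every item of the other, which is exactly what stops being feasible at the scale of billions of data items.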

The main goal of this PhD project is to define and develop a data linking approach that can handle large amounts of data in a privacy-sensitive context.

In a context where the number of data providers is continuously growing, leading to a global data space of billions of data items, integrating and managing these data raises multiple problems related to data quality. Indeed, these data are very heterogeneous: they are incomplete, inconsistent, described according to different schemas, and contain duplicates. Furthermore, in several application domains the privacy requirements can be very strict. Moreover, the amount of data available through social platforms keeps increasing. In this context we will address the problem of data linking in an online platform, where data remain in their original data sources and where the privacy requirements are satisfied.
The main objective is to define and develop methods that deal with the problem of online data linking, where the most efficient queries have to be defined. The data linking task should guarantee that privacy requirements are satisfied. We will also be interested in using data linking methods to enrich data coming from traditional data sources with data extracted from social platforms. The approach will benefit from the work done in the semantic web, and more specifically in the ontology alignment field, to deal with the heterogeneity of the data sources' schemas. The results of this PhD will be validated on real data in the epidemiological domain, provided by colleagues from INRA Montpellier and from a European partner (Univ. of Milano-Bicocca – Italy) in the setting of a submitted European project.
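As an illustration of what linking without moving raw data out of its source could look like, here is a minimal sketch under strong assumptions; it is not the method of this project. It assumes the sources share a secret key, that each source replaces an identifying value with a keyed hash (HMAC-SHA256) before anything leaves it, and that only exact matches on the normalized value are needed. The key, the function names, and the records are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac


def normalize(value: str) -> str:
    """Collapse case and whitespace so equal values hash identically."""
    return " ".join(value.lower().split())


def pseudonymize(records: dict[str, str], key: bytes) -> dict[str, str]:
    """Run at each source: map the keyed hash of a value to its local record id."""
    return {
        hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest(): rid
        for rid, value in records.items()
    }


def link_tokens(tokens_a: dict[str, str],
                tokens_b: dict[str, str]) -> list[tuple[str, str]]:
    """Run at the linking party: pair record ids whose hashed tokens are equal."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[t], tokens_b[t]) for t in shared)


# Hypothetical key and toy records, invented for illustration only.
shared_key = b"hypothetical-shared-secret"
source_a = {"a1": "Marie  Curie 1867-11-07", "a2": "Louis Pasteur 1822-12-27"}
source_b = {"b1": "marie curie 1867-11-07", "b2": "Claude Bernard 1813-07-12"}
links = link_tokens(pseudonymize(source_a, shared_key),
                    pseudonymize(source_b, shared_key))
```

The linking party only ever sees hashed tokens and local record ids. The sketch tolerates differences in case and spacing but not typos or schema differences, so it covers none of the heterogeneity issues described above.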
Work program
The objectives of this PhD project will be achieved in four steps:
1) Study the introduction of privacy requirements into the data linking problem,
2) Define an online data linking approach that is privacy-aware,
3) Define optimization strategies to achieve the scalability of the approach,
4) Implement and experimentally validate the approach.

Extra information
Knowledge Representation, Databases and Web technologies
Expected funding
Institutional funding
Status of funding
Maria Kutraki: Master 2 Internship in IASI Team.
Monday 18 June 2012 14:11:01 CEST
last modified
Friday 13 June 2014 18:15:27 CEST


Ecole Doctorale Informatique Paris-Sud

Nicole Bidoit
Stéphanie Druetta
Thesis counselor
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Building 650 - north wing - 417
Tel: 01 69 15 63 19
Fax: 01 69 15 63 87
email: ed-info at