Databases-Web-Information Retrieval

Domain
Databases-Web-Information Retrieval
Domain - extra
data cleaning - data integration - data provenance
Year
2012
Starting
September 2012
Status
Open
Subject
Web-scale reference reconciliation leveraging reversibility and data provenance
Thesis advisor
BIDOIT Nicole
Co-advisors
Mélanie Herschel (LRI-INRIA : BD/OAK) will be the main advisor
http://www.lri.fr/~herschel/research.html
Laboratory
Collaborations
Abstract
The problem of reference reconciliation has been studied extensively for relational data and, more recently, for hierarchical and graph data. Because of the tremendous amount of Web data, current solutions for graph duplicate detection do not apply at this scale. Scalability is not the only issue: the second crucial aspect is the quality of the result. To improve result quality, this project will investigate how reference reconciliation can benefit from data provenance, i.e., information about the source of the data and the processes that produced these data. Another improvement in this direction is the reversibility of reconciliations, which allows a reconciliation to be revoked when conflicting information appears.

The goal of the doctoral project is to develop reference reconciliation algorithms that take into account provenance and reversibility while applying to large volumes of Web data.
Context
When managing massive, heterogeneous, and distributed data, producing and using these data raises the question of quality and reliability. In this context, we consider the problem of reference reconciliation, which aims at identifying different representations of the same real-world object (person, product, etc.) among data from different sources. This problem, also known as entity resolution or duplicate detection, is of high practical relevance in industry.
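To make the problem statement concrete, the following is a minimal sketch of pairwise duplicate detection, not the approach proposed in this project. The records, the attribute names, and the 0.85 threshold are all invented for illustration, and the similarity measure is simply Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from two sources; every value is invented for illustration.
records = [
    {"id": "a1", "source": "siteA", "name": "Jon Smith", "city": "Paris"},
    {"id": "b7", "source": "siteB", "name": "John Smith", "city": "Paris"},
    {"id": "b9", "source": "siteB", "name": "Maria Lopez", "city": "Orsay"},
]


def similarity(r1, r2):
    """Average string similarity over the attributes name and city."""
    fields = ("name", "city")
    scores = [
        SequenceMatcher(None, r1[f].lower(), r2[f].lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def candidate_duplicates(recs, threshold=0.85):
    """Return id pairs whose similarity reaches the (assumed) threshold."""
    return [
        (r1["id"], r2["id"])
        for r1, r2 in combinations(recs, 2)
        if similarity(r1, r2) >= threshold
    ]


pairs = candidate_duplicates(records)
print(pairs)  # [('a1', 'b7')]
```

Comparing every pair of records is quadratic in the number of records, which is one reason such a naive scheme cannot be applied directly to Web-scale data.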
Objectives
The goal is to design algorithms for reference reconciliation that improve the state of the art in three respects: (i) apply to large volumes of data, (ii) make use of data provenance, and (iii) study how reconciliation decisions can be revoked and how this affects the overall result in terms of efficiency and result quality. In our group, we strive to develop practically relevant solutions, so the proposed solutions will be implemented and evaluated on real data.

Work program
The overall goal will be achieved following a four-step work program:

1) Inclusion of data provenance
2) Study of reversibility of reconciliations
3) Large scale graph reference reconciliation
4) Software implementation and evaluation
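As an illustration of how steps 1 and 2 might interact, here is a small sketch under stated assumptions. It is not the project's algorithm. The `Decision` and `ReconciliationLog` classes, the idea of recording provenance as a set of source names, and the source identifiers are all hypothetical. Each reconciliation decision keeps its provenance, and revoking a source deactivates the decisions that relied on it.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One reconciliation decision and the provenance that justified it."""
    left: str
    right: str
    sources: frozenset  # hypothetical provenance: names of the evidence sources
    active: bool = True


@dataclass
class ReconciliationLog:
    decisions: list = field(default_factory=list)

    def reconcile(self, left, right, sources):
        """Record that two references denote the same object."""
        decision = Decision(left, right, frozenset(sources))
        self.decisions.append(decision)
        return decision

    def revoke_source(self, source):
        """Deactivate every decision whose evidence depends on the source."""
        for decision in self.decisions:
            if source in decision.sources:
                decision.active = False

    def clusters(self):
        """Recompute clusters from the active decisions with a union-find."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for decision in self.decisions:
            # Register both references even when the decision is revoked,
            # so that they still appear as singleton clusters.
            root_l, root_r = find(decision.left), find(decision.right)
            if decision.active and root_l != root_r:
                parent[root_l] = root_r

        groups = {}
        for ref in parent:
            groups.setdefault(find(ref), set()).add(ref)
        return sorted(sorted(group) for group in groups.values())


log = ReconciliationLog()
log.reconcile("a1", "b7", {"siteA", "siteB"})
log.reconcile("b7", "c3", {"siteC"})
before = log.clusters()   # [['a1', 'b7', 'c3']]
log.revoke_source("siteC")
after = log.clusters()    # [['a1', 'b7'], ['c3']]
```

Recomputing all clusters after each revocation keeps the sketch short but would be too costly on large graphs. How to revoke decisions efficiently at scale is precisely the kind of question the work program raises.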
Extra information
http://www.lri.fr/~herschel/research.html
Prerequisite
Details
Download sujet-pageED_herschel.doc
Expected funding
Institutional funding
Status of funding
Expected
Candidates
User
nicole.bidoit
Created
Tuesday 17 April 2012 17:23:57 CEST
Last modified
Tuesday 01 January 2013 17:15:46 CET

Attachments

filename    created    hits    filesize
sujet-pageED_herschel.doc    17 Apr 2012 17:23    1735458.50 Kb


Ecole Doctorale Informatique Paris-Sud


Director
Nicole Bidoit
Assistant
Stéphanie Druetta
Thesis counselor
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Building 650 - north wing - 417
Tel : 01 69 15 63 19
Fax : 01 69 15 63 87
email: ed-info at lri.fr