In the context of managing massive amounts of heterogeneous and distributed data, data quality is a major challenge for both producing and using such data. One major issue is the identification of multiple references to a single real-world object (e.g., a person or a product). This task, called entity resolution (ER), makes it possible, for instance, to identify that M. Weis is the same person as Melanie Herschel.
In general, and notably for Web data, we encounter data in graph-structured form. So far, no efficient and practical solution for ER in massive graph data has been proposed. Additionally, no ER algorithm so far takes into account data provenance, i.e., metadata describing the source of the data or the process that produced it, although this information may improve ER quality. Another way to improve ER quality that will be explored is the reversibility of reconciliation decisions, which allows a decision to be revoked when contradictory information appears.
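Reversibility is easiest to picture if each reconciliation decision is kept as an explicit record tagged with its provenance rather than being applied destructively. The minimal Python sketch below assumes exactly that design: entity clusters are recomputed as connected components over the currently active match decisions, so revoking one decision, or every decision coming from one source, undoes precisely its effect. The names `MatchDecision`, `ReversibleResolver`, and the reference and source identifiers are hypothetical, and recomputing all clusters on every query only suits small examples; this is an illustration of the idea, not the algorithm the project proposes for massive graph data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchDecision:
    """Hypothetical record of one reconciliation decision: `left` and `right`
    are claimed to refer to the same real-world entity."""
    left: str
    right: str
    source: str  # provenance: the source or process that produced the evidence


class ReversibleResolver:
    """Keeps decisions instead of destructively merging records, so any
    decision can later be revoked."""

    def __init__(self) -> None:
        self._decisions: set[MatchDecision] = set()

    def merge(self, left: str, right: str, source: str) -> MatchDecision:
        decision = MatchDecision(left, right, source)
        self._decisions.add(decision)
        return decision

    def revoke(self, decision: MatchDecision) -> None:
        # Undo a single decision, e.g. when contradictory information appears.
        self._decisions.discard(decision)

    def revoke_source(self, source: str) -> None:
        # Undo every decision whose provenance is the given source.
        self._decisions = {d for d in self._decisions if d.source != source}

    def clusters(self) -> list[frozenset[str]]:
        """Connected components over the active decisions. References that
        appear in no active decision are implicit singletons and are omitted."""
        adjacency: dict[str, set[str]] = defaultdict(set)
        for d in self._decisions:
            adjacency[d.left].add(d.right)
            adjacency[d.right].add(d.left)

        seen: set[str] = set()
        result: list[frozenset[str]] = []
        for start in sorted(adjacency):
            if start in seen:
                continue
            component: set[str] = set()
            stack = [start]
            while stack:  # iterative depth-first search
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(adjacency[node] - component)
            seen |= component
            result.append(frozenset(component))
        return result
```

For example, after `d = r.merge("ref2", "ref3", "src_b")` has linked `ref3` into the same cluster as `ref1` and `ref2`, calling `r.revoke(d)` restores the cluster `{ref1, ref2}` without touching any other decision. Recomputing from the decision log trades speed for exact reversibility; an incremental structure such as union-find is faster but does not support deletions directly.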
Context
This project is in line with the goals of the DigiCosme Labex (http://digicosme.lri.fr/). Research will take place at the Department of Computer Science of University Paris-Sud (http://lri.fr), which offers a dynamic, international research environment close to one of Europe's most beautiful metropolises. You will be a member of the joint university database / INRIA Oak group (https://team.inria.fr/oak/), which has long and successful experience in managing massive amounts of heterogeneous data.
Objectives
The proposed PhD project aims to develop entity resolution algorithms that take into account both data provenance and reversibility, in order to improve ER quality while remaining applicable to massive amounts of Web data.
Work program
Extra information
Prerequisite
Details
Expected funding: Institutional funding
Status of funding: Expected
Candidates
User: melanie.herschel
Created: Tuesday 19 March 2013 11:30:22 CET
Last modified: Tuesday 19 March 2013 11:30:22 CET
Ecole Doctorale Informatique Paris-Sud
Director: Nicole Bidoit
Assistant: Stéphanie Druetta
Thesis advisor: Dominique Gouyou-Beauchamps
ED 427 - Université Paris-Sud
UFR Sciences Orsay
Building 650 - north wing - 417
Tel: 01 69 15 63 19
Fax: 01 69 15 63 87
Email: ed-info at lri.fr