Form record history

Version

Date

User

Field ID

Field

Difference

21 May 2012 11:51

melanie.herschel

181

Context

	This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], (HG11)) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformations process by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus.		This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], (HG11)) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformations process by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus.
		+	This project also fits in the work plan we propose within the KIC EIT ICT Labs Activity 2013 "DataBridges", a renewal of the successful activity of prior years to be coordinated by Melanie Herschel.

225

Domain - extra

~~data provenance~~

Data Provenance

04 May 2012 10:27

melanie.herschel

181

Context

This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], [HG11]) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformations process by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus.

This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], (HG11)) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformations process by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus.

182

Work program

	#Generated explanation types		#Generated explanation types
	#Complexity analysis for different types of transformations and explanations		#Complexity analysis for different types of transformations and explanations
-	#Develop / extend algorithms computing instance-based explanations, thus pursuing our work started with the Artemis algorithm [HH10].#Develop / extend algorithms computing query-based explanations such as [CJ09]	+	#Develop / extend algorithms computing instance-based explanations, thus pursuing our work started with the Artemis algorithm (HH10).#Develop / extend algorithms computing query-based explanations such as (CJ09)
	#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).		#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).
-	#Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus [HHT09].	+	#Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus (HHT09).
	#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.		#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.

185

Extra information

-	* [CJ09] A. Chapman, H.V. Jagadish. Why Not? In Proceedings of the Conference on the Management of Data (SIGMOD), 2009. * [HHT09] M. Herschel, M.A. Hernandez, W.C. Tan. Artemis: A System for Analyzing Missing-Answers. Proceedings of the VLDB Endowment, Volume 2, August 2009.* [HH10] M. Herschel, M.A. Hernandez. Explaining Missing Answers to SPJUA Queries. Proceedings of the VLDB Endowment, Volume 3, September 2010.	+	* (CJ09) A. Chapman, H.V. Jagadish. Why Not? In Proceedings of the Conference on the Management of Data (SIGMOD), 2009. * (HHT09) M. Herschel, M.A. Hernandez, W.C. Tan. Artemis: A System for Analyzing Missing-Answers. Proceedings of the VLDB Endowment, Volume 2, August 2009.* (HH10) M. Herschel, M.A. Hernandez. Explaining Missing Answers to SPJUA Queries. Proceedings of the VLDB Endowment, Volume 3, September 2010.
	* [HG11] M. Herschel, T. Grust. In Proceedings of the VLDB QDB Workshop, 201		* [HG11] M. Herschel, T. Grust. In Proceedings of the VLDB QDB Workshop, 201

04 May 2012 10:24

melanie.herschel

179

Subject

Foundations and Algorithms to Compute the ~~Lineage~~ of Missing Data

Foundations and Algorithms to Compute the Provenance of Missing Data

181

Context

This thesis topic is in the context of the Nautilus project (http://nautilus-system.org) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformations process by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus.

182

Work program

	#Generated explanation types		#Generated explanation types
	#Complexity analysis for different types of transformations and explanations		#Complexity analysis for different types of transformations and explanations
-	#Develop / extend algorithms computing instance-based explanations, thus pursuing our work on the Artemis algorithm.#Develop / extend algorithms computing query-based ~~explanations.~~	+	#Develop / extend algorithms computing instance-based explanations, thus pursuing our work started with the Artemis algorithm [HH10].#Develop / extend algorithms computing query-based explanations such as [CJ09]
	#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).		#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).
-	#Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus.#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.	+	#Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus [HHT09].#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.

04 May 2012 10:06

melanie.herschel

182

Work program

	#Generated explanation types		#Generated explanation types
	#Complexity analysis for different types of transformations and explanations		#Complexity analysis for different types of transformations and explanations
-	#Develop / extend algorithms computing ~~query~~-based explanations~~#Develop~~ ~~/ extend algorithms computing instance-based explanations~~#Develop / extend algorithms computing query-based explanations	+	#Develop / extend algorithms computing instance-based explanations, thus pursuing our work on the Artemis algorithm.#Develop / extend algorithms computing query-based explanations.
	#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).		#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).
-	#Implement the proposed algorithms in Java as part of an Eclipse Plugin.	+	#Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus.
	#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.		#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.

04 May 2012 10:02

melanie.herschel

182

Work program

-	The work program consists of eight work packages, briefly outlined below~~:!Framework definition~~	+	The work program consists of eight work packages, briefly outlined below. Work packages 1 to 3 contribute to the framework development, 4 through 6 devise new algorithms to compute missing data provenance in form of explanations, and the goal associated to 7 and 8 is the experimental validation.
	#Supported SQL transformations		#Supported SQL transformations
	#Generated explanation types		#Generated explanation types
	#Complexity analysis for different types of transformations and explanations		#Complexity analysis for different types of transformations and explanations
-	!Algorithm development
	#Develop / extend algorithms computing query-based explanations		#Develop / extend algorithms computing query-based explanations
	#Develop / extend algorithms computing instance-based explanations		#Develop / extend algorithms computing instance-based explanations
	#Develop / extend algorithms computing query-based explanations		#Develop / extend algorithms computing query-based explanations
	#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).		#Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations).
-	~~!Experimental validation~~#~~Implementation of~~ the proposed algorithms in Java as part of an Eclipse Plugin.	+	#Implement the proposed algorithms in Java as part of an Eclipse Plugin.
	#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.		#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches.

04 May 2012 09:59

melanie.herschel

183

Objectives

-	The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals:	+	*The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals:
	*__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output.		*__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output.
	* __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today.		* __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today.
	*__Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations.		*__Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations.

04 May 2012 09:52

melanie.herschel

183

Objectives

	The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals:		The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals:
-	-__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output.- __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today. - __Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations.	+	__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output. __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today. *__Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations.

04 May 2012 09:51

melanie.herschel

180

Abstract

-	Complex data transformations appear in numerous applications, such as data warehousing, data integration, and data cleaning. With increasing transformation complexity, the complexity of developing and understanding these ~~transformation~~ increases as well. Data provenance techniques, which trace back transformation output data to the input data contributing to the output, can help in ~~understand~~ such complex data transformations by explaining how the output was produced. However, especially during transformation development, a crucial question is not only to explain existing output data, but also to explain why expected data is ''missing'' from the output.	+	Complex data transformations appear in numerous applications, such as data warehousing, data integration, and data cleaning. With increasing transformation complexity, the complexity of developing and understanding these transformations increases as well. Data provenance techniques, which trace back transformation output data to the input data contributing to the output, can help in understanding such complex data transformations by explaining how the output was produced. However, especially during transformation development, a crucial question is not only to explain existing output data, but also to explain why expected data is ''missing'' from the output.
	Our goal is to devise the theoretical foundation and to propose novel algorithms to automatically compute the data provenance of missing output data. The goal is to explain to a data transformation developer why he is not obtaining the desired output, based on data examples and intuitive transformation representations.		Our goal is to devise the theoretical foundation and to propose novel algorithms to automatically compute the data provenance of missing output data. The goal is to explain to a data transformation developer why he is not obtaining the desired output, based on data examples and intuitive transformation representations.

Ecole Doctorale Informatique Paris-Sud