Version |
Date |
User |
Field ID |
Field |
Difference |
9 |
21 May 2012 11:51 |
melanie.herschel |
181 |
Context |
| This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], (HG11)) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformation processes by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus. | | This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], (HG11)) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformation processes by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus. |
| + | This project also fits in the work plan we propose within the KIC EIT ICT Labs Activity 2013 "DataBridges", a renewal of the successful activity of prior years to be coordinated by Melanie Herschel. |
|
|
|
|
225 |
Domain - extra |
- | data provenance |
+ | Data Provenance |
|
8 |
04 May 2012 10:27 |
melanie.herschel |
181 |
Context |
- | This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], [HG11]) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformation processes by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus. |
+ | This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], (HG11)) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformation processes by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus. |
|
|
|
|
182 |
Work program |
| #Generated explanation types | | #Generated explanation types |
| #Complexity analysis for different types of transformations and explanations | | #Complexity analysis for different types of transformations and explanations |
- | #Develop / extend algorithms computing instance-based explanations, thus pursuing our work started with the Artemis algorithm [HH10].#Develop / extend algorithms computing query-based explanations such as [CJ09] |
+ | #Develop / extend algorithms computing instance-based explanations, thus pursuing our work started with the Artemis algorithm (HH10).#Develop / extend algorithms computing query-based explanations such as (CJ09) |
| #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). | | #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). |
- | #Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus [HHT09]. |
+ | #Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus (HHT09). |
| #Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. | | #Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. |
|
|
|
|
185 |
Extra information |
- | * [CJ09] A. Chapman, H.V. Jagadish. Why Not? In Proceedings of the Conference on the Management of Data (SIGMOD), 2009. * [HHT09] M. Herschel, M.A. Hernandez, W.C. Tan. Artemis: A System for Analyzing Missing-Answers. Proceedings of the VLDB Endowment, Volume 2, August 2009.* [HH10] M. Herschel, M.A. Hernandez. Explaining Missing Answers to SPJUA Queries. Proceedings of the VLDB Endowment, Volume 3, September 2010. |
+ | * (CJ09) A. Chapman, H.V. Jagadish. Why Not? In Proceedings of the Conference on the Management of Data (SIGMOD), 2009. * (HHT09) M. Herschel, M.A. Hernandez, W.C. Tan. Artemis: A System for Analyzing Missing-Answers. Proceedings of the VLDB Endowment, Volume 2, August 2009.* (HH10) M. Herschel, M.A. Hernandez. Explaining Missing Answers to SPJUA Queries. Proceedings of the VLDB Endowment, Volume 3, September 2010. |
| * [HG11] M. Herschel, T. Grust. In Proceedings of the VLDB QDB Workshop, 201 | | * [HG11] M. Herschel, T. Grust. In Proceedings of the VLDB QDB Workshop, 201 |
|
6 |
04 May 2012 10:24 |
melanie.herschel |
179 |
Subject |
- | Foundations and Algorithms to Compute the Lineage of Missing Data |
+ | Foundations and Algorithms to Compute the Provenance of Missing Data |
|
|
|
|
181 |
Context |
- | This thesis topic is in the context of the Nautilus project (http://nautilus-system.org) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformation processes by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus. |
+ | This thesis topic is in the context of the Nautilus project ([http://nautilus-system.org|http://nautilus-system.org], [HG11]) we are pursuing at the database group of Université Paris Sud. The goal of Nautilus is to support developers in developing, analyzing, debugging, fixing, testing, and evolving complex data transformation processes by providing a suite of algorithms and tools to accompany the process. The work proposed here will contribute to the query analysis and debugging components of Nautilus. |
|
|
|
|
182 |
Work program |
| #Generated explanation types | | #Generated explanation types |
| #Complexity analysis for different types of transformations and explanations | | #Complexity analysis for different types of transformations and explanations |
- | #Develop / extend algorithms computing instance-based explanations, thus pursuing our work on the Artemis algorithm.#Develop / extend algorithms computing query-based explanations. |
+ | #Develop / extend algorithms computing instance-based explanations, thus pursuing our work started with the Artemis algorithm [HH10].#Develop / extend algorithms computing query-based explanations such as [CJ09] |
| #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). | | #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). |
- | #Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus.#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. |
+ | #Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus [HHT09].#Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. |
|
5 |
04 May 2012 10:06 |
melanie.herschel |
182 |
Work program |
| #Generated explanation types | | #Generated explanation types |
| #Complexity analysis for different types of transformations and explanations | | #Complexity analysis for different types of transformations and explanations |
- | #Develop / extend algorithms computing query-based explanations#Develop / extend algorithms computing instance-based explanations#Develop / extend algorithms computing query-based explanations |
+ | #Develop / extend algorithms computing instance-based explanations, thus pursuing our work on the Artemis algorithm.#Develop / extend algorithms computing query-based explanations. |
| #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). | | #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). |
- | #Implement the proposed algorithms in Java as part of an Eclipse Plugin. |
+ | #Implement the proposed algorithms in Java as part of an Eclipse Plugin, the general framework chosen to implement Nautilus. |
| #Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. | | #Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. |
|
4 |
04 May 2012 10:02 |
melanie.herschel |
182 |
Work program |
- | The work program consists of eight work packages, briefly outlined below:!Framework definition |
+ | The work program consists of eight work packages, briefly outlined below. Work packages 1 to 3 contribute to the framework development, 4 through 6 devise new algorithms to compute missing data provenance in the form of explanations, and the goal associated with 7 and 8 is the experimental validation. |
| #Supported SQL transformations | | #Supported SQL transformations |
| #Generated explanation types | | #Generated explanation types |
| #Complexity analysis for different types of transformations and explanations | | #Complexity analysis for different types of transformations and explanations |
- | !Algorithm development | |
| #Develop / extend algorithms computing query-based explanations | | #Develop / extend algorithms computing query-based explanations |
| #Develop / extend algorithms computing instance-based explanations | | #Develop / extend algorithms computing instance-based explanations |
| #Develop / extend algorithms computing query-based explanations | | #Develop / extend algorithms computing query-based explanations |
| #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). | | #Develop algorithms computing hybrid explanations (that unify query-based and instance-based explanations). |
- | !Experimental validation#Implementation of the proposed algorithms in Java as part of an Eclipse Plugin. |
+ | #Implement the proposed algorithms in Java as part of an Eclipse Plugin. |
| #Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. | | #Evaluation of the proposed algorithms, including a comparative evaluation with existing approaches. |
|
3 |
04 May 2012 09:59 |
melanie.herschel |
183 |
Objectives |
- | The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals: |
+ | *The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals: |
| *__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output. | | *__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output. |
| * __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today. | | * __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today. |
| *__Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations. | | *__Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations. |
|
2 |
04 May 2012 09:52 |
melanie.herschel |
183 |
Objectives |
| The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals: | | The goal of this research is to show that the answer to the question why some data is missing from a data transformation's output can be answered for a significant fraction of SQL data transformations by computing the provenance of missing data in form of so called ''explanations''. This core hypothesis dictates the following goals: |
- | -__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output.- __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today. - __Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations. |
+ | *__ Development of a framework__ that (i) unifies the concept of existing and different representations of missing data provenance and (ii) analyzes and defines interesting properties of the input and output.* __Definition of efficient and effective algorithms__ to compute missing data provenance complying to the proposed framework. We envision a new type of algorithm that computes explanations that unify the multiple different explanation types that exist today. *__Experimental validation__ of the proposed solutions to assess both the efficiency and the usability of the computed explanations for analyzing and debugging complex data transformations. |
|
1 |
04 May 2012 09:51 |
melanie.herschel |
180 |
Abstract |
- | Complex data transformations appear in numerous applications, such as data warehousing, data integration, and data cleaning. With increasing transformation complexity, the complexity of developing and understanding these transformation increases as well. Data provenance techniques, which trace back transformation output data to the input data contributing to the output, can help in understand such complex data transformations by explaining how the output was produced. However, especially during transformation development, a crucial question is not only to explain existing output data, but also to explain why expected data is ''missing'' from the output. |
+ | Complex data transformations appear in numerous applications, such as data warehousing, data integration, and data cleaning. With increasing transformation complexity, the complexity of developing and understanding these transformations increases as well. Data provenance techniques, which trace back transformation output data to the input data contributing to the output, can help in understanding such complex data transformations by explaining how the output was produced. However, especially during transformation development, a crucial question is not only to explain existing output data, but also to explain why expected data is ''missing'' from the output. |
| Our goal is to devise the theoretical foundation and to propose novel algorithms to automatically compute the data provenance of missing output data. The goal is to explain to a data transformation developer why he is not obtaining the desired output, based on data examples and intuitive transformation representations. | | Our goal is to devise the theoretical foundation and to propose novel algorithms to automatically compute the data provenance of missing output data. The goal is to explain to a data transformation developer why he is not obtaining the desired output, based on data examples and intuitive transformation representations. |
|