The proposed PhD research consists of investigating algorithms for expressive and efficient management of RDF data in a cloud context. We target a rich subset of RDF, extending the fragment considered in the literature with the ability to model, query and update schemas, together with the rich reasoning this entails, and with blank nodes, which are a form of incomplete information specific to RDF. On this language, the purpose of the PhD is to investigate parallel and distributed algorithms for querying and updating RDF. Parallelism and distribution are required in order to cope with very large volumes of RDF data. We will consider a parallel infrastructure such as the one offered by Hadoop, and extend previous work by optimizing the trade-off between, on the one hand, the cost of querying RDF on such a distributed platform and, on the other hand, the cost of reasoning over data and knowledge distributed across the store.
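To make the querying-versus-reasoning trade-off concrete, the following minimal single-machine sketch contrasts two standard ways of answering a query under a schema: saturating the data in advance, or reformulating the query at run time. It is an illustration only, not the approach the thesis will take. It covers a single RDFS rule (propagating rdf:type along rdfs:subClassOf), and the ex: names, the sample triples and all function names are hypothetical.

```python
# Illustrative sketch: two ways to account for rdfs:subClassOf when querying.
# All data and function names below are assumptions made for this example.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# A tiny RDF graph as a set of (subject, property, object) triples.
triples = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:PhDStudent", SUBCLASS, "ex:Student"),
    ("ex:alice", RDF_TYPE, "ex:PhDStudent"),
    ("_:b1", RDF_TYPE, "ex:Student"),  # blank node: some unnamed student
}


def reachable(graph, start, upward):
    """Classes reachable from `start` along rdfs:subClassOf, including `start`.

    upward=True follows sub -> super; upward=False follows super -> sub.
    """
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if p != SUBCLASS:
                continue
            src, dst = (s, o) if upward else (o, s)
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def saturate(graph):
    """Forward reasoning: add every implied rdf:type triple to the graph."""
    result = set(graph)
    for s, p, o in graph:
        if p == RDF_TYPE:
            for sup in reachable(graph, o, upward=True):
                result.add((s, RDF_TYPE, sup))
    return result


def instances_on_saturated(saturated, cls):
    """Plain lookup; complete only if the graph was saturated beforehand."""
    return {s for s, p, o in saturated if p == RDF_TYPE and o == cls}


def instances_by_reformulation(graph, cls):
    """Backward reasoning: rewrite the query to cover all subclasses of cls."""
    targets = reachable(graph, cls, upward=False)
    return {s for s, p, o in graph if p == RDF_TYPE and o in targets}
```

Both strategies return the same instances of ex:Person, but they pay in different places: saturation stores extra triples that must be maintained when data or schema are updated, whereas reformulation leaves the data untouched and makes each query more expensive. In a distributed store, either cost may also involve moving data between nodes.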
Context
Work on cloud-based data management started in Oak through our participation in the KIC EIT ICT Labs activities “Europa” (on cloud-based data management) and “DataBridges” (on data integration techniques for digital cities, where we worked in particular on RDF processing). We are now preparing renewal proposals for these activities for 2013. An Inria engineer (ADT DistribWeb) joined the team in October 2011 and will help us develop our cloud-based platform.
Objectives
Work program
Extra information
Prerequisite
Strong skills in Databases and Knowledge Representation & Reasoning