Bioinformatics

Domaine: Bioinformatics
Domain - extra: Knowledge management and Algorithmics
Année: 2012
Starting: September
État: Open
Sujet: Ranking biological and biomedical data: algorithms and applications
Thesis advisor: DENISE Alain
Co-advisors: COHEN BOULAKIA Sarah (LRI and INRIA AMIB)
Laboratory: LRI Bioinfo
Collaborations: Ulf Leser (Professor for Knowledge Management in Bioinformatics at Humboldt Universität in Berlin)
The design and evaluation of the methods will be done in close collaboration with oncologists from the Institut Curie and the Children's Hospital of Philadelphia.
Abstract: The extremely rapid growth in the number of public biomedical data sources associated with the exponential growth of data available in these sources makes the research phase of biomedical information increasingly complex and tedious. Several query systems of biomedical sources are now available and used by physicians but none of them offers to rank the data obtained in response to a query. This is particularly problematic in a context where simple queries can return thousands of results.
Many classical ranking techniques can be considered for biomedical data, from very simple (eg, focus on the most recent data) to complex. The aim of this thesis is to propose a uniform framework able to represent and compare (conceptually and on real data) ranking methods and bring up new approaches for ranking large masses of biological data.
Context: Many classical ranking techniques can be considered for biomedical data, varying from simple (eg, most recent data first) to complex (e.g., techniques such as PageRank exploiting links between data) approaches. These algorithms exploit two key aspects of biological data: (i) linked to each other’s, forming an intricate network (graph-based algorithms) and (ii) plain-text file (text mining like algorithms). Ranking algorithms have then very different time and space complexities.

On the other hand, techniques able to calculate consensus rankings (aka median) have been proposed Fagin et al, PODS 2004, Ailon, Algorithmica 2010,Cohen-Boulakia et al, SSDBM 2011. They propose to generate a new ranking highlighting the common points of a set of rankings while minimizing their disagreements. The computation of the median of a set of m rankings is then known to be NP-hard when m>4. Several (exact and approximated) algorithms and heuristics have then been proposed.
Objectives: The aim of this thesis is to introduce a uniform framework able to represent and compare (conceptually and on real data) both classical and consensual ranking methods and bring up new approaches for ranking large masses of biological data.
Work program: (1) Formal study of the ranking algorithms. Criteria of comparison will be introduced to describe all algorithms in a uniform way. The complexity of each algorithm will be evaluated. As for NP-hard problems, assumptions and parameters causing the complexity will be underlined.

(2) Evaluation of the ranking algorithms. Data obtained by the various ranking algorithms will be compared to expected data and random data. Several metrics will be considered in collaboration with Prof. Ulf Leser.

(3) Reducing the search space (graph-topology). Study the size of the search space to be explored for the results to be still reliable while obtained in a reduced time and using a reduced space.

4) Mixing graphs-based and text mining-based techniques to rank biological data
This part of the thesis will introduce text-mining techniques into ranking algorithms to exploit the textual aspect of biological data, following research directions described in Cohen-Boulakia and Leser, ICDE 2011.
Extra information
Prerequisite: Knowledge of the structures of biological data sources and skills in algorithmics and in object-oriented programming are expected.
Détails: BioRanking.pdf
Expected funding: Institutional funding
Status of funding: Expected
Candidates: BRANCOTTE Bryan (étudiant M2 Recherche Informatique spécialité IAC, Orsay)
Utilisateur: alain.denise
Créé: Dimanche 03 juin 2012 16:44:41 CEST
dernière modif.: Lundi 11 juin 2012 21:39:20 CEST

Fichiers joints

	filename	créé	hits	filesize
	BioRanking.pdf	11 Jun 2012 21:37	1196	80.61 Kb

Connexion

Ecole Doctorale Informatique Paris-Sud

Directrice
Nicole Bidoit
Assistante
Stéphanie Druetta
Conseiller aux thèses
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Bat 650 - aile nord - 417
Tel : 01 69 15 63 19
Fax : 01 69 15 63 87
courriel: ed-info à lri.fr