Databases-Web-Information Retrieval-Reasoning

Domain
Databases-Web-Information Retrieval-Reasoning
Domain - extra
Natural Language Processing
Year
2013
Starting
September 2013
Status
Open
Subject
Aggregation of Temporal Information from Texts
Thesis advisor
VILNAT Anne
Co-advisors
Xavier Tannier
Laboratory
Collaborations
Abstract
Work is in progress at LIMSI (ILES team) on the analysis of verbal and nominal events in texts, as well as on the normalization of temporal expressions. The purpose is to automatically build thematic chronologies. During this PhD we plan to extend this work and achieve an automatic, fine-grained aggregation of temporal information. Given a large volume of texts, we aim to produce a textual or structured synthesis on a specific topic, which will be provided by an end-user query.
A textual synthesis would be an aggregation of excerpts from several different documents, while a structured synthesis would be, for example, a table recapitulating, in chronological order, the events concerning a person or a topic.
Context
Within the domain of information extraction and document summarization, events and temporal information are much less studied than other entities. This comes from the complexity of the task. The expression of events is subject to more variation: an event can be expressed by a verb, by a noun, or even by an adjective or a prepositional phrase. Furthermore, an event has many important arguments (the persons, organizations or objects involved, etc.). Traditional named entities can be considered as simple objects ("France", "Nicolas Sarkozy", "2007"), whereas an event is a complex network of objects linked together by an action ("the election of Nicolas Sarkozy in France in 2007").
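To make the contrast concrete, here is a minimal sketch of how an event could be represented as a trigger word linked to several simple entities. The class names, the role labels ("participant", "location", "time") and the entity types are assumptions chosen for illustration; they are not a representation described in this proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Entity:
    """A simple object, such as a traditional named entity."""
    text: str
    kind: str  # hypothetical type label, e.g. "PERSON", "LOCATION", "DATE"


@dataclass
class Event:
    """An action (the trigger) linking several entities together."""
    trigger: str      # word expressing the event
    trigger_pos: str  # part of speech of the trigger: "VERB", "NOUN", ...
    arguments: Dict[str, Entity] = field(default_factory=dict)

    def argument_text(self, role: str) -> Optional[str]:
        """Return the text of the entity filling `role`, or None if absent."""
        entity = self.arguments.get(role)
        return entity.text if entity else None


# "the election of Nicolas Sarkozy in France in 2007"
election = Event(
    trigger="election",
    trigger_pos="NOUN",
    arguments={
        "participant": Entity("Nicolas Sarkozy", "PERSON"),
        "location": Entity("France", "LOCATION"),
        "time": Entity("2007", "DATE"),
    },
)
```

The same structure would hold a verbal event ("was elected") by changing only the trigger and its part of speech, which is one way to abstract over the surface variation mentioned above.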
Temporal expressions are also highly variable. For example, an event that took place in 2010 can be referred to with temporal expressions such as "in 2010", "two years ago" (in a text written in 2012) or "10 years after year 2000".
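As an illustration of what normalization means here, the following sketch maps the three example expressions to the same year. It is a toy, rule-based approach covering only these patterns at year granularity. The function name `normalize_year`, the small number-word table, and the use of a reference date (standing in for the document creation time) are assumptions made for the example, not the method used at LIMSI.

```python
import re
from datetime import date

# Hypothetical, deliberately tiny table of spelled-out numbers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
NUM = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"


def _to_int(token):
    """Convert a digit string or a known number word to an integer."""
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def normalize_year(expression, reference):
    """Return the year denoted by `expression`, or None if unrecognized.

    `reference` is the date against which relative expressions are
    resolved (assumed here to be the document creation time).
    """
    text = expression.strip().lower()

    # Relative to the reference date: "two years ago"
    m = re.fullmatch(NUM + r" years? ago", text)
    if m:
        return reference.year - _to_int(m.group(1))

    # Offset from an explicit anchor year: "10 years after year 2000"
    m = re.fullmatch(NUM + r" years? (after|before) (?:year )?(\d{4})", text)
    if m:
        sign = 1 if m.group(2) == "after" else -1
        return int(m.group(3)) + sign * _to_int(m.group(1))

    # Absolute year: "in 2010" or "2010"
    m = re.fullmatch(r"(?:in )?(\d{4})", text)
    if m:
        return int(m.group(1))

    return None


creation_time = date(2012, 6, 1)  # hypothetical document creation time
years = [normalize_year(e, creation_time)
         for e in ("in 2010", "two years ago", "10 years after year 2000")]
```

With a creation time in 2012, all three expressions resolve to the same year. Changing the reference date changes only the result for "two years ago", which is why the document creation time matters for relative expressions.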
Objectives
Work is in progress at LIMSI (ILES team) on the analysis of verbal and nominal events in texts, as well as on the normalization of temporal expressions. The purpose is to automatically build thematic chronologies. Given a topic defined by the user and a large set of newswire articles, the most important events concerning this topic are retrieved, ranked and presented to the user for validation. For example, if the topic is the name of a person, the system will retrieve the most important events of his or her life.
During this PhD we plan to extend this work and achieve an automatic, fine-grained aggregation of temporal information. Given a large volume of texts, we aim to produce a textual or structured synthesis on a specific topic, which will be provided by an end-user query.
A textual synthesis would be an aggregation of excerpts from several different documents, while a structured synthesis would be, e.g., a table recapitulating events concerning a person or a topic.
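A structured synthesis of this kind could be sketched as follows: event mentions coming from several documents are merged and listed in chronological order. Everything here is an assumption made for illustration: the input format (year, description, document identifier), the merging of mentions by exact lowercased description, and the number of supporting documents used as a crude ordering clue. Real aggregation would need paraphrase detection and finer temporal granularity.

```python
from collections import defaultdict


def build_chronology(mentions):
    """Merge event mentions and sort them chronologically.

    `mentions` is an iterable of (year, description, doc_id) tuples.
    Mentions with the same year and the same lowercased description are
    treated as one event; the set of supporting documents is counted.
    """
    support = defaultdict(set)
    for year, description, doc_id in mentions:
        key = (year, description.strip().lower())
        support[key].add(doc_id)

    rows = [(year, desc, len(docs)) for (year, desc), docs in support.items()]
    # Chronological order; within a year, better-supported events first.
    rows.sort(key=lambda row: (row[0], -row[2], row[1]))
    return rows


def format_table(rows):
    """Render the chronology as a plain-text table."""
    lines = ["Year | Event | Documents"]
    for year, desc, count in rows:
        lines.append(f"{year} | {desc} | {count}")
    return "\n".join(lines)


# Hypothetical mentions; document identifiers are placeholders.
mentions = [
    (2007, "Election of Nicolas Sarkozy", "doc1"),
    (2007, "election of Nicolas Sarkozy", "doc2"),
    (2005, "hypothetical earlier event", "doc3"),
    (2007, "election of Nicolas Sarkozy", "doc2"),  # repeated in same document
]
rows = build_chronology(mentions)
table = format_table(rows)
```

Counting distinct documents rather than raw mentions avoids rewarding an article that simply repeats itself.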
Work program
This work requires handling large collections of texts, both to extract relevant information from them and to represent them in a suitable format. Furthermore, many clues will be used to perform the aggregation:
- deep linguistic analysis (temporal expressions, synonyms, paraphrases, named entities)
- metadata such as the document creation time, associated keywords, etc.

Extra information
Prerequisite
The PhD candidate should have solid knowledge of natural language processing and machine learning, as well as good computer science skills.
Details
Expected funding
Institutional funding
Status of funding
Expected
Candidates
User
xavier.tannier
Created
Tuesday 11 June 2013 11:27:04 CEST
Last modified
Tuesday 11 June 2013 11:27:04 CEST


Ecole Doctorale Informatique Paris-Sud


Director
Nicole Bidoit
Assistant
Stéphanie Druetta
Thesis adviser
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Bat 650 - aile nord - 417
Tel : 01 69 15 63 19
Fax : 01 69 15 63 87
Email: ed-info at lri.fr