Databases-Web-Information Retrieval-Reasoning
Natural Language Processing
September 2013
Aggregation of Temporal Information from Texts
Thesis advisor
Xavier Tannier
Work is in progress at LIMSI (team ILES) on the analysis of verbal and nominal events in texts, as well as on temporal expression normalization. The purpose is to automatically build thematic chronologies. During this PhD, we plan to extend this work and to achieve an automatic, fine-grained aggregation of temporal information. Given a large volume of texts, we aim to produce a textual or structured synthesis on a specific topic, which will be provided by an end-user query.
A textual synthesis would be an aggregation of excerpts from several different documents, while a
structured synthesis would be, for example, a table recapitulating, in chronological order, events concerning a person or a topic.
Within the domain of information extraction and document summarization, events and temporal
information are much less studied than other entities. This comparative lack of attention stems from the complexity of the task. Event expression is subject to more variation; for example, an event can be expressed by a verb, by a noun, or even by an adjective or a prepositional phrase. Furthermore, an event has many important arguments (the persons, organizations or objects involved, etc.). Traditional named entities can be considered simple objects ("France", "Nicolas Sarkozy", "2007"), whereas an event is a complex network of objects linked together by an action ("the election of Nicolas Sarkozy in France in 2007").
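The contrast between simple named entities and an event seen as a network of objects can be sketched as a small data structure. This is only an illustration under our own assumptions: the class names `Entity` and `Event`, the role labels ("patient", "location", "time") and the part-of-speech field are hypothetical and do not describe an existing system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A simple named entity: a surface string and a coarse type."""
    text: str
    kind: str  # hypothetical labels, e.g. "person", "location", "date"


@dataclass
class Event:
    """An event: a trigger word linking several entities by role."""
    trigger: str       # the word expressing the action
    trigger_pos: str   # "verb", "noun", "adjective", ...
    arguments: dict = field(default_factory=dict)  # role name -> Entity

    def entities(self):
        """Return the entities involved in the event."""
        return list(self.arguments.values())


# "the election of Nicolas Sarkozy in France in 2007"
election = Event(
    trigger="election",
    trigger_pos="noun",
    arguments={
        "patient": Entity("Nicolas Sarkozy", "person"),
        "location": Entity("France", "location"),
        "time": Entity("2007", "date"),
    },
)
```

Here the three named entities of the example remain simple objects, while the event ties them together through the nominal trigger "election".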
Temporal expressions are also highly subject to variation. For example,
an event that took place in 2010 can be referred to with temporal expressions such as "in
2010", "two years ago", "10 years after year 2000", etc.
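To make the variation concrete, here is a minimal sketch of how the three expressions above could be normalized to the same year. It is a toy, rule-based illustration and not the normalization approach used at LIMSI: the function name `normalize_year`, the small number-word table and the three patterns are assumptions made for this example, and it works at year granularity only. Relative expressions such as "two years ago" are resolved against the document creation date.

```python
import re
from datetime import date

# Hypothetical, deliberately small table of number words.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
NUM = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"


def _to_int(token):
    """Convert a digit string or a known number word to an int."""
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def normalize_year(expression, creation_date):
    """Return the year denoted by `expression`, or None if unsupported."""
    text = expression.strip().lower()

    # Offset from an explicit anchor year: "10 years after year 2000"
    m = re.fullmatch(NUM + r" years? (after|before) (?:year )?(\d{4})", text)
    if m:
        offset, anchor = _to_int(m.group(1)), int(m.group(3))
        return anchor + offset if m.group(2) == "after" else anchor - offset

    # Offset from the document creation date: "two years ago"
    m = re.fullmatch(NUM + r" years? ago", text)
    if m:
        return creation_date.year - _to_int(m.group(1))

    # Absolute year: "in 2010"
    m = re.fullmatch(r"(?:in )?(\d{4})", text)
    if m:
        return int(m.group(1))

    return None


# Assumed creation date for the example: a document written in 2012.
written = date(2012, 6, 1)
years = [normalize_year(e, written)
         for e in ("in 2010", "two years ago", "10 years after year 2000")]
```

With a document written in 2012, all three expressions resolve to 2010, which is what allows mentions from different texts to be aggregated around the same point in time.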
Work program
This work requires handling large collections of texts, both to extract relevant information from them and to represent them in a suitable format. Furthermore, several kinds of clues will be used to perform the aggregation:
- by the use of deep linguistic analysis (temporal expressions, synonyms, paraphrases, named entities, etc.);
- by the use of metadata such as the document creation time, associated keywords, etc.

The PhD candidate should therefore have strong knowledge of natural language processing and machine
learning, as well as good skills in computer science.
Expected funding
Institutional funding
Status of funding
