Domain: Databases-Web-Information Retrieval-Reasoning
Domain - extra: Natural Language Processing
Year: 2013
Starting: September 2013
Status: Open
Subject: Aggregation of Temporal Information from Texts
Thesis advisor: VILNAT Anne
Co-advisor: Xavier Tannier
Laboratory: LIMSI ILES
Collaborations
Abstract: Work is in progress at LIMSI (team ILES) on the analysis of verbal and nominal events in texts, as well as on temporal expression normalization, with the aim of automatically building thematic chronologies. During this PhD, we plan to extend this work toward an automatic, fine-grained aggregation of temporal information. Given a large volume of texts, we aim to produce a textual or structured synthesis on a specific topic, provided by an end-user query.
A textual synthesis would be an aggregation of excerpts from several different documents, while a structured synthesis would be, for example, a table recapitulating, in chronological order, the events concerning a person or a topic.
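Once events have been extracted and dated, producing such a table is largely a matter of ordering and formatting. The sketch below is a minimal illustration, assuming each event is already available as a `(date, description, source)` triple with an ISO 8601 date; the triple format and the `chronology_table` function are hypothetical, not part of the existing LIMSI system.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (ISO 8601 date, event description, source document)


def chronology_table(rows: List[Row]) -> str:
    """Render event rows as a plain-text table in chronological order.

    ISO 8601 dates of the same precision ("YYYY-MM-DD") sort
    chronologically when compared as plain strings.
    """
    header: Row = ("Date", "Event", "Source")
    ordered = sorted(rows, key=lambda row: row[0])
    table = [header, *ordered]
    # Width of each column = length of the longest cell in that column.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
             for row in table]
    # Separator line between the header and the data rows.
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy rows, invented for illustration.
    print(chronology_table([
        ("2012-05-06", "Event B", "doc3"),
        ("2007-05-06", "Event A", "doc1"),
        ("2010-03-15", "Event C", "doc2"),
    ]))
```

Comparing dates as strings keeps the sketch dependency-free; mixed precisions (a bare year next to full dates) would need proper date objects instead.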
Context: Within the domains of information extraction and document summarization, events and temporal information have been studied much less than other entities. This lack of interest stems from the complexity of the task. Events are expressed with more variation: an event can be expressed by a verb, by a noun, or even by an adjective or a prepositional phrase. Furthermore, an event has many important arguments (involved persons, organizations, objects, etc.). Whereas traditional named entities can be considered simple objects ("France", "Nicolas Sarkozy", "2007"), an event is rather a complex network of objects linked together by an action ("the election of Nicolas Sarkozy in France in 2007").
Temporal expressions are also highly variable. For example, an event that took place in 2010 can be referred to with expressions such as "in 2010", "two years ago" or "10 years after year 2000".
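Temporal expression normalization maps such variants to a single value. The sketch below is a minimal, hypothetical illustration restricted to the three example expressions: the regular expressions, the number-word table and `normalize_year` are assumptions, not LIMSI's normalizer. Relative expressions are anchored on the document creation time, here assumed to fall in 2012.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical patterns covering only the three example expressions.
# A real normalizer handles many more forms (full dates, durations, "last year", ...).
AFTER = re.compile(r"\b(\d+) years? after (?:the )?(?:year )?(\d{4})\b")
AGO = re.compile(r"\b(\w+) years? ago\b")
ABSOLUTE = re.compile(r"\bin (\d{4})\b")
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "ten": 10}


def normalize_year(expression: str, creation_date: date) -> Optional[int]:
    """Map a temporal expression to an absolute year.

    Relative expressions ("two years ago") are resolved against the
    document creation time; unrecognized expressions yield None.
    """
    text = expression.lower()
    if m := AFTER.search(text):
        # "10 years after year 2000": offset from an explicit anchor year.
        return int(m.group(2)) + int(m.group(1))
    if m := AGO.search(text):
        # "two years ago": offset backwards from the document creation time.
        token = m.group(1)
        n = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
        return creation_date.year - n if n is not None else None
    if m := ABSOLUTE.search(text):
        # "in 2010": already absolute.
        return int(m.group(1))
    return None


if __name__ == "__main__":
    dct = date(2012, 6, 1)  # assumed document creation time
    for expr in ["in 2010", "two years ago", "10 years after year 2000"]:
        print(f"{expr!r} -> {normalize_year(expr, dct)}")
```

With a 2012 creation date, all three expressions resolve to 2010; the same "two years ago" in a 2015 article would resolve to 2013, which is why the creation time matters.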
Objectives: Work is in progress at LIMSI (team ILES) on the analysis of verbal and nominal events in texts, as well as on temporal expression normalization, with the aim of automatically building thematic chronologies. Given a user-defined topic and a large set of newswire articles, the most important events concerning this topic are retrieved, ranked and presented to the user for validation. For example, if the topic is a person's name, the system retrieves the most important events of his or her life.
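The retrieve-and-rank step can be sketched as follows. This is a minimal illustration under stated assumptions: the `EventMention` record, its fields and the ranking criterion (number of distinct articles reporting the same dated event) are hypothetical choices made for the example, not the criterion used by the LIMSI system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class EventMention:
    """One event mention extracted from a newswire article (hypothetical schema)."""
    doc_id: str
    date: str                     # normalized date, e.g. "2007-05-06"
    event: str                    # normalized event label, e.g. "election"
    participants: FrozenSet[str]  # named entities involved in the event


def rank_events(mentions: Iterable[EventMention], topic: str) -> List[Tuple[str, str, int]]:
    """Retrieve the events involving `topic`, then rank each (date, event)
    pair by the number of distinct articles that mention it."""
    sources = defaultdict(set)
    for m in mentions:
        if topic in m.participants:
            sources[(m.date, m.event)].add(m.doc_id)
    # Most widely reported first; ties broken chronologically.
    ranked = sorted(sources.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(day, event, len(docs)) for (day, event), docs in ranked]


if __name__ == "__main__":
    # Toy data, invented for illustration.
    mentions = [
        EventMention("d1", "2007-05-06", "election", frozenset({"Nicolas Sarkozy", "France"})),
        EventMention("d2", "2007-05-06", "election", frozenset({"Nicolas Sarkozy"})),
        EventMention("d3", "2010-03-15", "interview", frozenset({"Nicolas Sarkozy"})),
        EventMention("d4", "2010-03-15", "interview", frozenset({"France"})),
    ]
    for row in rank_events(mentions, "Nicolas Sarkozy"):
        print(row)
```

Counting distinct documents rather than raw mentions keeps a single article that repeats an event from inflating its rank.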
During this PhD, we plan to extend this work toward an automatic, fine-grained aggregation of temporal information. Given a large volume of texts, we aim to produce a textual or structured synthesis on a specific topic, provided by an end-user query.
A textual synthesis would be an aggregation of excerpts from several different documents, while a structured synthesis would be, e.g., a table recapitulating the events concerning a person or a topic.
Work program: This work requires processing large collections of texts, both to extract relevant information from them and to represent them in a suitable format. Furthermore, many clues will be used to perform the aggregation:
- deep linguistic analysis (temporal expressions, synonyms, paraphrases, named entities);
- metadata such as the document creation time, associated keywords, etc.
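These two kinds of clues can be combined when merging mentions from different articles. In the minimal sketch below, a hand-written synonym table stands in for deep linguistic analysis, and dates are assumed to have already been normalized (for instance relative to each article's creation time); the `SYNONYMS` table, the tuple format and `aggregate_mentions` are hypothetical. Two mentions are merged when they share a normalized date, a participant and a canonical event label.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical synonym table mapping event triggers (verbal or nominal)
# to a canonical event label; a real system would rely on linguistic resources.
SYNONYMS = {"elected": "election", "election": "election",
            "wins": "victory", "victory": "victory"}

Mention = Tuple[str, str, str, str]  # (doc_id, normalized date, trigger word, participant)


def aggregate_mentions(mentions: Iterable[Mention]) -> List[Tuple[str, str, str, List[str]]]:
    """Merge mentions sharing a date, a canonical event and a participant,
    keeping track of the documents that support each aggregated event."""
    groups: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for doc_id, day, trigger, participant in mentions:
        trigger = trigger.lower()
        event = SYNONYMS.get(trigger, trigger)  # unknown triggers are kept as-is
        groups[(day, event, participant)].add(doc_id)
    # One record per aggregated event, in chronological order.
    return sorted((day, event, who, sorted(docs))
                  for (day, event, who), docs in groups.items())
```

Keying on the exact normalized date is deliberately strict; in practice, fuzzier matching (date ranges, paraphrase detection) would likely be needed.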
Extra information
Prerequisites: The PhD candidate should have solid knowledge of natural language processing and machine learning, as well as good computer science skills.
Details
Expected funding: Institutional funding
Status of funding: Expected
Candidates
User: xavier.tannier
Created: Tuesday 11 June 2013 11:27:04 CEST
Last modified: Tuesday 11 June 2013 11:27:04 CEST

Ecole Doctorale Informatique Paris-Sud

Director
Nicole Bidoit
Assistant
Stéphanie Druetta
Thesis counselor
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Building 650, north wing, 417
Tel: 01 69 15 63 19
Fax: 01 69 15 63 87
Email: ed-info at lri.fr