Natural Language Speech and Audio Processing
Domain - extra
Machine Learning, Computer Vision, Multimedia
Extraction and description of story arcs in TV series
Thesis advisor
GUINAUDEAU Camille
This PhD offer is related to a joint project with IRIT (Toulouse, France) and CRP-GL (Luxembourg).
This exploratory project deals with extracting and visualizing the narrative structure of TV series considered as collections. To our knowledge, this complex problem has not been tackled before. The project will therefore involve fundamental research, in particular through the proposed PhD thesis.

It is a multidisciplinary project built around three partners (the French labs LIMSI-CNRS and IRIT and the Luxembourgish lab CRP-GL) with complementary expertise: speech processing, video processing, natural language processing, multimodal fusion and information visualization.
The first, ambitious objective is to automatically extract the narrative structure of TV series.
Current TV series are based on complex structures in which several "stories" (called story arcs) are intertwined within the same episode. The purpose of a story arc is to move a character or a situation from one state to another: in other words, to effect change. Story arcs must be detected, deconstructed and tracked across the episodes of each TV series.

The second objective is a detailed analysis of the structure of every detected story arc. The idea is to detect known narrative structures from emotional and linguistic information. To keep viewers in suspense, screenwriters tend to reuse classical narrative schemes (e.g. exposition, rising action, climax, falling action and denouement); these schemes should be detected and aligned with each story arc, either at episode level or at the scale of the whole TV series.
Work program
First, story arcs will be detected, deconstructed and tracked across the episodes of each TV series, using building blocks such as the temporal structure of the collection, characters, environment and dialogues. This complex problem will be addressed as an unsupervised multimodal clustering problem.
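As an illustration only, the clustering formulation above can be sketched on toy data. Everything in this sketch is an assumption made for the example and not the project's method: the scene representation (a set of characters and a set of dialogue words), the Jaccard similarity, the fusion weight `w_characters`, the average-link clustering and the `threshold` value.

```python
from itertools import combinations

# Toy scenes (hypothetical data): each scene has two modalities.
scenes = [
    {"episode": 1, "characters": {"Anna", "Ben"}, "words": {"heist", "bank", "plan"}},
    {"episode": 1, "characters": {"Clara", "David"}, "words": {"wedding", "ring"}},
    {"episode": 2, "characters": {"Anna", "Ben"}, "words": {"bank", "vault", "plan"}},
    {"episode": 2, "characters": {"Clara", "David", "Eve"}, "words": {"wedding", "dress", "ring"}},
]


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fused_similarity(s1, s2, w_characters=0.5):
    """Late fusion: weighted mean of the per-modality similarities."""
    sim_characters = jaccard(s1["characters"], s2["characters"])
    sim_words = jaccard(s1["words"], s2["words"])
    return w_characters * sim_characters + (1.0 - w_characters) * sim_words


def cluster_scenes(scenes, threshold=0.3):
    """Average-link agglomerative clustering of scene indices.

    Merging stops when the best pair of clusters has a mean fused
    similarity below `threshold` (an assumed, hand-picked value).
    """
    clusters = [[i] for i in range(len(scenes))]

    def link(c1, c2):
        sims = [fused_similarity(scenes[i], scenes[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (a < b by construction).
        (a, b), best = max(
            (((a, b), link(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


arcs = cluster_scenes(scenes)
```

On this toy input, scenes 0 and 2 end up in one cluster and scenes 1 and 3 in another, so each candidate arc spans both episodes. Real data would need richer features per modality and a principled way to choose the number of arcs.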

The second problem is a detailed analysis of the internal narrative structure of each detected story arc. The main idea is to map each story arc onto a classical narrative structure (e.g. Freytag's exposition, rising action, climax, falling action and denouement). This is a previously unexplored task that we will address as a multimodal segmentation problem based on the fusion of emotional and linguistic cues. New multimodal fusion paradigms will be designed to narrow the large semantic gap between these cues and narrative structure.
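A minimal sketch of the segmentation formulation above, under strong assumptions that are not part of the project description: each scene of an arc already has one emotional score and one linguistic score in [0, 1] (hypothetical values here), fusion is a plain weighted mean, and the five Freytag phases are taken to be the five contiguous segments that minimise the within-segment squared error of the fused curve.

```python
PHASES = ["exposition", "rising action", "climax", "falling action", "denouement"]

# Hypothetical per-scene cue scores for one story arc (10 scenes).
emotion = [0.1, 0.1, 0.5, 0.5, 1.0, 1.0, 0.6, 0.6, 0.2, 0.2]
linguistic = [0.3, 0.3, 0.5, 0.5, 0.8, 0.8, 0.4, 0.4, 0.0, 0.0]


def fuse(emotion, linguistic, w_emotion=0.5):
    """Early fusion: per-scene weighted mean of the two cue scores."""
    if len(emotion) != len(linguistic):
        raise ValueError("both cue sequences need one score per scene")
    return [w_emotion * e + (1.0 - w_emotion) * l
            for e, l in zip(emotion, linguistic)]


def segment(values, k):
    """Split `values` into k contiguous segments by dynamic programming.

    Minimises the total within-segment squared error and returns the
    start index of each segment.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the squared error of any segment in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in values:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def cost(i, j):
        """Squared error of values[i:j] around its mean (j > i)."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i  # start of the last segment

    # Walk the back-pointers from the end to recover segment starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return starts[::-1]


fused = fuse(emotion, linguistic)
phase_starts = dict(zip(PHASES, segment(fused, len(PHASES))))
```

Here `phase_starts` maps each phase name to the index of its first scene. The equal weighting and the piecewise-constant cost are placeholders; the fusion paradigms mentioned above are precisely what the thesis is meant to investigate.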

Extra information
This PhD offer is proposed by the Spoken Language Processing Group of the CNRS/LIMSI lab in Orsay (France).

Contact: GUINAUDEAU Camille
  • Knowledge of machine learning and natural language processing.
  • Expertise in computer vision and/or speech processing would be appreciated.
  • Skills in the Python programming language.

Expected funding
Research contract
Status of funding
Thursday 6 March 2014 12:06:11 CET
Last modified
Wednesday 11 June 2014 10:56:11 CEST

Ecole Doctorale Informatique Paris-Sud