Natural Language Speech and Audio Processing

Domain
Natural Language Speech and Audio Processing
Domain - extra
psycholinguistics, machine learning, corpus linguistics
Year
2010
Starting
corpus linguistics, machine learning, psycholinguistics
Status
Open
Subject
Automatic speech transcription error recovery using multiple knowledge sources
Thesis advisor
ADDA-DECKER Martine
Co-advisors
Lori Lamel, LIMSI/CNRS
Ioana Vasilescu, LIMSI/CNRS
Laboratory
EXT
Collaborations
RWTH Aachen University Germany,
KIT Karlsruhe Germany.
Abstract
Over the past decade, it has been firmly established that human listeners still significantly outperform machines on speech transcription tasks. Indeed, native listeners generally do a very good job of handling the many kinds of variation inherent to speech, such as pronunciation variants, disfluencies, ungrammatical sentences, accents, noise and so forth. These observations hold particularly when large surrounding contexts (complete, long sentences) are available. Automatic speech recognition (ASR) systems, however, generally base their transcription decisions on relatively limited contexts (several words), and handling variation in speech remains a major challenge for them.
Since ASR is an enabling technology for a wide variety of advanced applications, such as multimedia information access or speech-to-speech translation, the impact of ASR errors on the performance of these applications will also be investigated.
Context
Handling variation in speech remains a major challenge for current automatic speech recognition (ASR) systems. In particular, casual interactive speech often results in high word error rates, which call for specific error recovery strategies.
The rich experimental environment of the Franco-German Quaero project, with annual ASR evaluations in multiple languages, provides a unique testbed for a systematic study of ASR errors. The proposed parallel between human and machine errors is highly innovative and may advance both our fundamental knowledge of human speech processing and basic techniques for automatic speech processing and error recovery.
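For readers unfamiliar with the metric, the word error rate referred to in this section is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The following is only a minimal sketch of that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not part of the proposal or of any Quaero evaluation tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization, which real evaluations usually apply).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, the rate can exceed 1.0 when the hypothesis is much longer than the reference.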

Objectives
The aim of the present proposal is to identify current obstacles that affect ASR performance, to propose a sound ASR error typology and to benchmark human vs. ASR performance according to this typology, to design innovative mechanisms for error recovery, and to explore new solutions for improved spoken language modeling.
Work program
ASR errors need to be investigated along at least three axes:
(i) perceptual experiments on selected materials to benchmark human performance;
(ii) proper names, whose misrecognition is particularly harmful to downstream processing such as information access, translation or question answering (factors: frequency of occurrence, pronunciation variants, repetitions);
(iii) reduced pronunciations (modeling options: specific acoustic models, pronunciation dictionary).
Different knowledge sources (named entities, POS, prosody, pronunciation variants) will then be applied and their impact evaluated.
Extra information
Prerequisite
Details
Download SujetTheseIV.pdf
Expected funding
Research contract
Status of funding
Expected
Candidates
User
Created
Friday 26 February 2010 23:19:21 CET
Last modified
Tuesday 29 June 2010 15:14:26 CEST



Ecole Doctorale Informatique Paris-Sud


Director
Nicole Bidoit
Assistant
Stéphanie Druetta
Thesis counselor
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Building 650 - north wing - 417
Tel : 01 69 15 63 19
Fax : 01 69 15 63 87
email: ed-info at lri.fr