Recent advances in the automatic recognition of audiovisual speech
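IBM (United States) · IBM Research - Thomas J. Watson Research Center · IBM Research - Almaden · Institut national de recherche en sciences et technologies du numérique · Institut de Recherche en Informatique et Systèmes Aléatoires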
Abstract
Visual speech information from the speaker's mouth region has been shown to improve the noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and second, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability…
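The decision-fusion and modality-reliability ideas named in the abstract can be illustrated with a small sketch. Assuming per-class audio and visual log-likelihoods have already been computed, each class is scored as λ_A · log p_A + (1 - λ_A) · log p_V, where the audio weight λ_A plays the role of a reliability estimate for the acoustic stream. The function names, the example scores, and the clamped linear SNR-to-weight mapping below are hypothetical illustrations chosen for this sketch, not the authors' formulation.

```python
from typing import Dict


def fuse_stream_scores(
    audio_loglik: Dict[str, float],
    visual_loglik: Dict[str, float],
    audio_weight: float,
) -> Dict[str, float]:
    """Weighted log-likelihood combination of two streams (decision fusion).

    score(c) = w * log p_A(c) + (1 - w) * log p_V(c), with w in [0, 1].
    The sum-to-one constraint on the weights is an assumption of this sketch.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_loglik[label] + visual_weight * visual_loglik[label]
        for label in audio_loglik
    }


def decide(
    audio_loglik: Dict[str, float],
    visual_loglik: Dict[str, float],
    audio_weight: float,
) -> str:
    """Return the class label with the highest fused score."""
    fused = fuse_stream_scores(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)


def audio_weight_from_snr(
    snr_db: float,
    low_db: float = 0.0,
    high_db: float = 20.0,
    w_min: float = 0.3,
    w_max: float = 0.9,
) -> float:
    """Hypothetical reliability mapping: clamp SNR into [low_db, high_db] and
    interpolate linearly between w_min and w_max (all defaults are illustrative)."""
    t = (snr_db - low_db) / (high_db - low_db)
    t = min(max(t, 0.0), 1.0)
    return w_min + t * (w_max - w_min)


# Hypothetical scores: noisy audio slightly prefers "pa", video clearly prefers "ba".
audio = {"ba": -12.0, "pa": -11.0}
visual = {"ba": -3.0, "pa": -6.0}

print(decide(audio, visual, 1.0))                           # audio only
print(decide(audio, visual, audio_weight_from_snr(0.0)))    # low SNR, video trusted more
print(decide(audio, visual, audio_weight_from_snr(20.0)))   # high SNR, audio trusted more
```

With audio alone the sketch picks "pa"; as the assumed SNR drops, the weight shifts toward the visual stream and the decision flips to "ba". Keeping the weights on the log-likelihoods (exponents on the probabilities) means each stream's contribution can be tuned without retraining either model.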
Citation impact
- FWCI: 26.99
- Percentile: 100%
- References: 174
Authors (5)
- G. Potamianos (corresponding author)
IBM (United States)
- C. Neti
IBM (United States), IBM Research - Thomas J. Watson Research Center
- Guillaume Gravier
Institut national de recherche en sciences et technologies du numérique, Institut de Recherche en Informatique et Systèmes Aléatoires
- A. Garg
IBM (United States), IBM Research - Almaden
- Andrew Senior
IBM (United States), IBM Research - Thomas J. Watson Research Center
Topics & keywords
- Computer science
- Speech recognition
- Robustness (evolution)
- Vocabulary
- Modality (human–computer interaction)
- Artificial intelligence
- Peace, Justice and strong institutions