Recent advances in the automatic recognition of audiovisual speech

Potamianos, G.; Neti, C.; Gravier, Guillaume; Garg, A.; Senior, Andrew

doi:10.1109/jproc.2003.817150

Article · Proceedings of the IEEE · Sep 1, 2003 · Closed access

IBM (United States) · IBM Research - Thomas J. Watson Research Center · +3 more institutions

Indexed in Crossref

Abstract

Visual speech information from the speaker's mouth region has been shown to improve the noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability…
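As a rough illustration of what decision fusion with a modality reliability weight can look like, the sketch below combines per-class log-likelihoods from an audio stream and a visual stream using a single weight. This is a common textbook formulation and only an assumption here: the truncated abstract does not confirm the exact scheme used in the paper. The function names, the weight `audio_weight`, the class labels, and all scores are hypothetical.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-class log-likelihoods from two streams.

    audio_weight is a hypothetical reliability weight in [0, 1];
    the visual stream receives 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    if audio_ll.keys() != visual_ll.keys():
        raise ValueError("both streams must score the same classes")
    return {
        label: audio_weight * audio_ll[label]
        + (1.0 - audio_weight) * visual_ll[label]
        for label in audio_ll
    }


def best_class(scores):
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)


# Made-up scores: the audio stream slightly prefers "da",
# the visual stream clearly prefers "ba".
audio_ll = {"ba": -4.0, "da": -3.5}
visual_ll = {"ba": -1.0, "da": -3.0}

# Audio trusted (e.g. clean conditions): the audio preference dominates.
clean_choice = best_class(fuse_log_likelihoods(audio_ll, visual_ll, 0.9))

# Audio distrusted (e.g. heavy noise): the visual preference dominates.
noisy_choice = best_class(fuse_log_likelihoods(audio_ll, visual_ll, 0.2))
```

In this toy setup, lowering the audio weight shifts the decision toward the class favoured by the visual stream, which is the intuition behind weighting each modality by its reliability.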

Citation impact

Total citations: 754
FWCI: 26.99
Percentile: 100%
References: 174

Authors: 5

Topics & keywords

Keywords

Computer science
Speech recognition
Robustness (evolution)
Vocabulary
Modality (human–computer interaction)
Artificial intelligence

UN Sustainable Development Goals

Peace, Justice and strong institutions
