ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

doi:10.1109/icassp43922.2022

paratext · Jan 1, 2022 · Closed access

Indexed in Crossref

Abstract

Audio-visual automatic speech recognition (AV-ASR) can improve speech recognition accuracy by using lip images, especially in noisy environments. The recently proposed AV Align system integrates speech and image features through a cross-modal attention mechanism, in which attention weights for the visual features are estimated using the acoustic features as queries. Although AV Align improves recognition accuracy in background-noise environments, we have observed that its accuracy degrades significantly in interfering-speaker environments, where the target speech and an interfering speech overlap. In order to improve the speech recognition accuracy of the target speaker in such…
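The cross-modal attention described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the scaled dot-product scoring, the feature dimensions, the random inputs, the function name `cross_modal_attention`, and the fusion by concatenation are not taken from the paper, which this record does not reproduce. The sketch only shows the general idea of acoustic frames acting as queries over visual frames.

```python
import numpy as np


def cross_modal_attention(audio, video):
    """Attend over visual frames using acoustic frames as queries.

    audio: (T_a, d) acoustic features (queries)
    video: (T_v, d) visual features (keys and values)
    Returns the visual context per audio frame, shape (T_a, d),
    and the attention weights, shape (T_a, T_v).
    Scaled dot-product scoring is an assumption, not the paper's stated choice.
    """
    d = audio.shape[-1]
    scores = audio @ video.T / np.sqrt(d)                 # (T_a, T_v) similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over visual frames
    context = weights @ video                             # weighted sum of visual features
    return context, weights


# Hypothetical toy inputs: 6 audio frames, 4 video frames, 8-dimensional features.
rng = np.random.default_rng(0)
audio = rng.standard_normal((6, 8))
video = rng.standard_normal((4, 8))

context, weights = cross_modal_attention(audio, video)

# One possible fusion (an assumption): concatenate each audio frame with its visual context.
fused = np.concatenate([audio, context], axis=-1)
```

Each row of `weights` sums to one, so every acoustic frame receives a convex combination of the visual frames. Because the queries come from the audio stream, overlapping speech in the audio would directly affect where the attention looks, which is consistent with the degradation the abstract reports for interfering speakers.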

Citation impact

Total citations: 925

FWCI: —
Percentile: —
References: 22

Topics & keywords

Topics

Speech Recognition and Synthesis: 21%

Keywords

Acoustics
Signal processing
SIGNAL (programming language)
Computer science
Speech recognition
Telecommunications
Physics
Radar
