paratext · Jan 1, 2022 · Closed access

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Indexed in Crossref

Abstract

Audio-visual (AV)-automatic speech recognition (ASR) can improve speech recognition accuracy by using lip images, especially in noisy environments. The recently proposed AV Align system integrates speech and image features based on a cross-modal attention mechanism, where attention weights for visual features are estimated by using acoustic features as queries. Although AV Align shows an improvement in recognition accuracy in background noise environments, we have observed that the recognition accuracy degrades significantly in interference speaker environments, where a target speech and an interfering speech overlap each other. In order to improve the speech recognition accuracy of the target speaker in such…
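The cross-modal attention described in the abstract, with acoustic features acting as queries over visual features, can be illustrated with a minimal sketch. This is not the AV Align implementation: it assumes generic scaled dot-product attention as a stand-in scoring function (the abstract does not say which one is used), and the function names, feature dimensions, and projection matrices `w_q`, `w_k`, `w_v` are hypothetical and filled with random values.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(audio, video, w_q, w_k, w_v):
    """Attend over video frames using audio frames as queries.

    audio: T_a x d_a acoustic features, video: T_v x d_v visual features.
    Scaled dot-product scoring is an assumption made for illustration.
    """
    q = matmul(audio, w_q)  # queries from the acoustic stream, T_a x d
    k = matmul(video, w_k)  # keys from the visual stream, T_v x d
    v = matmul(video, w_v)  # values from the visual stream, T_v x d
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # T_a x T_v similarity scores
    # One distribution over video frames per audio frame.
    weights = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(weights, v)  # visual context per audio frame, T_a x d
    return context, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for real features or learned projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
audio = random_matrix(12, 8, rng)  # 12 audio frames, 8-dim (made-up sizes)
video = random_matrix(4, 6, rng)   # 4 video frames, 6-dim (made-up sizes)
w_q = random_matrix(8, 5, rng)
w_k = random_matrix(6, 5, rng)
w_v = random_matrix(6, 5, rng)

context, weights = cross_modal_attention(audio, video, w_q, w_k, w_v)
# Fuse by concatenating each audio frame with its attended visual context.
fused = [a + c for a, c in zip(audio, context)]
```

Each row of `weights` is a distribution over video frames for one audio frame. Because the queries come from the audio, overlapping speech in the audio stream feeds directly into these weights, which is consistent with the degradation in interfering-speaker conditions that the abstract reports.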

Citation impact

925 total citations
22 references

Topics & keywords

Keywords
  • Acoustics
  • Signal processing
  • SIGNAL (programming language)
  • Computer science
  • Speech recognition
  • Telecommunications
  • Physics
  • Radar