paratext · Jan 1, 2022 · Closed access
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Indexed in Crossref
Abstract
Audio-visual (AV)-automatic speech recognition (ASR) can improve speech recognition accuracy by using lip images, especially in noisy environments. The recently proposed AV Align system integrates speech and image features based on a cross-modal attention mechanism, where attention weights for visual features are estimated by using acoustic features as queries. Although AV Align shows an improvement in recognition accuracy in background noise environments, we have observed that the recognition accuracy degrades significantly in interference speaker environments, where a target speech and an interfering speech overlap each other. In order to improve the speech recognition accuracy of the target speaker in such…
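The cross-modal attention described in the abstract, with acoustic features acting as queries over visual features, can be illustrated with a minimal sketch. This is not the AV Align implementation: the scaled dot-product scoring, the absence of learned projections, the concatenation-based fusion, and the names `softmax`, `cross_modal_attention`, `audio`, `video`, and `fused` are all assumptions made for illustration only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(audio, video):
    """Attend over visual frames using acoustic frames as queries.

    audio: list of T_a acoustic feature vectors (each of length d)
    video: list of T_v visual feature vectors (each of length d)
    Returns (contexts, weights): one visual context vector and one
    attention distribution over visual frames per acoustic frame.
    Assumption: scaled dot-product scoring, no learned projections.
    """
    d = len(audio[0])
    contexts, all_weights = [], []
    for q in audio:
        # Score every visual frame against the current acoustic query.
        scores = [sum(a * b for a, b in zip(q, v)) / math.sqrt(d) for v in video]
        w = softmax(scores)
        # Weighted sum of visual frames gives the visual context vector.
        ctx = [sum(w_j * v[k] for w_j, v in zip(w, video)) for k in range(d)]
        contexts.append(ctx)
        all_weights.append(w)
    return contexts, all_weights

# Toy inputs (hypothetical values): 3 acoustic frames, 2 visual frames, d = 2.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[2.0, 0.0], [0.0, 2.0]]

contexts, weights = cross_modal_attention(audio, video)

# Assumed fusion step: concatenate each acoustic frame with its visual context.
fused = [a + c for a, c in zip(audio, contexts)]
```

In this sketch each acoustic frame produces its own distribution over visual frames, so the visual stream is effectively resampled to the acoustic frame rate before fusion.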
Citation impact
925 total citations
- FWCI: not available
- Percentile: not available
- References: 22
Topics & keywords
Keywords
- Acoustics
- Signal processing
- SIGNAL (programming language)
- Computer science
- Speech recognition
- Telecommunications
- Physics
- Radar