Mamba in Speech: Towards an Alternative to Self-Attention

Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian

UNSW Sydney · Nanyang Technological University · +2 more institutions

Indexed in Crossref

Abstract

Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the computational complexity of the multi-head self-attention mechanism in the Transformer, whose cost grows quadratically with sequence length, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba has shown its effectiveness in natural language processing and computer vision tasks, but its advantages have rarely been investigated in speech signal processing. This paper explores solutions for applying Mamba to speech processing by discussing two typical speech processing tasks: speech recognition, which requires semantic and sequential information, and speech enhancement, which…
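The efficiency argument rests on Mamba replacing attention's all-pairs comparison with a recurrence over a fixed-size hidden state whose update depends on the current input. The following is a minimal, single-channel sketch of such a selective state space scan. It assumes a diagonal state matrix A, zero-order-hold discretization of A, and a simplified Δ·B input term; the function names and projection weights (`w_delta`, `b_delta`, `w_b`, `w_c`) are hypothetical. It is not the paper's code or the official Mamba implementation.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + e^z), keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: List[float], a: List[float], w_delta: float, b_delta: float,
                   w_b: List[float], w_c: List[float]) -> List[float]:
    """Run a single-channel selective SSM over sequence x in O(L * N) time.

    a: diagonal of the state matrix A (negative entries give a decaying state).
    w_delta, b_delta, w_b, w_c: hypothetical weights that make the step size
    delta and the matrices B and C depend on the current input ("selective").
    """
    n = len(a)
    h = [0.0] * n  # hidden state of fixed size N, carried across time steps
    y = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        b_t = [w * x_t for w in w_b]               # input-dependent B
        c_t = [w * x_t for w in w_c]               # input-dependent C
        for i in range(n):
            a_bar = math.exp(delta * a[i])         # zero-order-hold discretization of A
            h[i] = a_bar * h[i] + delta * b_t[i] * x_t  # simplified Delta*B input term
        y.append(sum(c * s for c, s in zip(c_t, h)))    # readout y_t = C_t h_t
    return y
```

Each step reads only the previous N-dimensional state, so a length-L sequence costs O(L·N) and the output at time t never depends on later inputs, whereas self-attention computes O(L²) pairwise scores. Practical Mamba layers apply this recurrence across many channels with learned projections and a hardware-aware parallel scan rather than a Python loop.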

Citation impact

  • Total citations: 51
  • FWCI: 357.89
  • Percentile: 100%
  • References: 88

Authors: 9

Topics & keywords

Keywords
  • Computer science
  • Speech recognition
  • Speech processing
  • Natural language processing