Article · May 5, 2023 · Green OA

Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Mila - Quebec Artificial Intelligence Institute · University of California, San Diego

Indexed in Crossref

Abstract

Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different data sources. Second, we construct a contrastive language-audio pretraining model by considering different audio encoders and text encoders. We incorporate the feature fusion mechanism and keyword-to-caption augmentation into the model design to further enable the model to process audio inputs of variable lengths and…
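The contrastive objective behind this kind of pretraining can be sketched as follows. This is a minimal illustration of one common formulation (symmetric cross-entropy over pairwise cosine similarities), not the authors' implementation; the abstract does not specify the exact loss. The function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    audio_embs[i] and text_embs[i] are assumed to describe the same clip.
    The temperature is an assumed hyperparameter, not a value from the paper.
    """
    audio = [l2_normalize(a) for a in audio_embs]
    text = [l2_normalize(t) for t in text_embs]
    # logits[i][j] = scaled cosine similarity between audio i and text j
    logits = [
        [sum(x * y for x, y in zip(a, t)) / temperature for t in text]
        for a in audio
    ]
    # Transpose to score each text against every audio clip
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: hypothetical 2-D embeddings for two audio-text pairs.
audio_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = contrastive_loss(audio_batch, matched_text)
loss_swapped = contrastive_loss(audio_batch, swapped_text)
```

In this toy batch the correctly paired embeddings yield a much lower loss than the swapped ones, which is the signal that would pull matching audio and text representations together during training.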

Citation impact

Total citations: 376
FWCI: 71.18
Percentile: 100%
References: 39

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Audio mining
  • Natural language processing
  • Pipeline (software)
  • Speech recognition
  • Artificial intelligence
  • Encoder
  • Construct (python library)
UN Sustainable Development Goals
  • Quality Education