Article
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Apr 27, 2022 | Closed access
AudioCLIP: Extending CLIP to Image, Text and Audio
Abstract
The rapidly evolving field of sound classification has greatly benefited from methods developed in other domains. Today, the trend is to fuse domain-specific tasks and approaches, which has given the community strong new models. We present AudioCLIP, an extension of the CLIP model that handles audio in addition to text and images. Using the AudioSet dataset, our proposed model incorporates the ESResNeXt audio model into the CLIP framework, enabling it to perform multimodal classification while keeping CLIP's zero-shot capabilities. AudioCLIP achieves new state-of-the-art results in the Environmental Sound Classification (ESC) task and outperforms other approaches by reaching accuracies of 97.15% on…
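The zero-shot classification mentioned in the abstract follows the general CLIP pattern of comparing embeddings that live in a shared space. The snippet below is a minimal sketch of that pattern, assuming hypothetical audio and text encoders (not shown) have already turned an audio clip and each class prompt into fixed-length vectors. Each label is scored by the cosine similarity between its text embedding and the audio embedding, and a softmax over the scaled scores yields class probabilities. The function names, the `temperature` value, and the toy 3-dimensional vectors are illustrative assumptions, not AudioCLIP's released code or its actual embedding size.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(audio_embedding, label_embeddings, temperature=100.0):
    """Rank labels for one audio clip by cosine similarity in a shared embedding space.

    audio_embedding: vector from a hypothetical audio encoder.
    label_embeddings: dict of label -> vector from a hypothetical text encoder,
        e.g. the encoding of a prompt such as "the sound of a dog barking".
    temperature: CLIP-style logit scale applied before the softmax (value assumed).
    """
    audio = l2_normalize(audio_embedding)
    labels = list(label_embeddings)
    logits = [
        temperature * sum(a * t for a, t in zip(audio, l2_normalize(label_embeddings[name])))
        for name in labels
    ]
    probs = softmax(logits)
    # Highest-probability label first.
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Toy usage: made-up 3-d "text" embeddings for three sound classes.
labels = {
    "dog": [0.9, 0.1, 0.0],
    "rain": [0.0, 0.2, 0.9],
    "siren": [0.1, 0.9, 0.1],
}
clip_embedding = [0.8, 0.2, 0.1]  # made-up "audio" embedding
ranking = zero_shot_classify(clip_embedding, labels)
print(ranking[0][0])  # prints "dog"
```

Because no classifier weights are trained for the label set, new classes can be added simply by encoding new prompts; this is what makes the approach zero-shot in the CLIP sense.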
Citation impact
- Total citations: 279
- FWCI (Field-Weighted Citation Impact): 31.37
- Percentile: 100%
- References: 51
Authors: 4
Topics & keywords
Keywords
- Computer science
- Task
- Field
- Shot
- Code
- Fuse
- Image
- Artificial intelligence
UN Sustainable Development Goals
- Sustainable cities and communities