MuChoMusic dataset
Universitat Pompeu Fabra · Queen Mary University of London · +1 more institution
Abstract
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
MuChoMusic is a benchmark for evaluating music understanding in audio-focused multimodal language models. It comprises 1,187 multiple-choice questions, validated by human annotators, based on 644 music tracks from two publicly available music datasets. The questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.
Note on Audio Files…
Citation impact
- FWCI: —
- Percentile: —
- References: 0
Authors (13)
- Agostinelli, Andrea (Corresponding)
Universitat Pompeu Fabra
- Denk, Timo I.
Queen Mary University of London, Universal Music Group (United States)
- Borsos, Zalán
Queen Mary University of London
- Engel, Jesse
Universal Music Group (United States)
- Verzetti, Mauro
Queen Mary University of London
Topics & keywords
- Melody
- Guitar
- Violin
- Computer science
- Sequence (biology)
- Speech recognition
- Natural language processing
- Task (project management)