MuChoMusic dataset

Andrea Agostinelli; Timo I. Denk; Zalán Borsos; Jesse Engel; Mauro Verzetti; Antoine Caillon; Qingqing Huang; Aren Jansen; Adam Roberts; Marco Tagliasacchi; Matt Sharifi; Neil Zeghidour; Christian Frank

doi:10.48550/arxiv.2301.11325

Preprint · arXiv (Cornell University) · Jan 26, 2023 · Green OA

Universitat Pompeu Fabra · Queen Mary University of London · +1 more institution

Indexed in: arXiv, DataCite

Abstract

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.

Note on Audio Files…

Citation impact

182 total citations

FWCI: —
Percentile: —
References: 0

Authors (13)

Agostinelli, Andrea (corresponding author)
Universitat Pompeu Fabra

Denk, Timo I.
Queen Mary University of London, Universal Music Group (United States)

Borsos, Zalán
Queen Mary University of London

Engel, Jesse
Universal Music Group (United States)

Verzetti, Mauro
Queen Mary University of London

Topics & keywords

Keywords

Melody
Guitar
Violin
Computer science
Sequence (biology)
Speech recognition
Natural language processing
Task (project management)
