Preprint · arXiv (Cornell University) · Jan 26, 2023 · Green OA

MuChoMusic dataset

Agostinelli, Andrea · Denk, Timo I. · Borsos, Zalán · Engel, Jesse · Verzetti, Mauro

Universitat Pompeu Fabra · Queen Mary University of London · Universal Music Group (United States)

Indexed in: arXiv, DataCite

Abstract

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.

Note on Audio Files…

Citation impact

Total citations: 182
References: 0

Authors (13)

  • Agostinelli, Andrea (corresponding author)
    Universitat Pompeu Fabra

  • Denk, Timo I.
    Queen Mary University of London, Universal Music Group (United States)

  • Borsos, Zalán
    Queen Mary University of London

  • Engel, Jesse
    Universal Music Group (United States)

  • Verzetti, Mauro
    Queen Mary University of London

Topics & keywords

Keywords
  • Melody
  • Guitar
  • Violin
  • Computer science
  • Sequence (biology)
  • Speech recognition
  • Natural language processing
  • Task (project management)