Article · Data in Brief · Jan 14, 2026 · Gold OA

The Cadenza lyric intelligibility prediction (CLIP) dataset

University of Sheffield · University of Salford · +3 more institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

This paper presents CLIP, a dataset of 11,072 popular Western music signals sourced from independent artists, accompanied by ground-truth lyrics and lyric intelligibility scores from listening tests. The dataset is designed to facilitate music information retrieval (MIR) research using machine learning. It was created to support the development of algorithms that predict lyric intelligibility for the Cadenza ICASSP 2026 Signal Processing Grand Challenge. It is currently the only publicly available large-scale dataset for this task. The music was sourced from the Free Music Archive (FMA) dataset and is unlikely to be familiar to listeners. We excluded tracks whose license did not allow derivative works and…

Citation impact

Total citations: 6
FWCI: 204.41
Percentile: 100%
References: 6

Authors

11

Topics & keywords

Keywords
  • Intelligibility (philosophy)
  • Ground truth
  • Active listening
  • Music information retrieval
  • Common ground
  • German
UN Sustainable Development Goals
  • Quality Education

Funding