MTEB: Massive Text Embedding Benchmark

Muennighoff, Niklas; Tazi, Nouamane; Magne, Loïc; Reimers, Nils

doi:10.18653/v1/2023.eacl-main.148

Article, Jan 1, 2023, Gold OA

Indexed in Crossref

Abstract

Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find…

Citation impact

Total citations: 354

FWCI: 58.65
Percentile: 100%
References: 71

Authors: 4

Topics & keywords

Keywords

Embedding
Benchmark (surveying)
Benchmarking
Computer science
Similarity (geometry)
Set (abstract data type)
Task (project management)
Field (mathematics)
