Article · Jan 1, 2024 · Gold OA

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Indexed in Crossref

Abstract

In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective training of M3-Embedding presents a series of technical contributions. Notably, we propose a novel self-knowledge distillation approach, where the…
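The abstract names three retrieval functionalities without showing how each one scores a query against a passage. The sketch below illustrates the general shape of each scoring mode under stated assumptions. Dense retrieval compares one vector per text. Sparse retrieval sums products of term weights over the terms a query and passage share. Multi-vector retrieval uses late interaction: each query token vector takes its best match among the passage token vectors, and these maxima are averaged. The function names and input formats (plain float lists, term-to-weight dicts) are hypothetical. This is not the authors' code or any library's API, and the sketch does not show how the model produces the vectors or term weights.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(q: Vector, p: Vector) -> float:
    """Dense retrieval (sketch): one vector per text, compared by cosine similarity."""
    return dot(normalize(q), normalize(p))


def sparse_score(q_weights: Dict[str, float], p_weights: Dict[str, float]) -> float:
    """Sparse retrieval (sketch): sum of weight products over terms present in both texts.

    The term weights are assumed to be given; how a model assigns them is out of scope here.
    """
    shared = q_weights.keys() & p_weights.keys()
    return sum(q_weights[t] * p_weights[t] for t in shared)


def multi_vector_score(q_vecs: List[Vector], p_vecs: List[Vector]) -> float:
    """Multi-vector retrieval (sketch, late interaction).

    For each query token vector, take its highest cosine similarity against all
    passage token vectors, then average those maxima over the query tokens.
    """
    if not q_vecs or not p_vecs:
        return 0.0
    p_norm = [normalize(pv) for pv in p_vecs]
    total = sum(max(dot(normalize(qv), pv) for pv in p_norm) for qv in q_vecs)
    return total / len(q_vecs)


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    print(dense_score([1.0, 0.0], [1.0, 1.0]))
    print(sparse_score({"embedding": 0.5, "model": 0.2}, {"embedding": 0.4, "text": 0.9}))
    print(multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0]]))
```

In the toy sparse example, only the term `embedding` is shared, so the score is 0.5 × 0.4 = 0.2. Because the three scores are computed independently, a system that produces all three representations can report them separately or combine them, for example as a weighted sum. Any particular weighting would be an additional assumption not stated in this abstract.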

Citation impact

  • Total citations: 390
  • FWCI: 122.22
  • Percentile: 100%
  • References: 0

Authors

  • Number of authors: 6

Topics & keywords

Keywords
  • Granularity
  • Embedding
  • Computer science
  • Distillation
  • Artificial intelligence
  • Chemistry
  • Chromatography
  • Programming language
UN Sustainable Development Goals
  • Quality Education
