Article · Jul 10, 2024 · Closed access

C-Pack: Packed Resources For General Chinese Embeddings

Beijing Academy of Social Sciences · Renmin University of China · +1 more institution

Indexed in Crossref

Abstract

We introduce C-Pack, a package of resources that significantly advances the field of general text embeddings for Chinese. C-Pack includes three critical resources. 1) C-MTP is a massive training dataset for text embedding, built by curating vast unlabeled corpora and integrating high-quality labeled corpora. 2) C-MTEB is a comprehensive benchmark for Chinese text embeddings covering 6 tasks and 35 datasets. 3) BGE is a family of embedding models covering multiple sizes. At the time of release, our models outperformed all prior Chinese text embeddings on C-MTEB by more than +10%. We also integrate and optimize the entire suite of training methods for BGE. Along with our resources on…
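Text embedding models such as BGE map a piece of text to a fixed-length vector, and tasks such as retrieval typically score candidate passages by the similarity between the query vector and each passage vector. The following is a generic illustration of that comparison step only, assuming cosine similarity as the scoring function. The 4-dimensional vectors and the helper names `cosine_similarity` and `rank_passages` are hypothetical; they are not BGE outputs and do not reproduce the authors' training or C-MTEB evaluation code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: list[float], passage_vecs: list[list[float]]) -> list[int]:
    """Return passage indices sorted by descending cosine similarity to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy embeddings; real models emit vectors with hundreds of dimensions.
query = [0.9, 0.1, 0.0, 0.2]
passages = [
    [0.0, 1.0, 0.0, 0.0],  # points in a different direction from the query
    [0.8, 0.2, 0.1, 0.3],  # nearly parallel to the query
    [0.5, 0.5, 0.5, 0.5],  # partially aligned
]

order = rank_passages(query, passages)
print(order)  # [1, 2, 0]
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for ranking with normalized embeddings; a dot product gives the same ranking when all vectors are unit-normalized.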

Citation impact

  • Total citations: 253
  • FWCI: 78.97
  • Percentile: 100%
  • References: 13

Authors: 6

Topics & keywords

Keywords
  • Packed bed
  • Computer science
  • Chemistry
  • Chromatography