Preprint | arXiv (Cornell University) | Aug 6, 2023 | Green OA

Spanish Pre-trained BERT Model and Evaluation Data

Indexed in: arXiv, DataCite

Abstract

The Spanish language is one of the top 5 spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language in a single repository much in the spirit of the GLUE benchmark. By fine-tuning our pre-trained Spanish model, we obtain better results compared to other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state-of-the-art on some of them. We have publicly released our model, the…

Citation impact

Total citations: 336
References: 0

Authors

6 authors

Topics & keywords

Keywords
  • Computer science
  • Language model
  • Benchmark (surveying)
  • Task (project management)
  • Bridge (graph theory)
  • Natural language processing
  • Artificial intelligence
  • Training set
UN Sustainable Development Goals
  • Quality Education