Deduplicating Training Data Makes Language Models Better

Brain (Germany) · Google (United States) · +1 more institution

Indexed in Crossref

Abstract

Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

Citation impact

  • Total citations: 252
  • FWCI: 23.47
  • Percentile: 100%
  • References: 60

Authors

7

Topics & keywords

Keywords
  • Zhàng
  • Computer science
  • Volume (thermodynamics)
  • Computational linguistics
  • Natural language processing
  • Artificial intelligence
  • Linguistics
  • Library science
UN Sustainable Development Goals
  • Quality Education