A Survey on Data Augmentation for Text Classification

Bayer, Markus; Kaufhold, Marc-André; Reuter, Christian

doi:10.1145/3544558

Review, ACM Computing Surveys, Jun 17, 2022, Green OA

Technische Universität Darmstadt

Indexed in: arXiv, Crossref, DataCite

Abstract

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data to regularizing the objective, to limiting the amount of data used to protect privacy. Based on a precise description of the goals and applications of data augmentation and a taxonomy for existing works, this survey is concerned with data augmentation methods for textual classification and aims at providing a concise and comprehensive overview for researchers and…

Citation impact

Total citations: 396

FWCI: 51.33
Percentile: 100%
References: 203

Authors: 3

Topics & keywords

Computer science
Taxonomy (biology)
Generalization
Limiting
Field (mathematics)
Data science
Training set
Artificial intelligence

UN Sustainable Development Goals

Quality Education
