COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter
École Polytechnique Fédérale de Lausanne · Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana
Abstract
This study presents COVID-Twitter-BERT (CT-BERT), a transformer-based model pre-trained on a large corpus of COVID-19-related Twitter messages. CT-BERT is designed for COVID-19 content, particularly from social media, and can be used for natural language processing tasks such as classification, question answering, and chatbots. The paper evaluates CT-BERT on several classification datasets and compares its performance with that of its base model, BERT-LARGE.
The authors evaluated CT-BERT on five classification datasets, including one in the target domain. They compared its performance with BERT-LARGE to measure the marginal improvement over the base model. They also describe the training process and the technical specifications of the model in detail.
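The comparison with BERT-LARGE can be made concrete with a small calculation. This section does not say how marginal improvement is defined, so the sketch below assumes one common definition: the share of the base model's remaining error (1 - score) that the new model removes. The function names `marginal_improvement` and `compare_models` are hypothetical. Scores are assumed to be metrics in [0, 1], such as F1.

```python
from typing import Dict


def marginal_improvement(base_score: float, new_score: float) -> float:
    """Share of the base model's remaining error (1 - base_score) closed by the new model.

    Assumption: this definition is illustrative and is not taken from the paper.
    Both scores are metrics in [0, 1] (e.g. F1), and base_score must be below 1.
    """
    if not (0.0 <= base_score < 1.0 and 0.0 <= new_score <= 1.0):
        raise ValueError("scores must lie in [0, 1] and base_score must be < 1")
    return (new_score - base_score) / (1.0 - base_score)


def compare_models(base: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Compute the marginal improvement for each dataset that has a score in both tables."""
    return {
        name: marginal_improvement(base[name], new[name])
        for name in base
        if name in new
    }
```

For example, with hypothetical F1 scores of 0.80 for the base model and 0.85 for the new model, `marginal_improvement(0.80, 0.85)` returns roughly 0.25. In other words, the new model closes a quarter of the remaining gap to a perfect score. A negative value would mean the new model performs worse than the base model.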
Citation impact
- FWCI: 80.90
- Percentile: 100%
- References: 16
Authors
- 3
Topics & keywords
- Coronavirus disease 2019 (COVID-19)
- Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
- 2019-20 coronavirus outbreak
- Computer science
- Pandemic
- Natural language processing
- Virology
- Quality Education