Preprint | arXiv (Cornell University) | Mar 11, 2022 | Green OA

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

Indexed in: arXiv, DataCite

Abstract

Topic models can be useful tools for discovering latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more…
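The last step of the pipeline described in the abstract, scoring terms per cluster with a class-based TF-IDF, can be illustrated with a minimal sketch. Several things here are assumptions rather than content from the abstract: the weighting `tf * log(1 + A / f_t)` (with `A` the average number of words per class and `f_t` the frequency of term `t` across all classes) is the form commonly given for c-TF-IDF, the whitespace tokenizer is a simplification, and the function names `class_tfidf` and `top_terms` are hypothetical. The cluster labels are supplied by hand and stand in for the embedding and clustering steps.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Score terms per cluster with a class-based TF-IDF variant.

    Assumed weighting (not stated in the abstract):
        score(t, c) = tf(t, c) * log(1 + A / f(t))
    where A is the average word count per class and f(t) is the
    frequency of term t summed over all classes.
    """
    # Merge all documents of a cluster into one bag of words per class.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc.lower().split())

    # Frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # Average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(scores, n=2):
    """Return the n highest-scoring terms per class (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Toy corpus; the labels are hypothetical cluster assignments that a real
# pipeline would obtain by clustering document embeddings.
docs = [
    "cat chased mouse news",
    "cat dog play news",
    "stocks market fell news",
    "market stocks rallied news",
]
labels = [0, 0, 1, 1]

scores = class_tfidf(docs, labels)
topics = top_terms(scores, n=2)
```

In this toy corpus, "news" appears as often as "cat" within cluster 0 but also appears in cluster 1, so its score is lower: terms shared across classes are down-weighted, while terms concentrated in one class rise to the top of that class's representation.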

Citation impact

1,308 total citations
References: 0

Authors

1

Topics & keywords

Keywords
  • Computer science
  • Cluster analysis
  • Topic model
  • Transformer
  • Class (philosophy)
  • Artificial intelligence
  • Natural language processing
  • Document clustering