BERTopic: Neural topic modeling with a class-based TF-IDF procedure
Indexed in arXiv, DataCite
Abstract
Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more…
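The last step of the pipeline described in the abstract, scoring terms per cluster with a class-based TF-IDF, can be illustrated with a minimal standard-library sketch. It assumes the weighting W(t, c) = tf(t, c) · log(1 + A / tf(t)), where A is the average number of words per cluster and tf(t) is the frequency of term t across all clusters; this formula is recalled from the full paper rather than stated in the abstract, so treat it as an assumption. The toy documents, the pre-assigned cluster labels, and the helper names `class_tfidf` and `top_terms` are hypothetical, and the embedding and clustering steps are skipped entirely.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Score terms per cluster with a class-based TF-IDF weighting (sketch)."""
    # Treat all documents of one cluster as a single bag of words.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc.lower().split())

    # Frequency of each term across all clusters.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # Average number of words per cluster (A in the assumed formula).
    avg_words = sum(total_counts.values()) / len(class_counts)

    # W(t, c) = tf(t, c) * log(1 + A / tf(t))
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(scores, n=3):
    """Return the n highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Hypothetical toy corpus; labels stand in for the output of a clustering step.
docs = [
    "the cat chased the mouse",
    "the cat slept",
    "the stock market fell",
    "the market rallied",
]
labels = [0, 0, 1, 1]

scores = class_tfidf(docs, labels)
topics = top_terms(scores, n=1)
print(topics)
```

On this toy corpus the frequent but uninformative word "the" is outranked by "cat" in cluster 0 and by "market" in cluster 1, because the logarithmic factor shrinks as a term becomes common across all clusters.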
Citation impact
1,308 total citations
- FWCI: not available
- Percentile: not available
- References: 0
Authors: 1
Topics & keywords
Topics
Keywords
- Computer science
- Cluster analysis
- Topic model
- Transformer
- Class (philosophy)
- Artificial intelligence
- Natural language processing
- Document clustering