Preprint · Jan 1, 2020 · Gold OA
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Indexed in Crossref
Abstract
We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multiword token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to…
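The abstract describes a pipeline of stages that build on one another. In the library, a pipeline is created with `stanza.Pipeline`, which takes a language code and a comma-separated `processors` string, e.g. `stanza.Pipeline('fr', processors='tokenize,mwt,pos,lemma,depparse')`. The sketch below does not call Stanza. It only illustrates, under assumptions, how such a processor string could be validated and ordered with a topological sort. The `REQUIRES` table and the `resolve_pipeline` helper are hypothetical and simplified (for example, they ignore that languages without multiword tokens skip `mwt`); the real library's requirements may differ by language and version.

```python
from graphlib import TopologicalSorter

# Hypothetical requirement table, loosely modeled on the stages named in the
# abstract. Maps each processor to the processors that must run before it.
REQUIRES: dict[str, set[str]] = {
    "tokenize": set(),
    "mwt": {"tokenize"},
    "pos": {"tokenize", "mwt"},
    "lemma": {"tokenize", "mwt", "pos"},
    "depparse": {"tokenize", "mwt", "pos", "lemma"},
    "ner": {"tokenize", "mwt"},
}


def resolve_pipeline(processors: str) -> list[str]:
    """Return an execution order for a comma-separated processor list.

    Raises ValueError if a processor is unknown or if one of its
    prerequisites was not also requested.
    """
    requested = [p.strip() for p in processors.split(",") if p.strip()]
    for name in requested:
        if name not in REQUIRES:
            raise ValueError(f"unknown processor: {name}")
        missing = REQUIRES[name] - set(requested)
        if missing:
            raise ValueError(f"{name} requires {sorted(missing)}")
    # Graph restricted to the requested processors: node -> predecessors.
    graph = {name: REQUIRES[name] for name in requested}
    return list(TopologicalSorter(graph).static_order())
```

For example, `resolve_pipeline("depparse, lemma, pos, mwt, tokenize")` returns `['tokenize', 'mwt', 'pos', 'lemma', 'depparse']` regardless of the order in which the stages were listed, while `resolve_pipeline("pos")` raises a `ValueError` because its prerequisites are missing. Rejecting an incomplete request, rather than silently adding stages, is a design choice of this sketch only.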
Citation impact
- Total citations: 1,407
- FWCI: 132.88
- Percentile: 100%
- References: 22
Authors: 5
Topics & keywords
Keywords
- Computer science
- Python (programming language)
- Lemmatisation
- Stanza
- Natural language processing
- Artificial intelligence
- Lexical analysis
- Programming language
UN Sustainable Development Goals
- Quality Education