Preprint | Jan 1, 2020 | Gold OA

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

Stanford University

Indexed in Crossref

Abstract

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multiword token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to…
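The abstract describes a pipeline of stages, and later stages consume the output of earlier ones. A request for one annotation therefore implies the stages it depends on. The sketch below shows that idea by resolving a requested set of stages into an ordered run list. The stage names, the `REQUIRES` prerequisite table, and the `resolve` function are all hypothetical. They are assumptions made for illustration and are not Stanza's actual API or configuration.

```python
# Minimal sketch of prerequisite resolution for a staged NLP pipeline.
# ASSUMPTION: this prerequisite table is hypothetical and only illustrates
# the idea; it is not taken from Stanza's code or documentation.
REQUIRES = {
    "tokenize": [],
    "mwt": ["tokenize"],                        # multiword token expansion
    "pos": ["tokenize", "mwt"],                 # POS and morphological features
    "lemma": ["tokenize", "mwt", "pos"],
    "depparse": ["tokenize", "mwt", "pos", "lemma"],
    "ner": ["tokenize", "mwt"],
}


def resolve(requested):
    """Return the requested stages plus their prerequisites, in run order.

    Uses a depth-first topological sort; raises ValueError for an unknown
    stage or a cyclic prerequisite table.
    """
    order = []        # stages in the order they should run
    visiting = set()  # stages on the current DFS path (cycle detection)

    def visit(stage):
        if stage in order:
            return
        if stage not in REQUIRES:
            raise ValueError(f"unknown stage: {stage}")
        if stage in visiting:
            raise ValueError(f"cyclic prerequisites at: {stage}")
        visiting.add(stage)
        for dep in REQUIRES[stage]:
            visit(dep)  # schedule every prerequisite first
        visiting.discard(stage)
        order.append(stage)

    for stage in requested:
        visit(stage)
    return order


if __name__ == "__main__":
    print(resolve(["ner"]))       # ['tokenize', 'mwt', 'ner']
    print(resolve(["depparse"]))  # ['tokenize', 'mwt', 'pos', 'lemma', 'depparse']
```

A depth-first topological sort keeps each stage after all of its prerequisites. It also runs a shared prerequisite such as `tokenize` only once when several annotations are requested together.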

Citation impact

Total citations: 1,407
FWCI: 132.88
Percentile: 100%
References: 22

Authors: 5

Topics & keywords

Keywords
  • Computer science
  • Python (programming language)
  • Lemmatisation
  • Stanza
  • Natural language processing
  • Artificial intelligence
  • Lexical analysis
  • Programming language
UN Sustainable Development Goals
  • Quality Education