Article | Natural Language Engineering | May 19, 2005 | Closed access

The Penn Chinese TreeBank: Phrase structure annotation of a large corpus

University of Pennsylvania

Indexed in Crossref

Abstract

With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully segmented, POS-tagged and syntactically bracketed, have been released to the public via LDC (…

Citation impact

Total citations: 694
FWCI: 24.34
Percentile: 100%
References: 35

Authors

4

Topics & keywords

Keywords
  • Treebank
  • Computer science
  • Annotation
  • Bracketing (phenomenology)
  • Natural language processing
  • Artificial intelligence
  • Parsing
  • Text segmentation