The Penn Chinese TreeBank: Phrase structure annotation of a large corpus

Xue, Nianwen; Xia, Fei; Chiou, Fu-Dong; Palmer, Martha

doi:10.1017/s135132490400364x

Article, Natural Language Engineering, May 19, 2005, Closed access

University of Pennsylvania

Indexed in Crossref

Abstract

With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully segmented, POS-tagged and syntactically bracketed, have been released to the public via LDC (…

Citation impact

Total citations: 694

FWCI: 24.34
Percentile: 100%
References: 35

Authors: 4

Topics & keywords

Keywords

Treebank
Computer science
Annotation
Bracketing (phenomenology)
Natural language processing
Artificial intelligence
Parsing
Text segmentation
