The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
Indexed in Crossref
Abstract
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a large bracketed corpus since late 1998. The first two installments of the corpus, 250 thousand words of data, fully segmented, POS-tagged and syntactically bracketed, have been released to the public via LDC (…
Citation impact
- Total citations: 694
- FWCI: 24.34
- Percentile: 100%
- References: 35
Authors: 4
Topics & keywords
Keywords
- Treebank
- Computer science
- Annotation
- Bracketing (phenomenology)
- Natural language processing
- Artificial intelligence
- Parsing
- Text segmentation