GENIA corpus—a semantically annotated corpus for bio-textmining

Kim, JD; Ohta, Tomoko; Tateisi, Yuka; Tsujii, J

doi:10.1093/bioinformatics/btg1023

Article · Bioinformatics · Jul 3, 2003 · Closed access

Japan Science and Technology Agency · Science and Technology Corporation (United States)

Indexed in: Crossref, DOAJ, PubMed

Abstract

Motivation: Natural language processing (NLP) methods are regarded as being useful to raise the potential of text mining from biological literature. The lack of an extensively annotated corpus of this literature, however, causes a major bottleneck for applying NLP techniques. GENIA corpus is being developed to provide reference materials to let NLP techniques work for bio-textmining.

Results: GENIA corpus version 3.0 consisting of 2000 MEDLINE abstracts has been released with more than 400 000 words and almost 100 000 annotations for biological terms.

Availability: GENIA corpus is freely available at http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA

Keywords: Text Mining, Information Extraction, Corpus,…

Citation impact

Total citations: 1,241

FWCI: 14.20
Percentile: 100%
References: 2

Authors: 4

Topics & keywords

Computer science
Natural language processing
Bottleneck
Artificial intelligence
Annotation
Biomedical text mining
Information retrieval
Text mining

UN Sustainable Development Goals

Quality Education