NCBI disease corpus: A resource for disease name recognition and concept normalization
National Center for Biotechnology Information · National Institutes of Health · Arizona State University
Abstract
Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information. However, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural…
Citation impact
- FWCI: 13.09
- Percentile: 100%
- References: 50
Authors
- Rezarta Islamaj
National Center for Biotechnology Information, National Institutes of Health
- Robert Leaman
National Institutes of Health, National Center for Biotechnology Information, Arizona State University
- Zhiyong Lu (corresponding author)
National Center for Biotechnology Information, National Institutes of Health
Topics & keywords
- Computer science
- Identifier
- Annotation
- Natural language processing
- Information retrieval
- Unique identifier
- Resource (disambiguation)
- Named-entity recognition
- Quality Education