Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature
California Institute of Technology · Howard Hughes Medical Institute
Abstract
We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After…
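The two elements described in the abstract, a sentence-level corpus and category-based search, can be illustrated with a minimal sketch. This is not Textpresso's implementation: the tiny `ONTOLOGY` lexicon, the naive sentence splitter, the sample text, and the function names `split_sentences`, `categories_in`, and `search` are all assumptions made for illustration.

```python
import re

# Hypothetical mini-ontology: category name -> set of terms.
# Illustrative only; not the actual Textpresso categories or lexicon.
ONTOLOGY = {
    "gene": {"lin-12", "glp-1"},
    "regulation": {"regulates", "represses", "activates"},
    "phenotype": {"sterile", "lethal"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def categories_in(sentence):
    """Return the categories that have at least one term in the sentence."""
    tokens = set(re.findall(r"[\w-]+", sentence.lower()))
    return {cat for cat, terms in ONTOLOGY.items() if tokens & terms}


def search(sentences, required_categories, keyword=None):
    """Return sentences containing every required category (and the keyword, if given)."""
    hits = []
    for s in sentences:
        if keyword and keyword.lower() not in s.lower():
            continue
        if set(required_categories) <= categories_in(s):
            hits.append(s)
    return hits


# Made-up sample text standing in for the full text of an article.
text = (
    "lin-12 regulates vulval cell fate. "
    "Mutant animals are sterile. "
    "glp-1 is expressed in the germ line."
)
sentences = split_sentences(text)
result = search(sentences, {"gene", "regulation"})
```

Requiring both `gene` and `regulation` in the same sentence is what separates this kind of query from a plain keyword search: a sentence that merely mentions a gene does not match unless a regulation term co-occurs with it.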
Citation impact
- FWCI: 12.78
- Percentile: 100%
- References: 33
Authors: 3
Topics & keywords
- Ontology
- Information retrieval
- Computer science
- Sentence
- Natural language processing
- Quality Education