Article · PLoS Biology · Sep 17, 2004 · Gold OA

Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature

California Institute of Technology · Howard Hughes Medical Institute

Indexed in: Crossref · DOAJ · PubMed

Abstract

We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After…
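
The two elements described in the abstract, a corpus split into sentences and a set of term categories that can be searched, can be illustrated with a minimal sketch. This is not Textpresso's implementation. The lexicon, the gene-like names (abc-1, xyz-2), the sample text, and every function name below are assumptions invented for illustration, and the sentence splitter is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology: category name -> set of terms.
# Invented for illustration; not Textpresso's actual lexicon.
ONTOLOGY = {
    "gene": {"abc-1", "xyz-2"},
    "regulation": {"regulates", "represses", "activates"},
    "phenotype": {"uncoordinated", "sterile"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def tag_sentence(sentence):
    """Return the categories whose terms occur as tokens in the sentence."""
    tokens = set(re.findall(r"[\w-]+", sentence.lower()))
    return {cat for cat, terms in ONTOLOGY.items() if tokens & terms}


def build_index(articles):
    """Map each category to the (article_id, sentence) pairs that mention it."""
    index = defaultdict(list)
    for article_id, text in articles.items():
        for sentence in split_sentences(text):
            for cat in tag_sentence(sentence):
                index[cat].append((article_id, sentence))
    return index


def search(index, categories, keyword=None):
    """Find sentences that match ALL given categories and, optionally, a keyword."""
    if not categories:
        return []
    hits = None
    for cat in categories:
        found = set(index.get(cat, []))
        hits = found if hits is None else hits & found
    if keyword is not None:
        hits = {h for h in hits if keyword.lower() in h[1].lower()}
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sample text, not taken from any article.
    articles = {
        "paper1": "abc-1 regulates xyz-2 in neurons. The mutant is uncoordinated.",
    }
    index = build_index(articles)
    for article_id, sentence in search(index, ["gene", "regulation"]):
        print(article_id, "|", sentence)
```

Searching for a sentence that combines two categories (here a gene and a regulation term) is what separates this kind of query from a plain keyword search: the match is made on the class of a term rather than on a specific word.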


Funding