Article · Jan 1, 2019 · Gold open access

ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar
Indexed in: arXiv, Crossref

Abstract

Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust, practical, publicly available models. This paper describes scispaCy, a new tool for practical biomedical/scientific text processing, which heavily leverages the spaCy library. We detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several tasks and datasets. Models and code are available at https://allenai.github.io/scispacy/

Citation impact

Total citations: 460
FWCI (field-weighted citation impact): 13.93
Percentile: 100%

Authors (4)

  • Mark Neumann (corresponding author)
  • Daniel King
  • Iz Beltagy
  • Waleed Ammar
