CORD-19: The COVID-19 Open Research Dataset
Allen Institute · Microsoft Research (United Kingdom) · +4 more institutions
Abstract
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full-text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will…
Citation impact
- FWCI: not available
- Percentile: not available
- References: 86
Authors
- 28
Topics & keywords
- Metadata
- Coronavirus disease 2019 (COVID-19)
- Computer science
- Resource (disambiguation)
- Data science
- Open research
- Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
- 2019-20 coronavirus outbreak