Automated real-world data integration improves cancer outcome prediction
Memorial Sloan Kettering Cancer Center · Dana-Farber Cancer Institute · and 2 other institutions
Abstract
The digitization of health records and growing availability of tumour DNA sequencing provide an opportunity to study the determinants of cancer outcomes with unprecedented richness. Patient data are often stored in unstructured text and siloed datasets. Here we combine natural language processing annotations [1,2] with structured medication, patient-reported demographic, tumour registry and tumour genomic data from 24,950 patients at Memorial Sloan Kettering Cancer Center to generate a clinicogenomic, harmonized oncologic real-world dataset (MSK-CHORD). MSK-CHORD includes data for non-small-cell lung (n = 7,809), breast (n = 5,368), colorectal (n = 5,543), prostate (n = 3,211) and pancreatic (n = 3,109) cancers…
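As a rough illustration of the integration step described in the abstract, the sketch below groups records from several siloed sources under a shared patient key. It assumes every source carries a common patient identifier; the source names, field names, toy records and the `harmonize` function are all hypothetical and do not reflect MSK-CHORD's actual schema or pipeline, which also involves NLP annotation and harmonization steps not shown here.

```python
from collections import defaultdict

# Hypothetical toy records from three siloed sources. Field names and values
# are illustrative only and are NOT the MSK-CHORD schema.
nlp_annotations = [
    {"patient_id": "P1", "nlp_metastatic_site": "liver"},
]
medications = [
    {"patient_id": "P1", "drug": "drug_a"},
    {"patient_id": "P2", "drug": "drug_b"},
]
genomics = [
    {"patient_id": "P1", "gene": "KRAS"},
]


def harmonize(sources):
    """Group records from each named source under their (assumed) patient_id.

    Every patient gets an entry for every source, so a missing source shows up
    as an empty list rather than a missing key.
    """
    patients = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            # Keep every field except the join key itself.
            fields = {k: v for k, v in record.items() if k != "patient_id"}
            patients[record["patient_id"]][name].append(fields)
    return dict(patients)


merged = harmonize(
    {"nlp": nlp_annotations, "medications": medications, "genomics": genomics}
)
print(merged["P2"])
```

Keeping an empty list for each absent source makes data gaps explicit per patient, which matters when sources cover different subsets of a cohort.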
Citation impact: FWCI 53.17 · percentile 100% · 63 references
Authors: 99
Topics & keywords
- Cancer
- Outcome (game theory)
- Computer science
- Computational biology
- Artificial intelligence
- Internal medicine
- Medicine
- Biology