Data drift in medical machine learning: implications and potential remedies
United States Food and Drug Administration · Center for Devices and Radiological Health
Abstract
Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with…
Citation impact
- FWCI: 31.60
- Percentile: 100%
- References: 83
Authors (4)
- Berkman Sahiner (corresponding author)
United States Food and Drug Administration, Center for Devices and Radiological Health
- Weijie Chen
United States Food and Drug Administration, Center for Devices and Radiological Health
- Ravi K. Samala
United States Food and Drug Administration, Center for Devices and Radiological Health
- Nicholas Petrick
United States Food and Drug Administration, Center for Devices and Radiological Health
Topics & keywords
- Concept drift
- Context (archaeology)
- Computer science
- Retraining
- Software deployment
- Terminology
- Machine learning
- Artificial intelligence