Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies

Nagendran, Myura; Chen, Yang; Lovejoy, Christopher A.; Gordon, Anthony; Komorowski, Matthieu; Harvey, Hugh; Topol, Eric J.; Ioannidis, John P. A.; Collins, Gary S.; Maruthappu, Mahiben

doi:10.1136/bmj.m689

Review · BMJ · Mar 25, 2020 · Hybrid open access


Imperial College London · University College London · 7 other institutions

Indexed in Crossref and PubMed.

Abstract

Objective

To systematically examine the design, reporting standards, risk of bias, and claims of studies comparing the performance of diagnostic deep learning algorithms for medical imaging with that of expert clinicians.

Design

Systematic review.

Data sources

Medline, Embase, Cochrane Central Register of Controlled Trials, and the World Health Organization trial registry from 2010 to June 2019.

Eligibility criteria for selecting studies

Randomised trial registrations and non-randomised studies comparing the performance of a deep learning algorithm in medical imaging with a contemporary group of one or more expert clinicians.

Interest in deep learning research for medical imaging has been growing. The main distinguishing feature of convolutional neural networks (CNNs) in deep learning is that, when fed with raw data, they develop their own representations needed for pattern recognition: the algorithm learns for itself which features of an image are important for classification, rather than being told by humans which features to use. The selected studies aimed to use medical imaging for predicting the absolute risk of existing disease or for classification into diagnostic groups (eg, disease or non-disease). For example, a CNN can be trained on raw chest radiographs labelled as pneumothorax or no pneumothorax and learn which pixel patterns suggest pneumothorax.

Review methods

Adherence to reporting standards was assessed by using CONSORT (consolidated standards of reporting trials) for randomised studies and TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) for non-randomised studies. Risk of bias was assessed by using the Cochrane risk of bias tool for randomised studies and PROBAST (prediction model risk of bias assessment tool) for non-randomised studies.
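To make the CNN description in the eligibility criteria above more concrete, here is a minimal sketch of the core operation of one convolutional layer: a small kernel slides over raw pixel values, producing a feature map that is then passed through a ReLU and summarised by global max pooling. This is an illustration only and not the method of any reviewed study. The kernel values are hand-set here (a hypothetical vertical-line detector) purely to show the mechanics; in a real CNN they are learned from labelled images, such as radiographs tagged pneumothorax or no pneumothorax, rather than chosen by a human. The toy 5x5 "images" are likewise invented for the example.

```python
# Minimal sketch of a single convolutional layer's forward pass on raw pixels.
# ASSUMPTION: the kernel is hand-set for illustration only; in a trained CNN
# these weights are learned from labelled data, not specified by a human.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1); return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the pixels currently under the kernel.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Zero out negative responses, as CNNs typically do after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def global_max_pool(feature_map):
    """Summarise the map by its strongest response anywhere in the image."""
    return max(max(row) for row in feature_map)

# Hypothetical toy 5x5 images: a bright vertical stripe and a bright horizontal stripe.
vertical = [[0, 0, 1, 0, 0] for _ in range(5)]
horizontal = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]

# Hypothetical hand-set kernel that responds to a bright centre column.
kernel = [[-1, 2, -1],
          [-1, 2, -1],
          [-1, 2, -1]]

for name, img in [("vertical", vertical), ("horizontal", horizontal)]:
    score = global_max_pool(relu(conv2d_valid(img, kernel)))
    print(name, score)  # prints "vertical 6.0", then "horizontal 0.0"
```

The kernel fires strongly on the vertical stripe and not at all on the horizontal one, which is the kind of pixel-pattern detector a trained network would discover on its own. Real CNNs stack many such learned kernels across several layers and fit all of their weights to the training labels.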

Citation impact

Total citations: 1,035
FWCI: 40.22
Percentile: 100%
References: 40
Authors: 10

Keywords

Artificial intelligence
Medicine
MEDLINE
Deep learning
Machine learning
Systematic review
Convolutional neural network
Medical physics
