A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians

Takita, Hirotaka; Kabata, Daijiro; Walston, Shannon L.; Tatekawa, Hiroyuki; Saito, Kenichi; Tsujimoto, Yasushi; Miki, Yukio; Ueda, Daiju

doi:10.1038/s41746-025-01543-z

Review · npj Digital Medicine · Mar 22, 2025 · Gold OA

Osaka Metropolitan University · Kobe University · and 6 other institutions

Indexed in: Crossref · DOAJ · PubMed

Abstract

While generative artificial intelligence (AI) has shown potential in medical diagnostics, comprehensive evaluation of its diagnostic performance and comparison with physicians has not been extensively explored. We conducted a systematic review and meta-analysis of studies validating generative AI models for diagnostic tasks published between June 2018 and June 2024. Analysis of 83 studies revealed an overall diagnostic accuracy of 52.1%. No significant performance difference was found between AI models and physicians overall (p = 0.10) or non-expert physicians (p = 0.93). However, AI models performed significantly worse than expert physicians (p = 0.007). Several models demonstrated slightly higher performance…

Citation impact

Total citations: 111
FWCI: 53.56
Percentile: 100%
References: 120

Authors: 8

Topics & keywords

Keywords

Meta-analysis
Generative grammar
Reliability (semiconductor)
Diagnostic accuracy
Systematic review
Artificial intelligence
Diagnostic test
Computer science

No related works found for this paper.