Can deepseek and ChatGPT be used in the diagnosis of oral pathologies?
Abstract
Artificial intelligence (AI) is widely used across medical fields to support diagnosis, and advances in AI techniques have made important contributions to early diagnosis. This study compares the diagnostic accuracy of two AI applications, ChatGPT-4o and DeepSeek-V3, on 16 clinical case scenarios in oral pathology. METHODOLOGY: The authors prepared 16 fictional clinical case scenarios of oral pathologies. For each case, both DeepSeek-V3 and ChatGPT-4o were asked to provide three possible preliminary diagnoses and to cite literature supporting them. The diagnoses from both applications were rated on a Likert scale by 20 specialists from two specialties.
The mean score was 4.02 ± 0.36 for DeepSeek-V3 and 3.15 ± 0.41 for ChatGPT-4o. Based on these mean scores, both models performed at a moderate to high level. In the per-scenario comparison, DeepSeek-V3 scored statistically significantly higher in 9 of the 16 clinical scenarios, while ChatGPT-4o scored significantly higher in 1. Overall, DeepSeek-V3 performed significantly better than ChatGPT-4o (p = 0.024). As for references, ChatGPT-4o cited 62 references, 50 of which were fabricated, whereas 8 of the 48 references cited by DeepSeek-V3 were fabricated.
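The reference counts reported above are easier to compare as proportions. The short Python sketch below simply converts the counts stated in the abstract into percentages; the variable and function names are illustrative and are not part of the study.

```python
# Counts as reported in the abstract: (fabricated references, total references).
reference_counts = {
    "ChatGPT-4o": (50, 62),
    "DeepSeek-V3": (8, 48),
}


def fabricated_share(fabricated: int, total: int) -> float:
    """Return the fabricated references as a percentage of all cited references."""
    return 100 * fabricated / total


# Hypothetical helper output: percentage of fabricated references per model.
shares = {
    model: fabricated_share(fabricated, total)
    for model, (fabricated, total) in reference_counts.items()
}

for model, share in shares.items():
    print(f"{model}: {share:.1f}% of cited references were fabricated")
```

On these counts, roughly four in five of the ChatGPT-4o references and about one in six of the DeepSeek-V3 references were fabricated.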
Citation impact
- FWCI: 25.25
- Percentile: 100%
- References: 36
Authors
- 2
Topics & keywords
- Medicine
- Oral and maxillofacial surgery
- Dermatology
- General surgery
- Dentistry