Multimodal large language models for oral lesion diagnosis: a systematic review of diagnostic performance and clinical utility
International University · Khalifa University of Science and Technology · +6 more institutions
Abstract
Diagnosing oral lesions, which range from benign conditions to oral cancer, remains challenging because of overlapping visual features and reliance on histopathology. Large language models (LLMs) can integrate textual and visual cues, but their diagnostic accuracy and clinical utility in real decision-making contexts remain uncertain. This review aimed to systematically evaluate the diagnostic performance, clinical usefulness, and limitations of LLMs in identifying oral lesions.
PubMed, CINAHL, Embase, Web of Science, and Google Scholar were searched up to 20 July 2025. Eligible studies applied LLMs (e.g., ChatGPT, Gemini, DeepSeek, Copilot, Claude) to the diagnosis or differential diagnosis of oral lesions using text, images, or multimodal inputs. Outcomes included diagnostic accuracy, agreement metrics, and qualitative assessments of explanation quality and clinical applicability. Risk of bias was assessed using an adapted QUADAS-2 tool. A narrative synthesis was performed because of heterogeneity.
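As a rough illustration of the quantitative outcomes named above, the sketch below computes diagnostic accuracy and Cohen's kappa (one common agreement metric) for an LLM's top diagnosis against a histopathological reference. The function names and the six example cases are hypothetical and are not taken from the reviewed studies, which may have used other metrics or definitions.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of cases where the predicted label equals the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    if n != len(rater_b) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: reference diagnoses and an LLM's top-ranked diagnoses.
reference_dx = ["leukoplakia", "lichen planus", "SCC", "fibroma", "SCC", "leukoplakia"]
model_dx = ["leukoplakia", "lichen planus", "SCC", "SCC", "SCC", "lichen planus"]

print(f"accuracy = {accuracy(model_dx, reference_dx):.3f}")
print(f"kappa    = {cohen_kappa(model_dx, reference_dx):.3f}")
```

Kappa is lower than raw accuracy here because part of the observed agreement would be expected by chance given how often each label is used.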
Citation impact
- FWCI: 97.40
- Percentile: 99%
- References: 38
Authors (6)
- Fatma E. A. Hassanein (corresponding author)
  International University
- Malik Alkabazi
  Khalifa University of Science and Technology
- Melek Taşsöker
  Necmettin Erbakan University
- Yousra Ahmed
  Institute of Informatics Problems
- Suliman Alsaeed
  King Saud bin Abdulaziz University for Health Sciences, King Abdullah International Medical Research Center, National Guard Health Affairs
Topics & keywords
- Differential diagnosis
- Lesion
- MEDLINE
- Narrative review
- Interpretation (philosophy)
- Meta-analysis
- Peace, Justice and strong institutions