Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks
Indexed in: Crossref, DOAJ, PubMed
Abstract
It is likely that individuals are turning to Large Language Models (LLMs) to seek health advice, much like searching for diagnoses on Google. We evaluate clinical accuracy of GPT-3·5 and GPT-4 for suggesting initial diagnosis, examination steps and treatment of 110 medical cases across diverse clinical disciplines. Moreover, two model configurations of the Llama 2 open source LLMs are assessed in a sub-study. For benchmarking the diagnostic task, we conduct a naïve Google search for comparison. Overall, GPT-4 performed best with superior performances over GPT-3·5 considering diagnosis and examination and superior performance over Google for diagnosis. Except for treatment, better performance on frequent vs…
Citation impact
- Total citations: 184
- FWCI: 19.67
- Percentile: 100%
- References: 21
Authors: 4
Topics & keywords
Keywords
- Benchmarking
- Medical diagnosis
- Transparency (behavior)
- Health care
- Computer science
- Medicine
- Pathology
- Business
UN Sustainable Development Goals
- Peace, Justice and Strong Institutions