Assessing the Accuracy of Responses by the Language Model ChatGPT to Questions Regarding Bariatric Surgery
Cedars-Sinai Medical Center · Keck Hospital of USC · +1 more institution
Abstract
Questions were gathered from nationally regarded professional societies and health institutions as well as Facebook support groups. Board-certified bariatric surgeons graded the accuracy and reproducibility of responses. The grading scale included the following: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Reproducibility was determined by asking the model each question twice and examining the difference in grading category between the two responses.
In total, 151 questions related to bariatric surgery were included. The model provided "comprehensive" responses to 131/151 (86.8%) of questions. When examined by category, the model provided "comprehensive" responses to 93.8% of questions related to "efficacy, eligibility and procedure options"; 93.3% related to "preoperative preparation"; 85.3% related to "recovery, risks, and complications"; 88.2% related to "lifestyle changes"; and 66.7% related to "other". The model provided reproducible answers to 137/151 (90.7%) of questions.
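As a rough illustration of how the headline figures above can be derived from paired grades, the following sketch computes the share of "comprehensive" responses and the share of reproducible answers. It is not the authors' analysis code. The function names and the toy data are hypothetical. It also rests on two assumptions that the abstract does not confirm: that a question counts as reproducible when both responses fall in the same grading category, and that the first response is the one scored for accuracy.

```python
from typing import List, Tuple

# Grading scale from the abstract: 1 = comprehensive, 2 = correct but inadequate,
# 3 = some correct and some incorrect, 4 = completely incorrect.
COMPREHENSIVE = 1


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def is_reproducible(first_grade: int, second_grade: int) -> bool:
    """Assumption: reproducible means both responses got the same grade."""
    return first_grade == second_grade


def summarize(paired_grades: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (% comprehensive, % reproducible) for (first, second) grade pairs.

    Assumption: the first response is the one scored for accuracy.
    """
    total = len(paired_grades)
    comprehensive = sum(1 for first, _ in paired_grades if first == COMPREHENSIVE)
    reproducible = sum(1 for a, b in paired_grades if is_reproducible(a, b))
    return percent(comprehensive, total), percent(reproducible, total)


# Hypothetical toy data: four questions, each graded twice.
toy_pairs = [(1, 1), (1, 2), (2, 2), (1, 1)]
toy_summary = summarize(toy_pairs)

# Recomputing the reported percentages from the counts given in the abstract.
reported_comprehensive = percent(131, 151)
reported_reproducible = percent(137, 151)
```

With the counts reported in the abstract, `percent(131, 151)` and `percent(137, 151)` round to the published 86.8% and 90.7%.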
Citation impact
- FWCI: 9.97
- Percentile: 100%
- References: 25
Authors
12
Topics & keywords
- Medicine
- Grading (engineering)
- Leverage (statistics)
- Health care
- English language