Development of a liver disease–specific large language model chat interface using retrieval-augmented generation
University of California, San Francisco
Abstract
We evaluated LiVersa's performance by conducting 2 rounds of testing. First, we compared LiVersa's outputs versus those of trainees from a previously published knowledge assessment. LiVersa answered all 10 questions correctly. Second, we asked 15 hepatologists to evaluate the outputs of 10 hepatology topic questions generated by LiVersa, OpenAI's ChatGPT 4, and Meta's Large Language Model Meta AI 2. LiVersa's outputs were more accurate but were rated less comprehensive and safe compared to those of ChatGPT 4.
In this demonstration, we built disease-specific and protected health information-compliant LLMs using RAG. While LiVersa demonstrated higher accuracy in answering questions related to hepatology, there were some deficiencies due to limitations set by the number of documents used for RAG. LiVersa will likely require further refinement before potential live deployment. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical use cases.
Citation impact
- FWCI
- 13.41
- Percentile
- 100%
- References
- 31
Authors
8Topics & keywords
- Hepatology
- Computer science
- Set (abstract data type)
- Medicine
- Information retrieval
- Internal medicine
- Programming language