Clinical entity augmented retrieval for clinical information extraction
Stanford Medicine · Stanford University · +3 more institutions
Abstract
Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note…
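The abstract says that CLEAR retrieves information using entities rather than embeddings, but this excerpt does not describe the implementation. The snippet below is only a minimal sketch of the general idea, assuming that entity-based retrieval means locating mentions of target entities in a note and passing just the surrounding sentences to the model. The function names, the naive sentence splitter, the `window` parameter, and the sample note are all hypothetical and are not taken from the paper.

```python
import re


def split_sentences(note: str) -> list[str]:
    # Naive splitter (an assumption for illustration): break after '.', '!' or '?'
    # when followed by whitespace. Real clinical text needs a sturdier tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def retrieve_entity_context(note: str, entity_terms: list[str], window: int = 1) -> list[str]:
    """Return sentences that mention any entity term, plus `window` neighbors on each side."""
    sentences = split_sentences(note)
    # Whole-word, case-insensitive match for each hypothetical entity term.
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in entity_terms]
    keep: set[int] = set()
    for i, sentence in enumerate(sentences):
        if any(p.search(sentence) for p in patterns):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    # Preserve the original order of the note.
    return [sentences[i] for i in sorted(keep)]


# Hypothetical note, written for this example only.
note = (
    "Patient seen for follow-up. "
    "Reports daily alcohol use for ten years. "
    "Denies tobacco. "
    "Blood pressure is well controlled. "
    "Plan to recheck labs in three months."
)
snippets = retrieve_entity_context(note, ["alcohol"], window=0)
```

Only the retrieved snippets, not the full note, would then be placed in the model prompt, which is one plausible reason an entity-based approach could use fewer input tokens per note than a full-note approach.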
Citation impact
- FWCI: 27.11
- Percentile: 100%
- References: 69
Authors: 12
Topics & keywords
- Pipeline (software)
- Computer science
- Inference
- Embedding
- Security token
- Information retrieval
- Information extraction
- Artificial intelligence
- Quality Education
Funding
- Gordon and Betty Moore Foundation (Award: 12409)
- American Heart Association
- Georgia Clinical and Translational Science Alliance (Award: UL1TR003142)
- National Institutes of Health (Awards: UG1DA015815, UL1TR003142)
- National Institute of Allergy and Infectious Diseases (Award: 1R01AI17812101)
- National Center for Advancing Translational Sciences (Award: UL1TR003142)