Article · npj Digital Medicine · Jan 19, 2025 · Gold OA

Clinical entity augmented retrieval for clinical information extraction

Stanford Medicine · Stanford University · and 3 other institutions

Indexed in: Crossref, DOAJ, PubMed

Abstract

Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note…
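The abstract contrasts retrieval by entities with retrieval by embeddings. The Python sketch below illustrates the general idea under stated assumptions; it is not the authors' CLEAR implementation. It replaces a real clinical entity recognition step with simple keyword matching against a hypothetical synonym table (`ENTITY_SYNONYMS`). The helper names `entity_retrieve` and `build_prompt` and the sentence-window parameter `window` are also illustrative. Rather than sending the full note or the top-k embedding chunks, it sends the model only the sentences that mention a target entity, plus nearby context.

```python
import re
from typing import Dict, List

# HYPOTHETICAL: maps a target variable to entity surface forms (synonyms).
# A real pipeline would derive these from a clinical NER model and/or an
# ontology; they are hard-coded here purely for illustration.
ENTITY_SYNONYMS: Dict[str, List[str]] = {
    "smoking_status": ["smoking", "smoker", "tobacco", "cigarettes"],
}


def split_sentences(note: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p for p in parts if p]


def entity_retrieve(note: str, variable: str, window: int = 1) -> List[str]:
    """Return sentences mentioning an entity for `variable`, plus `window`
    neighbouring sentences on each side, in original order, without duplicates."""
    sentences = split_sentences(note)
    terms = ENTITY_SYNONYMS.get(variable, [])
    if not terms:
        return []
    # Whole-word, case-insensitive match on any synonym.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    keep = set()
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    return [sentences[i] for i in sorted(keep)]


def build_prompt(variable: str, snippets: List[str]) -> str:
    """Assemble an extraction prompt from the retrieved excerpts only."""
    context = "\n".join(snippets) if snippets else "(no relevant text found)"
    return f"Extract the value of '{variable}' from the following excerpts:\n{context}"


if __name__ == "__main__":
    note = (
        "Patient presents with chest pain. History of hypertension. "
        "She is a former smoker, quit 10 years ago. No alcohol use. Vitals stable."
    )
    print(entity_retrieve(note, "smoking_status", window=0))
    # → ['She is a former smoker, quit 10 years ago.']
```

Because the prompt contains only the matched excerpts, its length grows with how often the target entities appear rather than with note length. This is consistent with the lower per-note input token counts the abstract reports for CLEAR, although the sketch does not reproduce the paper's method.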

Citation impact

Total citations: 44
FWCI: 27.11
Percentile: 100%
References: 69

Authors: 12

Topics & keywords

Keywords
  • Pipeline (software)
  • Computer science
  • Inference
  • Embedding
  • Security token
  • Information retrieval
  • Information extraction
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education

Funding