Review | Frontiers in Artificial Intelligence | Nov 19, 2024 | Gold open access

Vision-language models for medical report generation and visual question answering: a review

Moffitt Cancer Center

Indexed in: Crossref, DOAJ, PubMed

Abstract

Medical vision-language models (VLMs) combine computer vision (CV) and natural language processing (NLP) to analyze visual and textual medical data. Our paper reviews recent advancements in developing VLMs specialized for healthcare, focusing on publicly available models designed for medical report generation and visual question answering (VQA). We provide background on NLP and CV, explaining how techniques from both fields are integrated into VLMs, with visual and language data often fused using Transformer-based architectures to enable effective learning from multimodal data. Key areas we address include the exploration of 18 public medical vision-language datasets, in-depth analyses of the architectures and…

Citation impact

  • Total citations: 159
  • FWCI: 35.54
  • Percentile: 100%
  • References: 220

Authors: 2

Topics & keywords

Keywords
  • Computer science
  • Data science
  • Question answering
  • Artificial intelligence
  • Key (lock)
  • Machine learning
  • Human–computer interaction
  • Computer security
