Preprint | arXiv (Cornell University) | Jun 1, 2023 | Green OA

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

Indexed in: arXiv, DataCite

Abstract

Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from the public web, but such general-domain vision-language models still lack sophistication in understanding and conversing about biomedical images. In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images. The key idea is to leverage a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, use GPT-4 to…

Citation impact

224 total citations
References: 0

Authors

9

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Biomedicine
  • Leverage (statistics)
  • Language understanding
  • Natural language processing
  • Vocabulary
  • Domain (mathematical analysis)
UN Sustainable Development Goals
  • Quality Education