LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

Li, Chunyuan; Wong, Cliff; Zhang, Sheng; Usuyama, Naoto; Liu, Haotian; Yang, Jianwei; Naumann, Tristan; Poon, Hoifung; Gao, Jianfeng

doi:10.48550/arxiv.2306.00890

Preprint. arXiv (Cornell University), Jun 1, 2023. Green OA.

Indexed in: arXiv, DataCite

Abstract

Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from the public web, but such general-domain vision-language models still lack sophistication in understanding and conversing about biomedical images. In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images. The key idea is to leverage a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, use GPT-4 to…

Citation impact

224 total citations
References: 0

Authors: 9

Keywords

Computer science
Artificial intelligence
Biomedicine
Leverage (statistics)
Language understanding
Natural language processing
Vocabulary
Domain (mathematical analysis)

UN Sustainable Development Goals

Quality Education

No related works found for this paper.