LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Indexed in: arXiv, DataCite
Abstract
Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from the public web, but such general-domain vision-language models still lack sophistication in understanding and conversing about biomedical images. In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images. The key idea is to leverage a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, use GPT-4 to…
Citation impact
224 total citations
- FWCI: —
- Percentile: —
- References: 0
Authors: 9
Topics & keywords
Topics
Keywords
- Computer science
- Artificial intelligence
- Biomedicine
- Leverage (statistics)
- Language understanding
- Natural language processing
- Vocabulary
- Domain (mathematical analysis)
UN Sustainable Development Goals
- Quality Education