Vision-Language Models for Vision Tasks: A Survey
Nanyang Technological University
Indexed in: Crossref, PubMed
Abstract
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural networks (DNNs), and they usually train a separate DNN for each visual recognition task, leading to a laborious and time-consuming visual recognition paradigm. To address these two challenges, Vision-Language Models (VLMs) have been intensively investigated recently. VLMs learn rich vision-language correlation from web-scale image-text pairs, which are almost infinitely available on the Internet, and enable zero-shot predictions on various visual recognition tasks with a single model. This paper provides a systematic review of vision-language models for various visual recognition tasks, including: (1) the background that…
Citation impact
- Total citations: 703
- FWCI: 155.95
- Percentile: 100%
- References: 236
Authors: 4
Topics & keywords
Topics
Keywords
- Computer science
- Artificial intelligence
- Machine learning
- Categorization
- Task (project management)
- Benchmarking
- Task analysis