Foundation Models Defining a New Era in Vision: A Survey and Outlook
Mohamed bin Zayed University of Artificial Intelligence · Australian National University · +6 more institutions
Abstract
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The complex relations between objects and their locations, ambiguities, and variations in the real-world environment can be better described in human language, which is naturally governed by grammatical rules, and in other modalities such as audio and depth. Models trained to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time. These models are referred to as foundation models. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a…
Citation impact
- FWCI: 338.98
- Percentile: 100%
- References: 199
Authors (8)
- Muhammad Awais (corresponding author)
  Mohamed bin Zayed University of Artificial Intelligence
- Muzammal Naseer
  Australian National University, Georgia Institute of Technology, Khalifa University of Science and Technology
- Salman Khan
  Australian National University, Mohamed bin Zayed University of Artificial Intelligence
- Rao Muhammad Anwer
  Mohamed bin Zayed University of Artificial Intelligence
- Hisham Cholakkal
  Mohamed bin Zayed University of Artificial Intelligence
Topics & keywords
- Computer science
- Artificial intelligence
- Interpretability
- Modalities
- Human–computer interaction
- Foundation (evidence)
- Field (mathematics)
- Vision science