Article | Jun 1, 2023 | Closed access
Visual Programming: Compositional visual reasoning without training
Indexed in Crossref
Abstract
We present VisProg, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VisProg avoids the need for any task-specific training. Instead, it uses the in-context learning ability of large language models to generate Python-like modular programs, which are then executed to get both the solution and a comprehensive and interpretable rationale. Each line of the generated program may invoke one of several off-the-shelf computer vision models, image processing subroutines, or Python functions to produce intermediate outputs that may be consumed by subsequent parts of the program. We demonstrate the flexibility of VisProg on 4 diverse tasks - compositional…
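The execution model described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the module names (`FILTER`, `COUNT`, `GREATER`), the `run_program` helper, and the program text are all hypothetical, and the modules are toy functions over lists of labels standing in for vision models and image processing routines. The sketch only shows how a line-by-line program can store each step's output under a name, let later steps consume it, and keep a trace that serves as a rationale.

```python
import re

# Hypothetical toy modules; a real system would call vision models here.
MODULES = {
    "FILTER": lambda items, label: [x for x in items if x == label],
    "COUNT": lambda items: len(items),
    "GREATER": lambda left, right: left > int(right),
}

# Each step has the assumed form TARGET=MODULE(key=value, ...).
STEP = re.compile(r"^(\w+)=(\w+)\((.*)\)$")


def run_program(program, inputs):
    """Run each step in order; return the final state and a step trace."""
    state = dict(inputs)
    trace = []
    for raw_line in program.strip().splitlines():
        line = raw_line.strip()
        match = STEP.match(line)
        if match is None:
            raise ValueError(f"cannot parse step: {line!r}")
        target, module, arg_text = match.groups()
        kwargs = {}
        # Naive split: assumes no commas inside quoted string arguments.
        for pair in filter(None, (p.strip() for p in arg_text.split(","))):
            key, value = pair.split("=", 1)
            key, value = key.strip(), value.strip()
            if value.startswith(("'", '"')):
                kwargs[key] = value[1:-1]  # quoted literal
            else:
                kwargs[key] = state[value]  # output of an earlier step
        state[target] = MODULES[module](**kwargs)
        trace.append((line, state[target]))
    return state, trace


# Hypothetical program of the kind a language model might be prompted to emit.
PROGRAM = """
DOGS=FILTER(items=LABELS, label='dog')
N=COUNT(items=DOGS)
ANSWER=GREATER(left=N, right='1')
"""

state, trace = run_program(PROGRAM, {"LABELS": ["dog", "cat", "dog"]})
for step, output in trace:
    print(step, "->", output)
```

The trace pairs every program line with its intermediate output, which is the property the abstract refers to as an interpretable rationale.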
Citation impact
- Total citations: 173
- FWCI: 19.95
- Percentile: 100%
- References: 51
Authors: 2
Topics & keywords
Topics
Keywords
- Computer science
- Python (programming language)
- Programming language
- Artificial intelligence
- Subroutine
- Modular design
- Visual reasoning
- Visual programming language
UN Sustainable Development Goals
- Quality Education