Article · Jun 1, 2023 · Closed access

Visual Programming: Compositional visual reasoning without training

Allen Institute

Indexed in Crossref

Abstract

We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-specific training. Instead, it uses the in-context learning ability of large language models to generate Python-like modular programs, which are then executed to get both the solution and a comprehensive and interpretable rationale. Each line of the generated program may invoke one of several off-the-shelf computer vision models, image processing subroutines, or Python functions to produce intermediate outputs that may be consumed by subsequent parts of the program. We demonstrate the flexibility of VISPROG on 4 diverse tasks - compositional…
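The execution model described in the abstract (one module call per program line, with each output stored for later lines and recorded as a rationale) can be illustrated with a minimal sketch. It is not the authors' implementation: the module names `LOC`, `COUNT`, and `RESULT`, the `execute` helper, and the dict-based toy "image" are all hypothetical stand-ins, and the program text is hard-coded here rather than generated by a language model.

```python
import ast

# Hypothetical stand-ins for off-the-shelf modules (assumptions, not the paper's code).
def LOC(image, object):
    # Toy "detector": the fake image is a dict mapping labels to bounding boxes.
    return image.get(object, [])

def COUNT(box):
    # Count the boxes returned by an earlier step.
    return len(box)

def RESULT(var):
    # Pass the chosen intermediate value through as the final answer.
    return var

MODULES = {"LOC": LOC, "COUNT": COUNT, "RESULT": RESULT}

def execute(program, inputs):
    """Run a program line by line; return the final state and a step-by-step trace."""
    state = dict(inputs)
    trace = []
    for line in program.strip().splitlines():
        # Each line has the form NAME=MODULE(arg=value, ...), which is valid Python syntax.
        stmt = ast.parse(line.strip()).body[0]
        target = stmt.targets[0].id
        call = stmt.value
        kwargs = {}
        for kw in call.keywords:
            if isinstance(kw.value, ast.Name):
                # A bare name refers to an input or to an earlier step's output.
                kwargs[kw.arg] = state[kw.value.id]
            else:
                # Anything else is treated as a literal (string, number, ...).
                kwargs[kw.arg] = ast.literal_eval(kw.value)
        state[target] = MODULES[call.func.id](**kwargs)
        trace.append((line.strip(), state[target]))
    return state, trace

# In the described approach this text would come from a language model; here it is fixed.
PROGRAM = """
BOX0=LOC(image=IMAGE,object='cat')
ANSWER0=COUNT(box=BOX0)
FINAL_RESULT=RESULT(var=ANSWER0)
"""

IMAGE = {"cat": [(10, 20, 50, 60), (70, 15, 120, 80)], "dog": [(0, 0, 30, 30)]}
state, trace = execute(PROGRAM, {"IMAGE": IMAGE})
print(state["FINAL_RESULT"])
for step, output in trace:
    print(step, "->", output)
```

The `trace` list plays the role of the interpretable rationale: it pairs every executed line with the intermediate output that line produced, so a reader can see how the final answer was reached.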

Citation impact

Total citations: 173
FWCI: 19.95
Percentile: 100%
References: 51

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Python (programming language)
  • Programming language
  • Artificial intelligence
  • Subroutine
  • Modular design
  • Visual reasoning
  • Visual programming language
UN Sustainable Development Goals
  • Quality Education