NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
The University of Adelaide · Australian National University
Abstract
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such a trend underscores the potential of training LLMs with unlimited language data, advancing the development of a universal embodied agent. In this work, we introduce NavGPT, a purely LLM-based instruction-following navigation agent, to reveal the reasoning capability of GPT models in complex embodied scenes by performing zero-shot sequential action prediction for vision-and-language navigation (VLN). At each step, NavGPT takes the textual descriptions of visual observations, navigation history, and future explorable directions as…
Citation impact
- FWCI: 17.85
- Percentile: 100%
- References: 89
Authors
- 3
Topics & keywords
- Computer science
- Linguistics
- Cognitive science
- Artificial intelligence
- Psychology
- Natural language processing
- Philosophy
- Quality Education