NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

The University of Adelaide · Australian National University

Indexed in Crossref

Abstract

Trained on data of unprecedented scale, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. This trend underscores the potential of training LLMs with unlimited language data, advancing the development of a universal embodied agent. In this work, we introduce NavGPT, a purely LLM-based instruction-following navigation agent, to reveal the reasoning capability of GPT models in complex embodied scenes by performing zero-shot sequential action prediction for vision-and-language navigation (VLN). At each step, NavGPT takes the textual descriptions of visual observations, navigation history, and future explorable directions as…
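
The per-step inputs named in the abstract can be illustrated with a minimal sketch. It assumes the textual pieces (observation description, navigation history, explorable directions) are simply concatenated with the instruction into one prompt for a text-only LLM. The names `StepInputs`, `build_prompt`, and `parse_choice`, the prompt layout, and the index-based reply format are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepInputs:
    """Hypothetical container for the textual inputs available at one step."""
    instruction: str                                      # the navigation instruction
    observation: str                                      # text description of the current view
    history: List[str] = field(default_factory=list)      # summaries of earlier steps
    directions: List[str] = field(default_factory=list)   # explorable next directions


def build_prompt(step: StepInputs) -> str:
    """Assemble one text prompt; the layout is illustrative only."""
    history = "\n".join(f"{i}. {h}" for i, h in enumerate(step.history, 1)) or "(none)"
    options = "\n".join(f"[{i}] {d}" for i, d in enumerate(step.directions))
    return (
        f"Instruction: {step.instruction}\n"
        f"History:\n{history}\n"
        f"Observation: {step.observation}\n"
        f"Explorable directions:\n{options}\n"
        "Reply with the index of the direction to take."
    )


def parse_choice(reply: str, n_options: int) -> Optional[int]:
    """Return the first in-range integer found in the reply, or None."""
    for token in re.findall(r"\d+", reply):
        index = int(token)
        if index < n_options:
            return index
    return None
```

In a zero-shot loop, the string from `build_prompt` would be sent to the language model at every step and the reply mapped back to a direction with `parse_choice`; how the model is called and how its reasoning is recorded is left open here.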

Citation impact

Total citations: 136
FWCI: 17.85
Percentile: 100%
References: 89

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Linguistics
  • Cognitive science
  • Artificial intelligence
  • Psychology
  • Natural language processing
  • Philosophy
UN Sustainable Development Goals
  • Quality Education