NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Zhou, Gengze; Hong, Yicong; Wu, Qi

doi:10.1609/aaai.v38i7.28597

Article · Proceedings of the AAAI Conference on Artificial Intelligence · Mar 24, 2024 · Diamond OA

The University of Adelaide · Australian National University

Indexed in Crossref

Abstract

Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such a trend underscores the potential of training LLMs with unlimited language data, advancing the development of a universal embodied agent. In this work, we introduce NavGPT, a purely LLM-based instruction-following navigation agent, to reveal the reasoning capability of GPT models in complex embodied scenes by performing zero-shot sequential action prediction for vision-and-language navigation (VLN). At each step, NavGPT takes textual descriptions of visual observations, navigation history, and future explorable directions as…
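
The per-step inputs named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt layout, the reply format (an index into the listed directions), and the names `StepInput`, `build_prompt`, `choose_direction`, and the `llm` callable are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepInput:
    """Text-only inputs for one navigation step (hypothetical structure)."""
    observation: str        # textual description of the current visual observation
    history: List[str]      # textual summaries of earlier steps
    directions: List[str]   # textual descriptions of explorable directions


def build_prompt(instruction: str, step: StepInput) -> str:
    """Assemble a single prompt string; the layout is an assumption."""
    history = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(step.history)) or "(none)"
    options = "\n".join(f"[{i}] {d}" for i, d in enumerate(step.directions))
    return (
        f"Instruction: {instruction}\n"
        f"History:\n{history}\n"
        f"Observation: {step.observation}\n"
        f"Explorable directions:\n{options}\n"
        "Reply with the index of the direction to take."
    )


def choose_direction(instruction: str, step: StepInput, llm: Callable[[str], str]) -> int:
    """Query a text-in, text-out model and parse the first valid index in its reply."""
    reply = llm(build_prompt(instruction, step))
    for token in reply.split():
        digits = token.strip("[].,")
        if digits.isdigit() and int(digits) < len(step.directions):
            return int(digits)
    raise ValueError(f"no valid direction index in reply: {reply!r}")
```

Any function that maps a prompt string to a reply string can stand in for `llm`, which keeps the sketch independent of a particular model API. Calling `choose_direction` once per step and appending a summary of the chosen move to `history` would give the sequential, zero-shot loop the abstract refers to.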

Citation impact

Total citations: 136

FWCI: 17.85
Percentile: 100%
References: 89

Authors: 3

Topics & keywords

Keywords

Computer science
Linguistics
Cognitive science
Artificial intelligence
Psychology
Natural language processing
Philosophy

UN Sustainable Development Goals

Quality Education
