Nash Q-Learning for General-Sum Stochastic Games

Hu, Junling; Wellman, Michael P.; Boutilier, Craig

Article · Jan 1, 2003 · Closed access

Abstract

We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique…
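
The update described in the abstract (Q-functions over joint actions, backed up with a Nash equilibrium value of the next state's stage game) can be illustrated with a minimal two-player sketch. This is not the authors' implementation. The function names `pure_nash` and `nash_q_update`, the nested-list table layout `Q[state][a1][a2]`, and the default `alpha` and `gamma` values are all assumptions made for illustration. As a further simplification, the sketch only searches for pure-strategy equilibria and takes the first one found; a general-sum stage game may have only mixed equilibria, which would need a proper bimatrix solver.

```python
from itertools import product


def pure_nash(q1, q2):
    """Return the first pure-strategy Nash equilibrium (a1, a2) of the
    bimatrix stage game, or None if no pure equilibrium exists.

    q1[a1][a2] and q2[a1][a2] are the payoffs (here: current Q-values)
    of players 1 and 2 for the joint action (a1, a2).
    """
    n1, n2 = len(q1), len(q1[0])
    for a1, a2 in product(range(n1), range(n2)):
        # Player 1 cannot gain by deviating while player 2 keeps a2.
        best1 = all(q1[a1][a2] >= q1[b][a2] for b in range(n1))
        # Player 2 cannot gain by deviating while player 1 keeps a1.
        best2 = all(q2[a1][a2] >= q2[a1][b] for b in range(n2))
        if best1 and best2:
            return a1, a2
    return None


def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """Update both players' joint-action Q-tables in place after one
    observed transition (s, a1, a2) -> s_next with rewards r1, r2.

    Assumption: the equilibrium used for the backup is the first pure
    equilibrium of the stage game defined by the current Q-values at s_next.
    """
    eq = pure_nash(Q1[s_next], Q2[s_next])
    if eq is None:
        # Simplification of this sketch: mixed equilibria are not handled.
        raise ValueError("no pure-strategy equilibrium in the stage game")
    e1, e2 = eq
    # Read both equilibrium values before writing, in case s == s_next.
    v1 = Q1[s_next][e1][e2]
    v2 = Q2[s_next][e1][e2]
    Q1[s][a1][a2] = (1 - alpha) * Q1[s][a1][a2] + alpha * (r1 + gamma * v1)
    Q2[s][a1][a2] = (1 - alpha) * Q2[s][a1][a2] + alpha * (r2 + gamma * v2)
```

Unlike single-agent Q-learning, the backup value here is not a maximum over the agent's own actions but the agent's payoff at a joint equilibrium, which is why each table is indexed by both players' actions.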

Citation impact

915 total citations

FWCI: 27.15
Percentile: 100%
References: 52

Authors

Junling Hu (corresponding author)
Michael P. Wellman
Craig Boutilier

Topics & keywords

Keywords

Q-learning
Nash equilibrium
Computer science
Context (archaeology)
Mathematical economics
Epsilon-equilibrium
Best response
Preference learning