Article · Jan 1, 2003 · Closed access

Nash Q-Learning for General-Sum Stochastic Games

Junling Hu, Michael P. Wellman, Craig Boutilier

Abstract

We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate these conditions. However, learning consistently converges in the first grid game, which has a unique…
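The update described in the abstract can be illustrated with a minimal two-player sketch. It assumes tabular Q arrays indexed as `Q[state, a1, a2]`, a learner that observes both agents' actions and rewards, a learning rate α and discount γ, and the standard form Qⁱ(s, a¹, a²) ← (1 − α) Qⁱ(s, a¹, a²) + α [rⁱ + γ · NashQⁱ(s′)], where NashQⁱ(s′) is player i's expected payoff when both play a Nash equilibrium of the stage game (Q¹(s′), Q²(s′)). The helper names `stage_nash` and `nash_q_update` are hypothetical. Stage equilibria are found here by support enumeration, a textbook method for nondegenerate bimatrix games that is not necessarily the solver the authors used, and the sketch simply takes the first equilibrium it finds; it does not check the restrictions on stage games that the convergence result depends on.

```python
import itertools

import numpy as np


def _mix_making_indifferent(P, own_support, other_support):
    """Solve for the opponent's mix (over other_support) that makes the player whose
    payoffs are in P (rows = that player's actions, cols = opponent's actions)
    indifferent across own_support. Both supports have the same size.
    Returns (opponent mix over all its actions, indifference value) or None."""
    k = len(own_support)
    M = np.zeros((k + 1, k + 1))
    M[:k, :k] = P[np.ix_(own_support, other_support)]
    M[:k, k] = -1.0  # payoff of each supported action minus the common value = 0
    M[k, :k] = 1.0   # opponent's probabilities sum to 1
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    try:
        sol = np.linalg.solve(M, rhs)
    except np.linalg.LinAlgError:
        return None  # singular system: no unique indifference solution
    mix = np.zeros(P.shape[1])
    mix[list(other_support)] = sol[:k]
    return mix, sol[k]


def stage_nash(A, B, tol=1e-9):
    """Hypothetical helper: one Nash equilibrium (x, y) of the bimatrix stage game,
    where A and B are the row and column players' payoffs. Uses support enumeration,
    assumes a nondegenerate game, and returns the first equilibrium found."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for I in itertools.combinations(range(m), k):
            for J in itertools.combinations(range(n), k):
                ry = _mix_making_indifferent(A, I, J)    # column player's y
                rx = _mix_making_indifferent(B.T, J, I)  # row player's x
                if ry is None or rx is None:
                    continue
                (y, u), (x, v) = ry, rx
                if (x < -tol).any() or (y < -tol).any():
                    continue  # not valid probability vectors
                # No pure action may earn strictly more than the equilibrium value.
                if (A @ y > u + tol).any() or (x @ B > v + tol).any():
                    continue
                return x, y
    raise ValueError("no equilibrium found; the stage game may be degenerate")


def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha, gamma):
    """Hypothetical helper: one Nash-Q step on both players' tables,
    each shaped (num_states, num_actions_1, num_actions_2)."""
    x, y = stage_nash(Q1[s_next], Q2[s_next])
    # Each player's expected value at s_next if both play the stage equilibrium;
    # both are computed before either table is modified.
    nash_v1 = x @ Q1[s_next] @ y
    nash_v2 = x @ Q2[s_next] @ y
    Q1[s, a1, a2] = (1 - alpha) * Q1[s, a1, a2] + alpha * (r1 + gamma * nash_v1)
    Q2[s, a1, a2] = (1 - alpha) * Q2[s, a1, a2] + alpha * (r2 + gamma * nash_v2)
```

As a quick check, `stage_nash` returns 50/50 mixes for both players on matching pennies and mutual defection on a prisoner's dilemma. Maintaining a Q-table for the other agent, rather than only one's own, is what lets the learner evaluate NashQⁱ(s′) at all, since the equilibrium depends on both players' payoffs.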

Citation impact

  • Total citations: 915
  • FWCI (Field-Weighted Citation Impact): 27.15
  • Percentile: 100%
  • References: 52

Authors (3)

  • Junling Hu (corresponding author)
  • Michael P. Wellman
  • Craig Boutilier

Topics & keywords
  • Q-learning
  • Nash equilibrium
  • Computer science
  • Mathematical economics
  • Epsilon-equilibrium
  • Best response
  • Preference learning