Article · Jan 1, 2003 · Closed access
Nash Q-Learning for General-Sum Stochastic Games
Junling Hu, Michael P. Wellman, Craig Boutilier
Abstract
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique…
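The update the abstract describes (Q-values over joint actions, backed up with the value of a Nash equilibrium of the next state's stage game) can be sketched for two players as below. This is a minimal illustration under stated assumptions, not the authors' implementation: Q-tables are NumPy arrays indexed `[state][a1, a2]`, each agent tracks both players' Q-tables, the stage-game equilibrium is found by support enumeration (a simple solver chosen here for brevity), and the first equilibrium found is used. The names `nash_equilibrium`, `nash_q_update`, `alpha`, and `gamma` are hypothetical.

```python
import itertools
import numpy as np


def _indifferent_mix(M, k):
    """Solve M @ p = v * 1 with sum(p) = 1; return p if it is a valid distribution."""
    lhs = np.zeros((k + 1, k + 1))
    lhs[:k, :k] = M
    lhs[:k, k] = -1.0  # unknown common payoff v
    lhs[k, :k] = 1.0   # probabilities sum to one
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    try:
        sol = np.linalg.solve(lhs, rhs)
    except np.linalg.LinAlgError:
        return None
    p = sol[:k]
    return p if (p >= -1e-12).all() else None


def nash_equilibrium(A, B, tol=1e-9):
    """Support enumeration for a bimatrix game (assumed nondegenerate).
    A: row player's payoffs, B: column player's payoffs (same shape).
    Returns mixed strategies (x, y) of the first equilibrium found."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for I in itertools.combinations(range(m), k):
            for J in itertools.combinations(range(n), k):
                y = _indifferent_mix(A[np.ix_(I, J)], k)    # makes rows in I indifferent
                x = _indifferent_mix(B[np.ix_(I, J)].T, k)  # makes columns in J indifferent
                if x is None or y is None:
                    continue
                xf = np.zeros(m)
                xf[list(I)] = x
                yf = np.zeros(n)
                yf[list(J)] = y
                # Equilibrium check: no pure deviation beats the current payoff.
                if ((A @ yf).max() <= xf @ A @ yf + tol
                        and (xf @ B).max() <= xf @ B @ yf + tol):
                    return xf, yf
    raise ValueError("no equilibrium found (game may be degenerate)")


def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """One tabular Nash-Q step for both players' Q-tables (hypothetical helper)."""
    # Equilibrium of the stage game defined by the current Q-values at s_next.
    x, y = nash_equilibrium(Q1[s_next], Q2[s_next])
    nash_v1 = x @ Q1[s_next] @ y  # player 1's payoff at that equilibrium
    nash_v2 = x @ Q2[s_next] @ y  # player 2's payoff at that equilibrium
    Q1[s][a1, a2] = (1 - alpha) * Q1[s][a1, a2] + alpha * (r1 + gamma * nash_v1)
    Q2[s][a1, a2] = (1 - alpha) * Q2[s][a1, a2] + alpha * (r2 + gamma * nash_v2)
```

For example, if `Q1[s_next]` and `Q2[s_next]` hold prisoner's-dilemma payoffs, the selected equilibrium is mutual defection, and each player's backed-up value is its own payoff in that cell. Where a stage game has several equilibria, which one is selected changes the update; this sketch simply takes the first it finds.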
Citation impact
- Total citations: 915
- FWCI: 27.15
- Percentile: 100%
- References: 52
Authors (3)
- Junling Hu (corresponding author)
- Michael P. Wellman
- Craig Boutilier
Topics & keywords
Keywords
- Q-learning
- Nash equilibrium
- Computer science
- Context (archaeology)
- Mathematical economics
- Epsilon-equilibrium
- Best response
- Preference learning