On-line Policy Improvement using Monte-Carlo Search

Tesauro, Gerald; Galperin, Gregory R.

doi:10.48550/arxiv.2501.05407

Preprint | arXiv (Cornell University) | Jan 9, 2025 | Green OA

Massachusetts Institute of Technology

Indexed in: arXiv, DataCite

Abstract

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an…
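
The procedure the abstract describes (simulate each candidate action many times, follow the initial policy for the rest of each trial, then take the action with the highest average reward) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy random-walk task and the names `step`, `random_policy`, `rollout`, and `improved_action` are all hypothetical stand-ins chosen to keep the example self-contained, and nothing here reflects the backgammon setup or the parallel implementation mentioned above.

```python
import random

GOAL = 10  # hypothetical toy task: walk on 0..GOAL, reward 1.0 for reaching GOAL


def step(state, action):
    """Toy simulator (an assumption, not from the paper): returns (next_state, reward, done)."""
    nxt = state + action
    if nxt >= GOAL:
        return GOAL, 1.0, True
    if nxt <= 0:
        return 0, 0.0, True
    return nxt, 0.0, False


def random_policy(state, rng):
    """Initial (base) policy: pick a direction uniformly at random."""
    return rng.choice((-1, 1))


def rollout(state, first_action, policy, rng, max_steps=500):
    """Total reward of one simulated trial: take first_action, then follow the base policy."""
    state, total, done = step(state, first_action)
    steps = 1
    while not done and steps < max_steps:
        state, reward, done = step(state, policy(state, rng))
        total += reward
        steps += 1
    return total


def improved_action(state, actions, policy, rng, n_trials=1000):
    """Average many rollouts per candidate action; return the best action and all estimates."""
    estimates = {
        a: sum(rollout(state, a, policy, rng) for _ in range(n_trials)) / n_trials
        for a in actions
    }
    return max(estimates, key=estimates.get), estimates


rng = random.Random(0)  # fixed seed so the sketch is repeatable
best, estimates = improved_action(5, (-1, 1), random_policy, rng)
print(best, estimates)
```

Because the trials for different actions are independent of one another, the loop inside `improved_action` is the natural place to distribute work across processors, which is consistent with the abstract's remark that the algorithm is easily parallelizable.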

Citation impact

Total citations: 212

FWCI: —
Percentile: —
References: 8

Authors: 2

Topics & keywords

Keywords

Monte Carlo method
Line (geometry)
Statistical physics
Computer science
Econometrics
Economics
Statistics
Physics