Tree-Based Batch Mode Reinforcement Learning
Abstract
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x_t, u_t, r_t, x_{t+1}), where x_t denotes the system state at time t, u_t the control action taken, r_t the instantaneous reward obtained and x_{t+1} the successor state of the system, and by determining the control policy from this Q-function. The Q-function approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree,…
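The idea of obtaining the Q-function from a sequence of batch supervised learning problems can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the six-state chain, the reward, the discount factor `gamma`, the iteration count, and the helper names `fit_tree`, `predict`, and `fitted_q_iteration`. The regressor is a small CART-style tree written in plain Python, standing in for whichever tree-based method is actually used. At each iteration the regression targets are r_t + gamma * max_u Q(x_{t+1}, u), computed from the previous tree, and a new tree is fitted to them.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(X, y, min_leaf=1):
    """Grow a CART-style regression tree by greedy squared-error splits.

    Returns a float (leaf value) or a tuple (feature, threshold, left, right).
    """
    mean = sum(y) / len(y)
    if len(y) < 2 * min_leaf or all(t == y[0] for t in y):
        return mean
    best = None  # (cost, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, thr)
    if best is None:  # no admissible split: keep a leaf
        return mean
    _, j, thr = best
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    return (j, thr,
            fit_tree([X[i] for i in li], [y[i] for i in li], min_leaf),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], min_leaf))


def predict(tree, x):
    """Route the input x = [state, action] down to a leaf value."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


def fitted_q_iteration(four_tuples, actions, gamma=0.9, n_iter=50):
    """Approximate Q by repeatedly regressing on bootstrapped targets."""
    X = [[x, u] for x, u, _, _ in four_tuples]
    tree = 0.0  # Q_0 is identically zero (a single leaf)
    for _ in range(n_iter):
        # Targets use the previous approximation at the successor state.
        y = [r + gamma * max(predict(tree, [x_next, a]) for a in actions)
             for _, _, r, x_next in four_tuples]
        tree = fit_tree(X, y)
    return tree


# Hypothetical toy system: states 0..5 on a chain, reward 1 on reaching state 5.
ACTIONS = [-1, 1]


def step(x, u):
    x_next = min(max(x + u, 0), 5)
    return (1.0 if x_next == 5 else 0.0), x_next


# Batch of four-tuples (x_t, u_t, r_t, x_{t+1}), one per state-action pair.
batch = []
for x in range(6):
    for u in ACTIONS:
        r, x_next = step(x, u)
        batch.append((x, u, r, x_next))

q_tree = fitted_q_iteration(batch, ACTIONS)
# Greedy policy derived from the approximate Q-function.
policy = [max(ACTIONS, key=lambda u: predict(q_tree, [x, u])) for x in range(6)]
print(policy)
```

Because the batch covers every state-action pair of this small deterministic chain and the tree is grown until each leaf holds one sample, the iteration here coincides with exact value iteration, and the greedy policy moves right in every state. With sampled tuples from a continuous system, the quality of the result would depend on the regressor and on how well the batch covers the state-action space.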
Citation impact
- Total citations: 864
- FWCI: 31.54
- Percentile: 100%
- References: 43
Authors: 3
Topics & keywords
Keywords
- Reinforcement learning
- Tree (set theory)
- Sequence (biology)
- Convergence (economics)
- Computer science
- Function (biology)
- Mathematics
- Tuple