Reinforcement Learning and Approximate Dynamic Programming for Feedback Control
The University of Texas at Arlington · University of Illinois Chicago
Abstract
In this chapter, we extend the ADP algorithm dual heuristic programming (DHP) to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to…
Citation impact
- FWCI: 7.42
- Percentile: 100%
- References: 33
Authors
- Lewis, Frank L., 1949- (corresponding author)
The University of Texas at Arlington
- Liu, Derong, 1963-
University of Illinois Chicago
Topics & keywords
- Reinforcement learning
- Computer science
- Feedback control
- Reinforcement
- Control (management)
- Dynamic programming
- Artificial intelligence
- Control engineering