Book · Dec 17, 2012 · Closed access

Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Lewis, Frank L. 1949- · Liu, Derong 1963-

The University of Texas at Arlington · University of Illinois Chicago

Indexed in Crossref

Abstract

In this chapter, we extend the ADP algorithm, dual heuristic programming (DHP), to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternate methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to…

Citation impact

  • Total citations: 573
  • FWCI: 7.42
  • Percentile: 100%
  • References: 33

Authors

  • Lewis, Frank L. 1949- (corresponding author)
    The University of Texas at Arlington

  • Liu, Derong 1963-
    University of Illinois Chicago

Topics & keywords

Keywords
  • Reinforcement learning
  • Computer science
  • Feedback control
  • Reinforcement
  • Control (management)
  • Dynamic programming
  • Artificial intelligence
  • Control engineering