Reinforcement Learning and Approximate Dynamic Programming for Feedback Control

Lewis, Frank L., 1949-; Liu, Derong, 1963-

doi:10.1002/9781118453988

Book · Dec 17, 2012 · Closed access

Lewis, Frank L., 1949- · Liu, Derong, 1963-

The University of Texas at Arlington · University of Illinois Chicago

Indexed in Crossref

Abstract

In this chapter, we extend the ADP algorithm, dual heuristic programming (DHP), to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternate methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to…

Citation impact

573

total citations

FWCI: 7.42
Percentile: 100%
References: 33

Authors

2

Lewis, Frank L., 1949- (corresponding author)
The University of Texas at Arlington
Liu, Derong, 1963-
University of Illinois Chicago

Topics & keywords

Keywords

Reinforcement learning
Computer science
Feedback control
Reinforcement
Control (management)
Dynamic programming
Artificial intelligence
Control engineering
