Covariant Policy Search (2003)
Tags
Markov Decision Processes, Optimization, Reinforcement Learning
Abstract
We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information-geometric methods. This leads us to propose a natural metric on controller parameterizations that results from considering the manifold of probability distributions over paths induced by a stochastic controller. Investigation of this approach leads to a covariant gradient ascent rule. Interesting properties of this rule are discussed, including its relation to actor-critic style reinforcement learning algorithms. The algorithms discussed here are computationally quite efficient and, on some interesting problems, lead to dramatic performance improvements over non-covariant rules.
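A covariant rule of the kind described above replaces the ordinary step θ ← θ + α∇J(θ) with θ ← θ + α G(θ)⁻¹∇J(θ), where G(θ) is a metric on the path distributions induced by the controller. The Python sketch below is a minimal illustration under simplifying assumptions, not the paper's implementation. It takes G to be the Fisher information of the path distribution and uses a hypothetical one-step, three-action problem (a bandit). In that problem a path is a single action, so G reduces to the Fisher information of a softmax policy whose last logit is pinned to 0. The helper names (`policy`, `score_matrix`, `vanilla_gradient`, `fisher`, `natural_gradient`, `step_in_theta_coords`) and the reward values are illustrative choices made here. The sketch checks covariance directly. After reparameterizing θ = Cφ, the natural step mapped back to θ is unchanged, while the vanilla step is not.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the paper): a one-step task
# with three actions, so a "path" is just a single action.
REWARDS = np.array([1.0, 0.5, 0.0])


def policy(theta):
    """Softmax over K actions; the last logit is pinned to 0, leaving K-1 free parameters."""
    logits = np.append(theta, 0.0)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()


def score_matrix(theta):
    """Row a is grad_theta log pi(a) = (e_a - pi), restricted to the free logits."""
    pi = policy(theta)
    k = pi.size
    return (np.eye(k) - pi)[:, : k - 1]


def vanilla_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi(a) r(a), i.e. E[r(a) * grad log pi(a)]."""
    pi = policy(theta)
    return score_matrix(theta).T @ (pi * rewards)


def fisher(theta):
    """Fisher information E[grad log pi grad log pi^T]; positive definite for interior pi."""
    pi = policy(theta)
    s = score_matrix(theta)
    return s.T @ (pi[:, None] * s)


def natural_gradient(theta, rewards):
    """Covariant direction: solve G(theta) d = grad J(theta) instead of using grad J directly."""
    return np.linalg.solve(fisher(theta), vanilla_gradient(theta, rewards))


def step_in_theta_coords(theta, rewards, C, natural):
    """Compute the step in phi-coordinates (theta = C @ phi) and map it back to theta."""
    g_phi = C.T @ vanilla_gradient(theta, rewards)  # chain rule
    if natural:
        fisher_phi = C.T @ fisher(theta) @ C  # Fisher matrix expressed in phi-coordinates
        d_phi = np.linalg.solve(fisher_phi, g_phi)
    else:
        d_phi = g_phi
    return C @ d_phi


theta = np.array([0.2, -0.3])
C = np.diag([10.0, 0.1])  # an arbitrary rescaling of the parameters
identity = np.eye(2)

nat_a = step_in_theta_coords(theta, REWARDS, identity, natural=True)
nat_b = step_in_theta_coords(theta, REWARDS, C, natural=True)
van_a = step_in_theta_coords(theta, REWARDS, identity, natural=False)
van_b = step_in_theta_coords(theta, REWARDS, C, natural=False)

print(np.allclose(nat_a, nat_b))  # True: natural step is unchanged by the reparameterization
print(np.allclose(van_a, van_b))  # False: vanilla step is rescaled by C @ C.T
```

Because the natural step solves G·Δθ = ∇J, an invertible linear change of coordinates cancels out of the update, while the vanilla step picks up a factor of C Cᵀ. For this particular toy bandit, the natural direction also works out to r_a - r_K for each free logit, independent of θ.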
Approximate BibTeX Entry
@inproceedings{bagnellCovariant,
Month = {July},
Year = {2003},
Booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence},
Author = {Drew Bagnell and Jeff Schneider},
Title = {Covariant Policy Search}
}