Sforni L., Carnevale G., Notarnicola I., Notarstefano G. (2023). On-Policy Data-Driven Linear Quadratic Regulator via Combined Policy Iteration and Recursive Least Squares. Institute of Electrical and Electronics Engineers Inc. [10.1109/CDC49753.2023.10383604].
On-Policy Data-Driven Linear Quadratic Regulator via Combined Policy Iteration and Recursive Least Squares
Sforni L.; Carnevale G.; Notarnicola I.; Notarstefano G.
2023
Abstract
In this paper, we address infinite-horizon Linear Quadratic Regulator (LQR) problems for unknown discrete-time systems. As an additional challenge, we consider an on-policy setup in which the system matrices are identified while the real system is controlled with a progressively optimized policy. Specifically, we consider a time-varying control policy that, while applied to the real unknown system, is iteratively refined (based on the latest estimate of the system matrices) towards the optimal LQR solution. The overall learning procedure combines a recursive least squares method with a direct policy search based on the gradient method. By resorting to Lyapunov-based analysis tools in combination with averaging theory for nonlinear systems, exponential stability of the closed-loop scheme can be proven. Finally, a numerical example showing the effectiveness of the considered strategy corroborates the theoretical findings.
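To give a rough feel for how recursive least squares and a gradient-based policy update can run together on the controlled system, the following is a minimal sketch for a scalar system x_{t+1} = a x_t + b u_t with policy u_t = -k x_t plus a small dither. It is not the authors' algorithm. The scalar setting, the sinusoidal dither, the initial guesses, the step size `gamma`, and all function names are assumptions made only for illustration.

```python
import math


def riccati_gain(a, b, q, r, iters=1000):
    """Reference optimal scalar LQR gain k* (u = -k* x) via Riccati fixed-point iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def cost_gradient(a, b, k, q, r):
    """Derivative of J(k) = (q + r k^2) / (1 - (a - b k)^2) with respect to k.

    Returns None when the model (a, b) with gain k is not stable,
    since J(k) is then infinite and the formula is meaningless.
    """
    c = a - b * k              # closed-loop pole under the model
    d = 1.0 - c * c
    if d <= 0.0:
        return None
    p = (q + r * k * k) / d    # cost-to-go coefficient for gain k
    return 2.0 * (r * k - b * c * p) / d


def run_on_policy_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=0.5,
                      steps=2000, gamma=0.01, dither_amp=0.2):
    """Identify (a, b) by RLS while refining the gain k that is being applied."""
    a_hat, b_hat = 1.0, 0.8                    # assumed initial model guess
    P = [[1000.0, 0.0], [0.0, 1000.0]]         # RLS covariance (large = low trust)
    k, x = k0, 1.0                             # k0 is assumed stabilizing
    for t in range(steps):
        # Current policy plus an assumed sinusoidal dither for excitation.
        u = -k * x + dither_amp * math.sin(0.7 * t)
        x_next = a * x + b * u                 # real system, unknown to the learner

        # RLS update with regressor (x, u) and measurement x_next.
        g0 = P[0][0] * x + P[0][1] * u
        g1 = P[1][0] * x + P[1][1] * u
        den = 1.0 + x * g0 + u * g1
        err = x_next - (a_hat * x + b_hat * u)
        a_hat += g0 / den * err
        b_hat += g1 / den * err
        P = [[P[0][0] - g0 * g0 / den, P[0][1] - g0 * g1 / den],
             [P[1][0] - g1 * g0 / den, P[1][1] - g1 * g1 / den]]

        # One gradient step on the LQR cost of the estimated model.
        grad = cost_gradient(a_hat, b_hat, k, q, r)
        if grad is not None:
            k -= gamma * grad
        x = x_next
    return k, a_hat, b_hat


if __name__ == "__main__":
    k, a_hat, b_hat = run_on_policy_lqr()
    print("learned gain:", k, "reference gain:", riccati_gain(1.2, 1.0, 1.0, 1.0))
    print("estimated (a, b):", a_hat, b_hat)
```

In this sketch the gain is updated once per time step using the model estimate available at that step, so identification and policy refinement proceed simultaneously on the same trajectory. The matrix case would replace the scalar expressions with the corresponding Lyapunov and Riccati equations.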