In this paper, we propose a data-driven strategy to iteratively find the state feedback gain matrix solving a Linear Quadratic Regulator (LQR) problem in a model-free fashion, i.e., with unknown system and cost matrices. In our setup, we assume that, at each iteration, an oracle provides the LQR cost of the tentative policy, e.g., by running the system or a simulator. Using this information, we develop an Extremum-Seeking algorithm that iteratively refines the tentative solution without any additional knowledge of the system and cost models. By using a Lyapunov-based approach that exploits averaging theory for time-varying systems, we show that the proposed algorithm converges exponentially to an arbitrarily small ball containing the optimal gain matrix. We corroborate the theoretical results by testing the proposed strategy in numerical simulations.
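The oracle-driven loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a textbook-style discrete-time extremum-seeking update, and the plant matrices, cost weights, dither frequencies, step sizes, and the first-order cost filter are all assumptions chosen for illustration. The function names `cost_oracle`, `extremum_seeking`, and `riccati_gain` are hypothetical. The update rule only ever sees the scalar cost returned by the oracle; the model is used inside the oracle (standing in for a simulator) and in a model-based reference gain for comparison.

```python
import numpy as np

# Hypothetical plant and weights (assumptions). They are hidden inside the
# oracle; the extremum-seeking update below never reads them.
A = np.array([[0.8, 0.3], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])


def cost_oracle(K):
    """Infinite-horizon LQR cost of u = -K x, summed over unit initial states."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf  # destabilizing gain: the cost is unbounded
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # Solve the Lyapunov equation P = M + Acl^T P Acl in vectorized form.
    lhs = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    P = np.linalg.solve(lhs, M.flatten()).reshape(n, n)
    return float(np.trace(P))


def extremum_seeking(oracle, K0, iters=3000, gamma=1e-3, delta=0.1, beta=0.1):
    """Refine the gain K using only scalar cost evaluations from the oracle."""
    K = np.array(K0, dtype=float)
    # One distinct dither frequency (rad/iteration) per gain entry (assumed values).
    omegas = 1.0 + 0.7 * np.arange(K.size)
    eta = oracle(K)  # slow estimate of the current cost level
    for t in range(iters):
        d = np.sin(omegas * t).reshape(K.shape)  # sinusoidal dither
        J = oracle(K + delta * d)                # cost of the perturbed policy
        # Demodulation: on average this step points along the negative gradient.
        K -= gamma * (2.0 / delta) * (J - eta) * d
        eta += beta * (J - eta)                  # first-order filter on the cost
    return K


def riccati_gain(iters=500):
    """Model-based reference gain via Riccati iteration (for comparison only)."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


K_es = extremum_seeking(cost_oracle, np.zeros((1, 2)))
print("extremum-seeking gain:", K_es)
print("Riccati reference gain:", riccati_gain())
```

In this sketch the dither amplitude `delta` controls the size of the neighborhood of the optimal gain that the iterate settles into, which loosely mirrors the "arbitrarily small ball" in the convergence claim; the step size `gamma` is kept small so that the gain evolves slowly relative to the dither.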
Carnevale, G., Mimmo, N., Notarstefano, G. (2024). Extremum-Seeking Policy Iteration for Data-Driven LQR. Institute of Electrical and Electronics Engineers Inc. [10.1109/cdc56724.2024.10885851].
Extremum-Seeking Policy Iteration for Data-Driven LQR
Carnevale, Guido; Mimmo, Nicola; Notarstefano, Giuseppe
2024