Sforni L., Camisa A., Notarstefano G. (2022). Structured-policy Q-learning: an LMI-based Design Strategy for Distributed Reinforcement Learning. New York, NY: Institute of Electrical and Electronics Engineers Inc. DOI: 10.1109/CDC51059.2022.9992584.
Structured-policy Q-learning: an LMI-based Design Strategy for Distributed Reinforcement Learning
Sforni L.; Camisa A.; Notarstefano G.
2022
Abstract
In this paper, we consider a Linear Quadratic optimal control problem under the assumptions that the system dynamics are unknown and that the designed feedback control has to comply with a desired sparsity pattern. An important application where this set-up arises is distributed control of network systems, where the aim is to find an optimal sparse controller matching the communication graph. To tackle the problem, we propose a Reinforcement Learning framework based on a Q-learning scheme preserving a desired policy structure. At each time step the performance of the current candidate feedback is first evaluated through the computation of its Q-function, and then a new sparse feedback matrix, improving on the previous one, is computed. We prove that the scheme produces at each iteration a stabilizing feedback control with the desired sparsity and with non-increasing cost, which in turn indicates that every limit point of the computed feedback matrices is sparse and stabilizing. The algorithm is numerically tested on a distributed control scenario with a randomly generated graph and unstable dynamics.
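To make the iteration described in the abstract concrete, the following is a minimal, self-contained sketch in Python/NumPy of a structured-policy Q-learning loop of this kind: the Q-function of the current feedback is estimated from trajectory data by least squares, and a new feedback with the prescribed sparsity pattern is then computed. The toy system matrices, the sparsity mask, and the simplified least-squares improvement step are illustrative assumptions only; in particular, the paper's actual improvement step is LMI-based and is what certifies stability and non-increasing cost, which the naive surrogate below does not guarantee.

```python
# Illustrative sketch only: not the paper's algorithm. The improvement step here is a
# sparsity-constrained least-squares surrogate for the paper's LMI-based design.
import numpy as np

rng = np.random.default_rng(0)

def collect_data(A, B, K, T, noise=0.1):
    """Simulate the system under u = Kx plus exploration noise (A, B unknown to the learner)."""
    n, m = B.shape
    x = rng.standard_normal(n)
    X, U, Xn = [], [], []
    for _ in range(T):
        u = K @ x + noise * rng.standard_normal(m)
        x_next = A @ x + B @ u
        X.append(x); U.append(u); Xn.append(x_next)
        x = x_next
    return np.array(X), np.array(U), np.array(Xn)

def evaluate_q(X, U, Xn, K, Qx, Ru):
    """Least-squares estimate of the quadratic Q-function Q_K(x,u) = [x;u]' G [x;u],
    using the Bellman identity Q_K(x,u) - Q_K(x', Kx') = x'Qx x + u'Ru u (model-free)."""
    n, m = Qx.shape[0], Ru.shape[0]
    d = n + m
    iu = np.triu_indices(d)

    def feat(z):
        return np.outer(z, z)[iu]

    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])
        zn = np.concatenate([xn, K @ xn])
        Phi.append(feat(z) - feat(zn))
        y.append(x @ Qx @ x + u @ Ru @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    G = np.zeros((d, d))
    G[iu] = theta
    return (G + G.T) / 2          # symmetric G consistent with the fitted quadratic form

def improve_sparse(G, n, m, mask):
    """Improvement step restricted to the sparsity pattern `mask` (m x n, 0/1 entries):
    minimize the estimated Q-function over structured feedbacks (unit state covariance).
    NOTE: unlike the paper's LMI-based step, this does not certify closed-loop stability."""
    Gux, Guu = G[n:, :n], G[n:, n:]
    H = np.kron(np.eye(n), Guu)                   # Hessian w.r.t. vec(K) (column-major)
    g = Gux.ravel(order="F")
    free = np.flatnonzero(mask.ravel(order="F"))  # entries allowed to be nonzero
    k = np.zeros(m * n)
    k[free] = np.linalg.solve(H[np.ix_(free, free)], -g[free])
    return k.reshape((m, n), order="F")

# Hypothetical toy instance (all numbers chosen for illustration only).
A = np.array([[1.1, 0.2, 0.0],
              [0.0, 0.9, 0.3],
              [0.1, 0.0, 1.05]])   # used only to generate data, never by the learner
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.5]])
Qx, Ru = np.eye(3), np.eye(2)
mask = np.array([[1, 1, 0],
                 [0, 1, 1]])       # desired feedback sparsity pattern
K = np.array([[-0.5, 0.0, 0.0],
              [0.0, -0.5, 0.0]])   # initial sparse stabilizing feedback (assumed given)

for it in range(3):
    X, U, Xn = collect_data(A, B, K, T=400)
    G = evaluate_q(X, U, Xn, K, Qx, Ru)
    K = improve_sparse(G, n=3, m=2, mask=mask)
    print(f"iteration {it}: spectral radius of A+BK =",
          np.abs(np.linalg.eigvals(A + B @ K)).max())
```

In this sketch the matrices A and B appear only in the data-generating simulator; the evaluation and improvement steps use nothing but recorded states, inputs, and stage costs, mirroring the model-free setting of the paper.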