Baroncini, S., Carnevale, G., Gharesifard, B., Notarstefano, G. (2024). A System-theoretic Note on Model-Based Actor-Critic. Institute of Electrical and Electronics Engineers Inc. [10.1109/CDC56724.2024.10886557].

A System-theoretic Note on Model-Based Actor-Critic

Baroncini, S.; Carnevale, G.; Gharesifard, B.; Notarstefano, G.
2024

Abstract

In this paper, we provide preliminary results toward using systems theory tools for the design and analysis of reinforcement learning algorithms. Specifically, we analyze the convergence properties of a model-based scheme with an actor-critic structure. The distinctive feature of our scheme is that the actor and critic updates are equipped with auxiliary variables that allow for the use of a constant step size. Although idealized due to the assumption of access to the underlying Markov Decision Process (MDP), the investigated setting is a starting point toward a genuine (model-free) actor-critic scheme. A key contribution is the interpretation of this algorithmic framework in terms of discrete-time, interconnected dynamical systems. Specifically, by resorting to Singular Perturbations (SP), we reinterpret the whole algorithm as the interconnection of a fast subsystem (the auxiliary variables' mechanism), an intermediate one (the critic), and a slow one (the actor). We separately analyze three auxiliary systems, each corresponding to one of the identified subsystems. These preparatory results, combined with SP and LaSalle arguments, allow us to prove that the overall method asymptotically converges to a stationary point of the problem. Numerical simulations confirm our theoretical findings.
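To make the timescale separation described in the abstract concrete, the following is a minimal sketch of a generic model-based actor-critic on a small tabular MDP. It is not the authors' scheme: the abstract does not specify the auxiliary-variable mechanism, so it is omitted here and only two timescales appear (a faster critic and a slower actor). The softmax parametrization, the uniform initial distribution `rho`, the step sizes, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def softmax_policy(theta):
    """Row-wise softmax: theta[s, a] -> pi(a | s)."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def policy_value(P, r, pi, gamma):
    """Exact value of pi, solving (I - gamma * P_pi) v = r_pi with the known model."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)            # expected one-step reward under pi
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)


def actor_critic(P, r, gamma=0.9, critic_step=0.5, actor_step=0.01, iters=5000):
    """Two-timescale sketch with constant step sizes (hypothetical values).

    P has shape (S, A, S) and r has shape (S, A); both are assumed known,
    which is what makes the scheme model-based.
    """
    n_states, n_actions = r.shape
    rho = np.full(n_states, 1.0 / n_states)  # assumed uniform initial distribution
    theta = np.zeros((n_states, n_actions))  # actor parameters (slow)
    v = np.zeros(n_states)                   # critic estimate (fast)
    for _ in range(iters):
        pi = softmax_policy(theta)
        q = r + gamma * (P @ v)              # Q-estimate from the critic, shape (S, A)
        v_target = (pi * q).sum(axis=1)      # Bellman evaluation operator applied to v
        # Critic: relaxed fixed-point step toward the Bellman target.
        v_next = v + critic_step * (v_target - v)
        # Actor: softmax policy gradient, using the critic in place of the true value.
        P_pi = np.einsum("sa,sat->st", pi, P)
        # Unnormalized discounted state visitation: (I - gamma * P_pi)^T d = rho.
        visit = np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, rho)
        advantage = q - v_target[:, None]
        theta = theta + actor_step * visit[:, None] * pi * advantage
        v = v_next
    return theta, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.random((4, 2, 4))
    P /= P.sum(axis=2, keepdims=True)        # normalize rows into probabilities
    r = rng.random((4, 2))
    theta, v = actor_critic(P, r)
    pi = softmax_policy(theta)
    print("critic estimate:", v)
    print("exact value of final policy:", policy_value(P, r, pi, 0.9))
```

The critic step is taken much larger than the actor step, so the critic sees an almost frozen policy and tracks its value, while the actor sees an almost converged critic. This is the kind of separation that a singular perturbation analysis exploits; how the auxiliary variables enter in the paper is not reproduced here.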
Proceedings of the IEEE Conference on Decision and Control, pp. 1929-1934
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1013604