Asperti, A., Cortesi, D., De Pieri, C., Pedrini, G., Sovrano, F. (2020). Crawling in Rogue's Dungeons With Deep Reinforcement Techniques. IEEE Transactions on Games, 12(2), 177-186 [10.1109/TG.2019.2899159].
Crawling in Rogue's Dungeons With Deep Reinforcement Techniques
Asperti, Andrea; Sovrano, Francesco
2020
Abstract
This article reports on our extensive experimentation, over the last two years, with deep reinforcement techniques for training an agent to move in the dungeons of the famous Rogue video game. The challenging nature of the problem is tightly related to the procedural, random generation of new dungeon maps at each level, which forbids any form of level-specific learning and forces us to address the navigation problem in its full generality. Other interesting aspects of the game from the point of view of automatic learning are the partially observable nature of the problem, since maps are initially not visible and are discovered during exploration, and the problem of sparse rewards, which requires the acquisition of complex, non-reactive behaviors involving memory and planning. In this article, we build on previous works to make a more systematic comparison of different learning techniques, focusing in particular on Asynchronous Advantage Actor-Critic (A3C) and Actor-Critic with Experience Replay (ACER). In a game like Rogue, the sparsity of rewards is mitigated by the variability of the dungeon configurations (sometimes, by luck, the exit is at hand); if this variability can be tamed (as ACER, better than other algorithms, seems able to do), the problem of sparse rewards can be overcome without any need for intrinsic motivation.
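For context on the methods named in the abstract, the sketch below is the standard advantage actor-critic formulation underlying A3C (Mnih et al., 2016), not a quotation of this article's own equations; the symbols (policy parameters θ, value parameters θ_v, n-step horizon k, entropy weight β) are the conventional ones rather than notation taken from the paper.

```latex
% Generic A3C update (standard formulation, assumed here for illustration):
% n-step advantage estimate, entropy-regularized policy gradient (actor),
% and squared-error value loss (critic).
\begin{aligned}
A(s_t, a_t) &= \sum_{i=0}^{k-1} \gamma^{\,i}\, r_{t+i}
  \;+\; \gamma^{\,k}\, V(s_{t+k};\theta_v) \;-\; V(s_t;\theta_v) \\[2pt]
\nabla_{\theta} J &\approx \nabla_{\theta} \log \pi(a_t \mid s_t;\theta)\, A(s_t, a_t)
  \;+\; \beta\, \nabla_{\theta} H\!\big(\pi(\cdot \mid s_t;\theta)\big) \\[2pt]
L_{\text{critic}} &= \tfrac{1}{2}\, A(s_t, a_t)^{2}
\end{aligned}
```

ACER extends this actor-critic scheme with an off-policy experience replay buffer and truncated importance sampling with bias correction, which is the property the abstract credits for taming the variability of dungeon configurations.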
File | Size | Format
---|---|---
main.pdf (restricted access; type: publisher's version (PDF); license: restricted-access license) | 648.85 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.