Crawling in Rogue's Dungeons With Deep Reinforcement Techniques / Asperti, Andrea; Cortesi, Daniele; De Pieri, Carlo; Pedrini, Gianmaria; Sovrano, Francesco. - In: IEEE TRANSACTIONS ON GAMES. - ISSN 2475-1510. - Print. - 12:2(2020), pp. 177-186. [10.1109/TG.2019.2899159]

Crawling in Rogue's Dungeons With Deep Reinforcement Techniques

Asperti, Andrea; Cortesi, Daniele; De Pieri, Carlo; Pedrini, Gianmaria; Sovrano, Francesco
2020

Abstract

This article reports on our extensive experimentation, over the last two years, with deep reinforcement learning techniques for training an agent to move in the dungeons of the famous Rogue video game. The challenging nature of the problem is tightly related to the procedural, random generation of a new dungeon map at each level, which forbids any form of level-specific learning and forces us to address the navigation problem in its full generality. Other interesting aspects of the game from the point of view of automatic learning are the partially observable nature of the problem, since maps are initially not visible and are discovered during exploration, and the problem of sparse rewards, which requires the acquisition of complex, non-reactive behaviors involving memory and planning. In this article, we build on previous work to make a more systematic comparison of different learning techniques, focusing in particular on Asynchronous Advantage Actor-Critic (A3C) and Actor-Critic with Experience Replay (ACER). In a game like Rogue, the sparsity of rewards is mitigated by the variability of dungeon configurations (sometimes, by luck, the exit is at hand); if this variability can be tamed, as ACER seems able to do better than other algorithms, the problem of sparse rewards can be overcome without any need for intrinsic motivation.
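For readers unfamiliar with the actor-critic methods named in the abstract, the sketch below illustrates the generic advantage actor-critic update that A3C parallelizes across asynchronous workers and that ACER extends with experience replay. It is purely illustrative and is not the implementation used in the article: the tiny network, the hyperparameters, and the random batch standing in for Rogue rollouts are all assumptions.

# Minimal, illustrative advantage actor-critic update in PyTorch.
# NOT the paper's code: the network, hyperparameters, and synthetic data
# below are assumptions made only to show the structure of the update.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Shared body with a policy head (actor) and a value head (critic)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def actor_critic_loss(model, obs, actions, returns,
                      value_coef=0.5, entropy_coef=0.01):
    """Policy-gradient loss with an advantage baseline, plus value and entropy terms."""
    logits, values = model(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = returns - values.detach()          # A(s,a) = R - V(s)
    policy_loss = -(chosen * advantages).mean()     # push up actions with positive advantage
    value_loss = F.mse_loss(values, returns)        # regress V(s) toward observed returns
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# Toy usage on random data (a stand-in for rollouts collected in the dungeon).
model = ActorCritic(obs_dim=8, n_actions=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(32, 8)
actions = torch.randint(0, 4, (32,))
returns = torch.randn(32)
loss = actor_critic_loss(model, obs, actions, returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In A3C many workers compute such gradients in parallel on freshly collected on-policy rollouts, while ACER additionally replays stored trajectories with off-policy corrections, which is one reason it can cope better with rare, lucky reward events of the kind discussed in the abstract.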
Files in this product:

File: main.pdf
Access: restricted
Type: Publisher's version (PDF)
License: Restricted-access license
Size: 648.85 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/762642
Citations
  • Scopus: 7
  • Web of Science (ISI): 4