In the approaching era of IoT, flexible and low-power accelerators have become essential to meet aggressive energy efficiency targets. Over the last few decades, Coarse Grain Reconfigurable Arrays (CGRAs) have demonstrated high energy efficiency as accelerators, especially for high-performance streaming applications. Existing CGRAs mostly rely on partial and full predication to support conditional branches, but inefficient architectural and mapping support for control flow limits their use to accelerating either inner loop bodies only or loops transformed specifically for the target CGRA. This paper proposes a novel CGRA architecture with support for jump and conditional jump instructions and a lightweight global synchronization mechanism to enable complete Control Data Flow Graph (CDFG) mapping in an ultra-low-power environment. The architecture is coupled with a complete design flow that efficiently maps applications with heavy control flow starting from a generic C language description. The proposed mapping approach reduces the wasteful instruction issues inherent in conventional predication, providing average energy improvements of 1.44x and 1.6x over state-of-the-art partial and full predication techniques, respectively. Moreover, when executing applications with heavy control flow, the proposed method achieves an average speed-up of up to 21x and an energy improvement of up to 50.42x with respect to sequential execution on a low-power embedded CPU, demonstrating its suitability for next-generation IoT applications.
Das, S., Martin, K.J., Coussy, P., Rossi, D., Benini, L. (2017). Efficient mapping of CDFG onto coarse-grained reconfigurable array architectures. Institute of Electrical and Electronics Engineers Inc. [10.1109/ASPDAC.2017.7858308].
Efficient mapping of CDFG onto coarse-grained reconfigurable array architectures
DAS, SATYAJIT; ROSSI, DAVIDE; BENINI, LUCA
2017