Focusing on embedded applications, scratchpad memories (SPMs) look like a best-compromise solution when taking into account performance, energy consumption and die area. The main challenge in SPM design is to optimally map memory locations to scratchpad locations. This paper describes an algorithm to solve such a mapping problem by means of Dynamic Programming applied to a synthesizable hardware architecture. The algorithm works by mapping segments of external memory to physically partitioned banks of an on-chip SPM; this architecture provides significant energy savings. The algorithm does not require any user-set bound on the number of partitions and takes into account partitioning overhead. Improving on previous solutions, execution time is polynomial in the number of memory locations, even in the most general solving policy. This has the major practical advantage of allowing an arbitrary number of scratchpad segments, something that was impossible with previous methods, whose running time is exponential in this number. Strategies to optimize memory requirements and speed of the algorithm are exploited. Additionally, we integrate this algorithm in a complete and automated design, simulation and synthesis flow.
F. Angiolini, L. Benini, A. Caprara (2005). An Efficient Profile-Based Algorithm for Scratchpad Memory Partitioning. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 24, 1644-1658 [10.1109/TCAD.2005.852299].
An Efficient Profile-Based Algorithm for Scratchpad Memory Partitioning
ANGIOLINI, FEDERICO;BENINI, LUCA;CAPRARA, ALBERTO
2005
Abstract
Focusing on embedded applications, scratchpad memories (SPMs) look like a best-compromise solution when taking into account performance, energy consumption and die area. The main challenge in SPM design is to optimally map memory locations to scratchpad locations. This paper describes an algorithm to solve such a mapping problem by means of Dynamic Programming applied to a synthesizable hardware architecture. The algorithm works by mapping segments of external memory to physically partitioned banks of an on-chip SPM; this architecture provides significant energy savings. The algorithm does not require any user-set bound on the number of partitions and takes into account partitioning overhead. Improving on previous solutions, execution time is polynomial in the number of memory locations, even in the most general solving policy. This has the major practical advantage of allowing an arbitrary number of scratchpad segments, something that was impossible with previous methods, whose running time is exponential in this number. Strategies to optimize memory requirements and speed of the algorithm are exploited. Additionally, we integrate this algorithm in a complete and automated design, simulation and synthesis flow.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.