Tuveri, G., Meloni, P., Palumbo, F., Pietro Seu, G., Loi, I., Conti, F., et al. (2016). On-the-fly adaptivity for process networks over shared-memory platforms. Microprocessors and Microsystems, 46, 240-254 [10.1016/j.micpro.2016.06.010].
On-the-fly adaptivity for process networks over shared-memory platforms
Palumbo, Francesca; Loi, Igor; Conti, Francesco
2016
Abstract
Modern MPSoC architectures integrate tens of processing elements on a single die. This trend creates the need to express the parallelism of applications explicitly in order to exploit the available resources effectively. Several models of computation have been proposed that specify an application as a network of independent computational elements. Such models are well suited to the systematic mapping of parallel applications onto multiprocessor architectures. However, the workload of a given application can vary abruptly depending on its input data, and the amount of available computing resources can vary with the overall workload of the system. Traditional worst-case designs may overestimate workloads, wasting resources and consuming power unnecessarily. To overcome this limitation, in this work we devise a fast, automatic run-time approach that re-configures the core-to-task mapping and the degree of parallelism of the application when the available resources or the application workload change, targeting shared-memory platforms. Experiments carried out on an FPGA implementation demonstrate the effectiveness of the proposed approach in terms of achievable speed-up, power saving, and introduced overhead.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.