Stoutchinin, A., Benini, L. (2017). StreamDrive: A dynamic dataflow framework for clustered embedded architectures. Association for Computing Machinery, Inc [10.1145/3075564.3075568].
StreamDrive: A dynamic dataflow framework for clustered embedded architectures
Stoutchinin, Arthur; Benini, Luca
2017
Abstract
In this paper, we present StreamDrive, a dynamic data flow framework for programming clustered embedded multicore architectures. StreamDrive simplifies the development of dynamic data flow applications starting from sequential reference C code and allows seamless handling of heterogeneous and application-specific processing elements at the application level. We address the efficient implementation of a dynamic data flow runtime system in constrained embedded environments, an issue that previous research has not sufficiently addressed. We conducted a detailed performance evaluation of the StreamDrive implementation on our Application Specific MultiProcessor (ASMP) cluster using the Oriented FAST and Rotated BRIEF (ORB) algorithm, which is typical of the image processing domain. Our implementation has less than 10% parallelization overhead and near-linear speed-up as the number of processors increases from 1 to 8. It achieves 15 VGA frames per second with a small cluster configuration of 4 processing elements and 64KB of shared memory, and 30 VGA frames per second with 8 processors and 128KB of shared memory.
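As a quick consistency check on the figures reported above, the arithmetic below compares the two quoted configurations. The symbols $T_n$ (throughput in frames per second with $n$ processing elements) and $M_n$ (shared memory with $n$ processing elements) are notation introduced here for illustration, not taken from the paper. The check assumes the standard VGA resolution of 640 × 480 pixels and that "processors" and "processing elements" refer to the same units.

```latex
\begin{align*}
\frac{T_8}{T_4} &= \frac{30}{15} = 2 = \frac{8}{4}, \\
\frac{M_4}{4} &= \frac{64\,\text{KB}}{4} = 16\,\text{KB} = \frac{128\,\text{KB}}{8} = \frac{M_8}{8}, \\
30 \times 640 \times 480 &= 9{,}216{,}000 \ \text{pixels per second}.
\end{align*}
```

Under these assumptions, doubling the processing elements from 4 to 8 doubles the frame rate while the shared memory per processing element stays at 16KB, which is consistent with the near-linear speed-up reported in the abstract.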