Efficient Transform Algorithms for Parallel Ultra-Low-Power IoT End Nodes
Mazzoni B.; Benatti S.; Benini L.; Tagliavini G.
2021
Abstract
Modern IoT end nodes must support computationally intensive workloads within a limited power budget. Parallel ultra-low-power architectures are a promising target for this scenario, and highly optimized software libraries are crucial for exploiting parallelism and reducing software development costs. This letter proposes an efficient parallel design of two widely used transforms, the short-time Fourier transform (STFT) and the discrete wavelet transform (DWT), targeting ultra-low-power IoT devices. We address key performance challenges related to fine-grained synchronization and bank conflicts in shared memory. We achieve high throughput (50.95 samples/μs on average), good parallel speedup (up to 6.79×), and high energy efficiency (up to 172.55 GOp/s/W) on a cluster of 8 RISC-V cores optimized for parallel ultra-low-power (PULP) operation.
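The abstract does not describe the data layout used in the letter. As a generic illustration of the shared-memory bank-conflict problem it mentions, the Python sketch below models a word-interleaved memory accessed by the 8 cluster cores in the same cycle. The 16-bank count, the row-major matrix accessed column-wise, and the one-word row padding used as a fix are all assumptions chosen for illustration; they are not the authors' design, and the model ignores real arbitration details.

```python
from collections import Counter

NUM_CORES = 8   # cluster size stated in the abstract
NUM_BANKS = 16  # assumption: word-interleaved shared memory with 2 banks per core


def bank_of(word_addr: int, num_banks: int = NUM_BANKS) -> int:
    """Word-interleaved mapping: consecutive 32-bit words go to consecutive banks."""
    return word_addr % num_banks


def serve_cycles(word_addrs, num_banks: int = NUM_BANKS) -> int:
    """Cycles needed to serve requests issued together, assuming each bank
    serves one request per cycle: the busiest bank sets the latency."""
    hits = Counter(bank_of(a, num_banks) for a in word_addrs)
    return max(hits.values())


def conflicting_requests(word_addrs, num_banks: int = NUM_BANKS) -> int:
    """Requests that cannot be served in the first cycle because another
    request already targets the same bank (k requests to a bank -> k - 1)."""
    hits = Counter(bank_of(a, num_banks) for a in word_addrs)
    return sum(k - 1 for k in hits.values())


def column_access(row_stride: int, col: int, num_cores: int = NUM_CORES):
    """Hypothetical pattern: core i reads element (i, col) of a row-major
    matrix whose rows are row_stride words apart."""
    return [i * row_stride + col for i in range(num_cores)]


if __name__ == "__main__":
    # Row length equal to the bank count: all 8 cores hit the same bank.
    print(serve_cycles(column_access(row_stride=16, col=3)))  # 8
    # Padding each row by one word spreads the column over 8 distinct banks.
    print(serve_cycles(column_access(row_stride=17, col=3)))  # 1
```

Padding is only one common remedy; interleaving data differently or staggering the order in which cores start their accesses can achieve the same effect.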