Always-on image processing is crucial for many applications, such as face and attention detection, and it is usually offloaded to dedicated, energy-efficient image processors. These processors need to be flexible and scalable to follow the rapid evolution of image sensors and always-on image processing workloads. A flexible architecture is the shared memory cluster, where multiple cores are tightly coupled with L1 memory. However, current clusters are not latency tolerant and follow a uniform memory access approach, which limits their frequency and scalability. The MemPool architecture [1] lifts those constraints by combining latency-tolerant cores, pipelined functional processing units, and a non-uniform memory access interconnect. This paper presents MinPool, a low-power image processor for always-on functions implemented in TSMC’s 65 nm technology and based on a tailored MemPool architecture. Thanks to an instruction set architecture extension tuned for image processing and the low-leakage process, it achieves excellent utilization results with IPCs of up to 0.98 and an energy efficiency of 65 GOPS/W for key image processing kernels.
Riedel, S., Cavalcante, M., Frouzakis, M., Wüthrich, D., Mustafa, E., Billa, A., et al. (2023). MinPool: A 16-core NUMA-L1 Memory RISC-V Processor Cluster for Always-on Image Processing in 65nm CMOS. doi:10.1109/ICECS58634.2023.10382925.
MinPool: A 16-core NUMA-L1 Memory RISC-V Processor Cluster for Always-on Image Processing in 65nm CMOS
Benini, Luca
2023
| File | Size | Format |
|---|---|---|
| CameraReady_IEEE.pdf (open access). Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review. License: free open-access license. | 955.7 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


