Many modern video processing pipelines rely on edge-aware (EA) filtering methods. However, recent high-quality methods are challenging to run in real time on embedded hardware due to their computational load. To this end, we propose an area-efficient, real-time-capable hardware implementation of a high-quality EA method. In particular, we focus on the recently proposed permeability filter (PF), which delivers promising quality and performance in the domains of high dynamic range (HDR) tone mapping, disparity estimation, and optical flow estimation. We present an efficient hardware accelerator that implements a tiled variant of the PF with low on-chip memory requirements and a significantly reduced external memory bandwidth (6.4× w.r.t. the non-tiled PF). The design has been taped out in 65 nm CMOS technology, filters 720p grayscale video at 24.8 Hz, and achieves a high compute density of 6.7 GFLOPS/mm² (12× higher than embedded GPUs when scaled to the same technology node). The low area and bandwidth requirements make the accelerator highly suitable for integration into systems-on-chip (SoCs), where the silicon area budget is constrained and external memory is typically a heavily contended resource.
Eggimann, M., Gloor, C., Scheidegger, F., Cavigelli, L., Schaffner, M., Smolic, A., et al. (2018). Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS. Institute of Electrical and Electronics Engineers Inc. [10.1109/ISCAS.2018.8351051].
Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS
Benini, L.
2018