Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA