A case for three-dimensional stacking of tightly coupled data memories over multi-core clusters using low-latency interconnects