Exploring DMA-assisted prefetching strategies for software caches on multicore clusters