Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters