On the relevance of architectural awareness for efficient fork/join support on cluster-based manycores