Elastic provisioning of virtual Hadoop clusters in OpenStack-based Clouds
CORRADI, ANTONIO; FOSCHINI, LUCA; PERNAFINI, ALESSANDRO
2015
Abstract
The MapReduce programming model and its implementations, such as the widely used Apache Hadoop framework, are increasingly popular because of their inherent capacity to enable scalable processing of large-scale datasets. The advent of the Cloud has further boosted this trend through the provisioning of virtual Hadoop clusters, easily configurable and accessible according to the Platform as a Service (PaaS) model and deployed over existing Cloud Infrastructure as a Service (IaaS) platforms. However, the coexistence of multiple virtual Hadoop clusters competing for the same shared physical resources requires new management solutions able to dynamically reconfigure and rebalance the placement of Hadoop service components over the virtualized IaaS platform. This paper proposes ESAMAR (Elastic Sahara MApReduce), a novel support based on a cross-layer PaaS-IaaS management approach that transparently grants elasticity and efficiency at the Hadoop PaaS level. ESAMAR monitors the performance of Hadoop clusters at both the IaaS and physical layers and exploits load-balancing techniques, with full awareness of virtual Hadoop clusters and resources at the PaaS and IaaS levels. We thoroughly assessed our framework in a realistic scenario based on the open-source OpenStack platform; the collected results demonstrate the effectiveness and suitability of our management techniques, which help reduce Hadoop job completion time even under challenging, heavily loaded Cloud conditions.