Enabling Big Data Analytics in the Hybrid Cloud Using Iterative MapReduce

Clemente-Castelló, Francisco J.; Nicolae, Bogdan; Katrinis, Kostas; Rafique, M. Mustafa; Mayo, Rafael; Fernández, Juan Carlos; Loreti, Daniela

doi:10.1109/UCC.2015.47

The cloud computing model has seen tremendous commercial success through two prominent models to date, namely public and private cloud. Recently, a third model that combines the two as on- and off-premise resources has been gaining significant market traction: hybrid cloud. While state-of-the-art techniques exist for workload performance prediction and efficient workload execution over hybrid cloud setups, support for data-intensive workloads, including Big Data analytics, in such environments is still nascent. This paper addresses this gap by taking on the challenge of bursting over hybrid clouds to accelerate iterative MapReduce applications. We first specify the challenges associated with data locality and data movement in such setups. Subsequently, we propose a novel technique to address the locality issue without requiring changes to the MapReduce framework or the underlying storage layer. In addition, we contribute a performance prediction methodology that combines modeling with micro-benchmarks to estimate completion time for iterative MapReduce applications, which enables users to estimate cost-to-solution before committing extra resources from public clouds. We show through experimentation in a dual-OpenStack hybrid cloud setup that our solutions bring substantial improvement with predictable cost control for two real-life iterative MapReduce applications: large-scale machine learning and text analysis.

Clemente-Castelló, F.J., Nicolae, B., Katrinis, K., Rafique, M.M., Mayo, R., Fernández, J.C., et al. (2015). Enabling Big Data Analytics in the Hybrid Cloud Using Iterative MapReduce. Institute of Electrical and Electronics Engineers Inc. [10.1109/UCC.2015.47].




	Year
				2015
	Volume title
				Proceedings - 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing, UCC 2015
	First page
				290
	Last page
				299
	DOI
				https://dx.doi.org/10.1109/UCC.2015.47
			
	Appears in types:
				4.01 Conference proceedings contribution


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/681450

