

Di Modica G., Tomarchio O. (2019). MapReduce Join Across Geo-Distributed Data Centers. Berlin: Springer [10.1007/978-3-030-27355-2_2].

MapReduce Join Across Geo-Distributed Data Centers

Di Modica G.; Tomarchio O.

2019

Abstract

MapReduce is arguably the parallel computation paradigm that has best met the need, felt across many fields, to run fast and accurate analyses on Big Data. Its strength lies in exploiting the computing power of a cluster by distributing the load across multiple computing units, and in scaling with the number of those units. Many data analysis algorithms are now available in MapReduce form: data sorting, data indexing, word counting, and relational joins, to name just a few. These algorithms have been observed to work well in computing contexts where the computing units (nodes) are connected by high-performance network links (on the order of gigabits per second). Unfortunately, when MapReduce runs on nodes that are geographically distant from each other, performance degrades dramatically. In such scenarios, the cost of moving data among nodes connected via geographic links offsets the benefit of parallelization. This paper discusses the issues of running MapReduce joins in a geo-distributed computing context and proposes boosting the performance of the join algorithm by leveraging a hierarchical computing approach.


Year: 2019
Volume title: Communications in Computer and Information Science
First page: 18
Last page: 31
Series: Communications in Computer and Information Science
DOI: https://dx.doi.org/10.1007/978-3-030-27355-2_2
Appears in categories: 2.01 Chapter / essay in book


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/736197
