GAMESH: A grid architecture for scalable monitoring and enhanced dependable job scheduling

Bellavista, Paolo; Cinque, Marcello; Corradi, Antonio; Foschini, Luca; Frattini, Flavio; Povedano-Molina, Javier

doi:10.1016/j.future.2016.10.023

Grid computing is a widely adopted paradigm for federating geographically distributed data centers. Due to their size and complexity, grid systems are often affected by failures that may hinder the correct and timely execution of jobs, thus causing a non-negligible waste of computing resources. Despite the relevance of the problem, state-of-the-art management solutions for grid systems usually neglect the identification and handling of failures at runtime. We argue that novel approaches are needed that integrate scalably with efficient monitoring solutions and that fit large, geographically distributed systems, where dynamic and configurable tradeoffs between overhead and targeted granularity are necessary. This paper proposes GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. GAMESH is conceived as a completely distributed and highly efficient management infrastructure that concentrates on two crucial aspects of large-scale, multi-domain grid environments: (i) the scalable dissemination of monitoring data and (ii) the troubleshooting of job execution failures. GAMESH has been implemented and tested in a real deployment encompassing geographically distributed data centers across Europe. Experimental results show that GAMESH (i) enables the collection of measurements of both computing resources and task scheduling conditions at geographically sparse sites, while imposing limited overhead on the entire infrastructure, and (ii) provides a failure-aware scheduler that improves overall system performance, even in the presence of failures, by coordinating local job schedulers across multiple domains.

Bellavista, P., Cinque, M., Corradi, A., Foschini, L., Frattini, F., Povedano-Molina, J. (2017). GAMESH: A grid architecture for scalable monitoring and enhanced dependable job scheduling. Future Generation Computer Systems, 71, 192-201 [10.1016/j.future.2016.10.023].

Year: 2017
Journal: Future Generation Computer Systems
DOI: https://dx.doi.org/10.1016/j.future.2016.10.023
All authors: Bellavista, Paolo; Cinque, Marcello; Corradi, Antonio; Foschini, Luca; Frattini, Flavio; Povedano-Molina, Javier
Publication type: 1.01 Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/586081
