Scalable Monitoring and dependable job ScHeduling support for multi-domain Grid infrastructures

Cinque, Marcello; Corradi, Antonio; Foschini, Luca; Frattini, Flavio; Povedano-Molina, Javier

doi:10.1145/2851613.2851762

The management of Grid systems commonly lacks the information needed to identify failures that may hinder the timely completion of jobs and waste computing resources. Monitoring can help, but novel approaches are needed for such large, geographically distributed systems. We propose a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling (GAMESH). GAMESH is a completely distributed and highly efficient management infrastructure for disseminating monitoring data and troubleshooting job execution failures in large-scale, multi-domain Grid environments. Evaluated in a real deployment and compared with other Grid management systems, GAMESH is shown to (i) provide measurements of both computing resources and task scheduling conditions at geographically sparse sites while inducing low overhead on the entire infrastructure, and (ii) enable failure-aware scheduling and improve overall system performance, even in the presence of failures, by coordinating local job schedulers across multiple domains.

Cinque, M., Corradi, A., Foschini, L., Frattini, F., Povedano-Molina, J. (2016). Scalable Monitoring and dependable job ScHeduling support for multi-domain Grid infrastructures. Association for Computing Machinery [10.1145/2851613.2851762].

Year: 2016
Volume title: Proceedings of the ACM Symposium on Applied Computing
First page: 2015
Last page: 2020
DOI: https://dx.doi.org/10.1145/2851613.2851762
Publication type: 4.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/599537
