CRIS Current Research Information System

Mora, A., Bujari, A., Bellavista, P. (2024). Enhancing generalization in Federated Learning with heterogeneous data: A comparative literature review. FUTURE GENERATION COMPUTER SYSTEMS, 157, 1-15 [10.1016/j.future.2024.03.027].

Enhancing generalization in Federated Learning with heterogeneous data: A comparative literature review

Mora, Alessio (first author); Bujari, Armir (second author); Bellavista, Paolo (last author)

2024

Abstract

Federated Learning (FL) is a collaborative training paradigm in which a global Machine Learning (ML) model is trained on typically private and distributed data sources without disclosing the raw data. The approach paves the way for better privacy guarantees, improved overall system scalability, and sustainability. In this context, Federated Averaging (FedAvg) is a representative FL algorithm that adopts a client–server protocol operating in synchronous rounds: selected learners contribute to the global model via local model updates trained on their private data, while a server aggregates the local contributions and produces the new-generation global model as a weighted average of the local ones. However, when clients hold (highly) dissimilar data, FedAvg becomes ineffective because the client models diverge. Consequently, FedAvg-trained models struggle to generalize to unseen data from the global distribution. In this paper, we conduct a systematic review of state-of-the-art approaches proposed to counteract global model performance degradation in the presence of heterogeneous data. To this end, we compile an original taxonomy, highlighting the main algorithmic approaches and mechanisms behind each identified category. Advancing the current body of knowledge, we empirically evaluate the generalization performance of various methods on visual tasks under moderate and significant levels of data heterogeneity, as is common practice in the surveyed literature. In addition, the paper benchmarks hybrid techniques, which combine client- and server-side algorithmic tweaks, and sheds light on some associated performance tradeoffs. Although other relevant issues in FL, such as device heterogeneity and energy consumption, have a non-negligible impact on the learning process, these well-investigated topics are not the main focus of this article.
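
The server-side aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that each client's weight in the average is proportional to its number of local training examples (the abstract says only "weighted average"), and that a model is represented as a list of NumPy arrays, one per layer. The function name `fedavg_aggregate` and the toy client values are hypothetical and do not come from the paper.

```python
import numpy as np


def fedavg_aggregate(client_models, client_sizes):
    """Return the weighted average of client models, layer by layer.

    client_models: list of models, each a list of NumPy arrays (one per layer).
    client_sizes:  number of local training examples per client; using these
                   as averaging weights is an assumption of this sketch.
    """
    total = float(sum(client_sizes))
    num_layers = len(client_models[0])
    return [
        sum(model[layer] * (n / total)
            for model, n in zip(client_models, client_sizes))
        for layer in range(num_layers)
    ]


# Two hypothetical clients, each holding a one-layer "model".
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 6.0])]]
sizes = [1, 3]  # the second client has three times as much data
global_model = fedavg_aggregate(clients, sizes)
```

In this toy case the second client dominates the average because it holds more data. When local datasets are dissimilar, the averaged parameters may fit no client's distribution well, which is the degradation the review addresses.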

Year	2024
Journal	FUTURE GENERATION COMPUTER SYSTEMS
DOI	https://dx.doi.org/10.1016/j.future.2024.03.027
Citation	Mora, A., Bujari, A., Bellavista, P. (2024). Enhancing generalization in Federated Learning with heterogeneous data: A comparative literature review. FUTURE GENERATION COMPUTER SYSTEMS, 157, 1-15 [10.1016/j.future.2024.03.027].
All authors	Mora, Alessio; Bujari, Armir; Bellavista, Paolo
Appears in categories	1.01 Journal article

Files in this item:

File	Size	Format
FGCS___Copy.pdf (Open Access from 18/03/2025; Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review; License: Open Access license, Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND))	2.87 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/969398
