Federated Learning (FL) is a collaborative training paradigm whereby a global Machine Learning (ML) model is trained using typically private and distributed data sources without disclosing the raw data. The approach paves the way for better privacy guarantees, improved overall system scalability, and sustainability. In this context, Federated Averaging (FedAvg) is a representative FL algorithm adopting a client–server protocol that operates in synchronous rounds: selected learners contribute to the global model via local model updates trained on their private data, while a server entity aggregates the local contributions, producing the new-generation global model as a weighted average of the local ones. However, when clients possess (highly) dissimilar data, FedAvg becomes ineffective because the client models diverge. Consequently, FedAvg-trained models struggle to generalize when presented with unseen data from the global distribution. In this research paper, we conduct a systematic review of state-of-the-art approaches proposed to counteract global model performance degradation in the presence of heterogeneous data. To this end, we compile an original taxonomy, highlighting the main algorithmic approaches and mechanisms behind each identified category. Advancing the current body of knowledge, we empirically evaluate the generalization performance of various methods on visual tasks under moderate and significant levels of data heterogeneity, as is common practice in the surveyed literature. In addition, the paper benchmarks the performance of hybrid techniques, which combine client- and server-side algorithmic tweaks, shedding light on some of the associated performance tradeoffs. We recognize that other relevant issues in FL, such as device heterogeneity and energy consumption, have a non-negligible impact on the learning process; however, these well-investigated topics are not the main focus of this article.
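The server-side aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fedavg_aggregate`, the flat-list representation of model parameters, and the choice to weight each client by its number of local training examples are assumptions made for illustration (the abstract states only that the global model is a weighted average of the local ones).

```python
from typing import List, Sequence


def fedavg_aggregate(
    client_weights: Sequence[Sequence[float]],
    num_examples: Sequence[int],
) -> List[float]:
    """Return the weighted average of the clients' model parameters.

    Assumption: each client's weight in the average is its share of the
    total number of local training examples in this round.
    """
    if not client_weights or len(client_weights) != len(num_examples):
        raise ValueError("need one example count per client model")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, n in zip(client_weights, num_examples):
        if len(weights) != dim:
            raise ValueError("all client models must have the same size")
        share = n / total  # this client's contribution to the average
        for i, w in enumerate(weights):
            aggregated[i] += share * w
    return aggregated


# Hypothetical round with two selected clients holding 1 and 3 examples.
new_global = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this sketch the client with more data pulls the global model toward its own parameters. When local data distributions differ strongly, the local models being averaged can sit far apart, which is the divergence the abstract identifies as the cause of degraded generalization.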

Mora, A., Bujari, A., Bellavista, P. (2024). Enhancing generalization in Federated Learning with heterogeneous data: A comparative literature review. FUTURE GENERATION COMPUTER SYSTEMS, 157, 1-15 [10.1016/j.future.2024.03.027].

Enhancing generalization in Federated Learning with heterogeneous data: A comparative literature review

Mora, Alessio (first author); Bujari, Armir (second author); Bellavista, Paolo (last author)
2024


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/969398