Distributed Data Clustering in Multi-Dimensional Peer-To-Peer Networks

Stefano Lodi; Gianluca Moro; Claudio Sartori
2010

Abstract

Several algorithms have recently been developed for distributed data clustering. They are applied when data cannot be concentrated on a single machine, for instance because of privacy concerns, network bandwidth limitations, or the sheer volume of distributed data. Deployed and research peer-to-peer systems have proven able to manage very large databases made up of thousands of personal computers, making them a concrete solution for the forthcoming distributed database systems to be used in large grid computing networks and in clustering database management systems. Current distributed data clustering algorithms cannot be applied to this kind of network because they expect data to be organized as in traditional distributed database management systems, where the distribution of the relational schema is planned a priori in the design phase. In this paper we describe methods to cluster distributed data across peer-to-peer networks without requiring any costly reorganization of data, which would be infeasible in such large and dynamic overlay networks, and without reducing their performance in message routing and query processing. We compare the data clustering quality and efficiency of three multi-dimensional peer-to-peer systems according to two well-known clustering techniques.


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/88397
