Distributed Data Clustering in Multi-Dimensional Peer-To-Peer Networks

Lodi, Stefano; Moro, Gianluca; Sartori, Claudio

Several algorithms have recently been developed for distributed data clustering. These algorithms are applied when data cannot be concentrated on a single machine, for instance because of privacy concerns, network bandwidth limitations, or the huge amount of distributed data. Deployed and research peer-to-peer systems have proven able to manage very large databases made up of thousands of personal computers. This makes them a concrete solution for the forthcoming distributed database systems to be used in large grid computing networks and in clustering database management systems. Current distributed data clustering algorithms cannot be applied to such networks because they expect data to be organized according to traditional distributed database management systems, where the distribution of the relational schema is planned a priori in the design phase. In this paper we describe methods to cluster distributed data across peer-to-peer networks without requiring any costly reorganization of data, which would be infeasible in such large and dynamic overlay networks, and without reducing their performance in message routing and query processing. We compare the data clustering quality and efficiency of three multi-dimensional peer-to-peer systems according to two well-known clustering techniques.

Stefano Lodi, Gianluca Moro, Claudio Sartori (2010). Distributed Data Clustering in Multi-Dimensional Peer-To-Peer Networks. AUSTRALIAN COMPUTER SCIENCE COMMUNICATIONS, Volume 32, Number 3, 171-178.