The availability of cost-effective data collections and storage hardware has allowed organizations to accumulate very large data sets, which are a potential source of previously unknown valuable information. The process of discovering interesting patterns in such large data sets is referred to as data mining. Outlier detection is a data mining task consisting in the discovery of observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is however computationally very demanding and currently requires highperformance computing facilities. We propose a family of parallel algorithms for Graphic Processing Units (GPU), derived from two distance-based outlier detection algorithms: the BruteForce and the SolvingSet. We analyze their performance with an extensive set of experiments, comparing the GPU implementations with the base CPU versions and obtaining significant speedups.
F. Angiulli, S. Basta, S. Lodi, C. Sartori (2013). Fast outlier detection using a GPU. 2009 IEEE Conference Proceedings [10.1109/HPCSim.2013.6641405].
Fast outlier detection using a GPU
LODI, STEFANO;SARTORI, CLAUDIO
2013
Abstract
The availability of cost-effective data collections and storage hardware has allowed organizations to accumulate very large data sets, which are a potential source of previously unknown valuable information. The process of discovering interesting patterns in such large data sets is referred to as data mining. Outlier detection is a data mining task consisting in the discovery of observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is however computationally very demanding and currently requires highperformance computing facilities. We propose a family of parallel algorithms for Graphic Processing Units (GPU), derived from two distance-based outlier detection algorithms: the BruteForce and the SolvingSet. We analyze their performance with an extensive set of experiments, comparing the GPU implementations with the base CPU versions and obtaining significant speedups.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.