CRIS Current Research Information System

We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbors queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multi-dimensional access methods, in a generic metric space there is no such a possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method. Our results rely on the assumption that the indexed dataset comes from a metric space which is `homogeneous' enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure.

Ciaccia Paolo, Patella Marco, Zezula Pavel (1998). A Cost model for similarity queries in metric spaces. New York, NY, United States : ACM.

A Cost model for similarity queries in metric spaces

Ciaccia Paolo;Patella Marco;Zezula Pavel

1998

Abstract

We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbors queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multi-dimensional access methods, in a generic metric space there is no such a possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method. Our results rely on the assumption that the indexed dataset comes from a metric space which is `homogeneous' enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				1998
			
	Titolo del volume
	
				Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
			
	Pagina iniziale
	
				59
			
	Pagina finale
	
				68
			
	Citazione
	
				Ciaccia Paolo,  Patella Marco,  Zezula Pavel (1998). A Cost model for similarity queries in metric spaces. New York, NY, United States : ACM.
			
	Tutti gli autori
	
						Ciaccia Paolo; Patella Marco; Zezula Pavel

File in questo prodotto:

Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/918397

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

82

ND

ND

social impact