CRIS Current Research Information System

In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures - the so-called `curse of dimensionality'. This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound ε can be exceeded with probability δ and both ε and δ parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed-up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.

Ciaccia Paolo, Patella Marco (2000). PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. Los Alamitos, CA, United States : IEEE [10.1109/ICDE.2000.839417].

PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces

Ciaccia Paolo;Patella Marco

2000

Abstract

In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures - the so-called `curse of dimensionality'. This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound ε can be exceeded with probability δ and both ε and δ parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed-up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2000
			
	Titolo del volume
	
				Proceedings of 16th International Conference on Data Engineering
			
	Pagina iniziale
	
				244
			
	Pagina finale
	
				255
			
	Codice DOI
	
				https://dx.doi.org/10.1109/ICDE.2000.839417
			
	Citazione
	
				Ciaccia Paolo,  Patella Marco (2000). PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. Los Alamitos, CA, United States : IEEE [10.1109/ICDE.2000.839417].
			
	Tutti gli autori
	
						Ciaccia Paolo; Patella Marco

File in questo prodotto:

Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/918331

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

111

ND

ND

social impact