The Power of Distance Distributions: Cost Models and Scheduling Policies for Quality-Controlled Similarity Queries

Ciaccia, Paolo; Patella, Marco

2017

Abstract

Approximate similarity queries are a practical way to obtain good, yet suboptimal, results from large data sets without paying high execution costs. In this paper we analyze how the strategy for searching through an index tree, also called the scheduling policy, influences costs. We consider quality-controlled similarity queries, in which the user sets a quality (distance) threshold θ and the system halts as soon as it finds k objects in the data set at distance at most θ from the query object. After providing experimental evidence that the scheduling policy can indeed have a high impact on the costs paid, we characterize the policies' behavior through an analytical cost model, in which parameterized local distance distributions play a major role. Such distributions are also the key to deriving new scheduling policies, which we show to be optimal in a simplified, yet relevant, scenario.

Year: 2017
Volume title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First page: 3
Last page: 16
Series: Lecture Notes in Computer Science
DOI: https://dx.doi.org/10.1007/978-3-319-68474-1_1
Citation: Ciaccia, P., Patella, M. (2017). The Power of Distance Distributions: Cost Models and Scheduling Policies for Quality-Controlled Similarity Queries. Heidelberg: Springer Verlag [10.1007/978-3-319-68474-1_1].
All authors: Ciaccia, Paolo; Patella, Marco
Publication type: 4.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/614471
