CRIS Current Research Information System

Consider a table of n d-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one. © 1994.

Ciaccia P., Maio D. (1994). On the optimal ordering of multiple-field tables. DATA & KNOWLEDGE ENGINEERING, 14(1), 27-44 [10.1016/0169-023X(94)90007-8].

On the optimal ordering of multiple-field tables

Ciaccia P.;Maio D.

1994

Abstract

Consider a table of n d-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one. © 1994.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				1994
			
	Rivista
	
				DATA & KNOWLEDGE ENGINEERING
			
	Codice DOI
	
				https://dx.doi.org/10.1016/0169-023X(94)90007-8
			
	Citazione
	
				Ciaccia P.,  Maio D. (1994). On the optimal ordering of multiple-field tables. DATA & KNOWLEDGE ENGINEERING, 14(1), 27-44 [10.1016/0169-023X(94)90007-8].
			
	Tutti gli autori
	
						Ciaccia P.; Maio D.

File in questo prodotto:

Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/918310

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

0

0

social impact