Consider a table of n d-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one. © 1994.

Ciaccia P., Maio D. (1994). On the optimal ordering of multiple-field tables. DATA & KNOWLEDGE ENGINEERING, 14(1), 27-44 [10.1016/0169-023X(94)90007-8].

On the optimal ordering of multiple-field tables

Ciaccia P.;Maio D.
1994

Abstract

Consider a table of n d-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one. © 1994.
1994
Ciaccia P., Maio D. (1994). On the optimal ordering of multiple-field tables. DATA & KNOWLEDGE ENGINEERING, 14(1), 27-44 [10.1016/0169-023X(94)90007-8].
Ciaccia P.; Maio D.
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/918310
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact