Consider a table of n d-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one. © 1994.
Ciaccia P., Maio D. (1994). On the optimal ordering of multiple-field tables. DATA & KNOWLEDGE ENGINEERING, 14(1), 27-44 [10.1016/0169-023X(94)90007-8].
On the optimal ordering of multiple-field tables
Ciaccia P.;Maio D.
1994
Abstract
Consider a table of n d-dimensional records, grouped into b buckets, and a set Q = {(Qh, wh)} of weighed partial-match conditions, where wh is the relative frequency of Qh. Let n(Qh) be the number of records which satisfy Qh, and b(Qh) the number of buckets in which these records are found. The problem we consider is the individuation of the optimal ordering field which minimizes the sum of accessed buckets, B(Q) = Σhwh × b(Qh). An exact solution requires the records to be sorted according to the values of each of the d fields in turn with an overall time complexity, evaluated as a function of bucket accesses, of O(d × b log b). We present an approximate O(b) algorithm which estimates the optimal ordering without the need to sort the table at all. The algorithm makes use of a probabilistic counting technique, known as linear counting, which requires a single scan of the table. Experimental results show that in most cases the approximate solution agrees with the exact one. © 1994.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.