Simone Graziani, Matteo Golfarelli, Stefano Rizzi (2014). Shrink: An OLAP Operation for Balancing Precision and Size of Pivot Tables. Data & Knowledge Engineering, 93, 19-41 [10.1016/j.datak.2014.07.004].
Shrink: An OLAP Operation for Balancing Precision and Size of Pivot Tables
Graziani, Simone; Golfarelli, Matteo; Rizzi, Stefano
2014
Abstract
Information flooding may occur during an OLAP session when the user drills down a cube to a very fine-grained level, because the huge number of facts returned makes them very hard to analyze using a pivot table. To overcome this problem we propose a novel OLAP operation, called shrink, aimed at balancing data precision with data size in cube visualization via pivot tables. The shrink operation fuses slices of similar data and replaces them with a single representative slice, respecting the constraints suggested by dimension hierarchies, until the result has either size or error smaller than a given threshold. An optimal computation of the shrink operation has exponential complexity, so we present both a greedy algorithm based on agglomerative clustering, which returns a sub-optimal solution, and a branch-and-bound algorithm that returns an optimal solution. Finally, we discuss some experimental results to evaluate the shrink operation in terms of efficiency and effectiveness.
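As a rough illustration of the greedy, agglomerative idea mentioned in the abstract, the following Python sketch fuses slices under a size threshold. It is not the authors' algorithm: the names `sse` and `greedy_shrink`, the use of the column-wise mean as the representative slice, the sum of squared errors as the error measure, and the rule that only slices with the same parent in the hierarchy may be fused are all simplifying assumptions made here for illustration.

```python
from itertools import combinations


def sse(rows):
    """Sum of squared errors of the rows against their column-wise mean."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, mean))


def greedy_shrink(slices, parent, max_size):
    """Hypothetical greedy shrink under a size threshold.

    slices:   dict mapping a member (e.g. a city) to its list of measure values
    parent:   dict mapping each member to its parent in the hierarchy
    max_size: stop as soon as at most this many slices remain

    Assumption: two clusters may be fused only if their members share the
    same parent, a simplified stand-in for the hierarchy constraints.
    """
    clusters = [[m] for m in slices]

    def cost(a, b):
        # Increase in SSE caused by fusing clusters a and b.
        rows_a = [slices[m] for m in a]
        rows_b = [slices[m] for m in b]
        return sse(rows_a + rows_b) - sse(rows_a) - sse(rows_b)

    while len(clusters) > max_size:
        candidates = [
            (cost(a, b), i, j)
            for (i, a), (j, b) in combinations(enumerate(clusters), 2)
            if parent[a[0]] == parent[b[0]]
        ]
        if not candidates:
            break  # no fusion is allowed by the hierarchy
        _, i, j = min(candidates)  # cheapest allowed fusion
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    # Replace each cluster with one representative slice (column-wise mean).
    result = {}
    for c in clusters:
        rows = [slices[m] for m in c]
        result[tuple(c)] = [sum(col) / len(rows) for col in zip(*rows)]
    return result


# Made-up example data: four city slices with two measure values each.
slices = {
    "Miami": [10, 12],
    "Orlando": [11, 13],
    "Tampa": [30, 28],
    "Richmond": [10, 12],
}
parent = {"Miami": "FL", "Orlando": "FL", "Tampa": "FL", "Richmond": "VA"}
shrunk = greedy_shrink(slices, parent, max_size=3)
```

In this made-up example, Miami and Orlando are fused first because they are the closest pair with the same parent; Richmond has the same values as Miami but is never fused with it because it belongs to a different parent. A greedy choice of this kind is cheap at each step but gives no guarantee of reaching the minimum-error result, which is consistent with the abstract's remark that the greedy algorithm is sub-optimal.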