Data streams are one of the most important components of big data platforms, as they enable decision-makers to obtain timely insights about a given phenomenon in near real-time. Though big data is often schemaless, little attention has been paid in the literature to schemaless streams. Streaming algorithms typically employ approximation and pre-aggregation strategies to efficiently provide analytical capabilities over data streams; however, the heterogeneous nature of records in a schemaless stream introduces challenges that make related proposals ineffective. In this paper, we propose an approach to overcome these challenges while formulating multidimensional analytical queries under the sliding window paradigm. The approach is called self-adaptive because it automatically keeps up with the heterogeneity of the data while still being able to propose an interesting query over the current window. The approach is discussed in full detail and evaluated in terms of both efficiency and effectiveness.
Forresi, C., Francia, M., Gallinucci, E., Golfarelli, M. (2025). Self-adaptive analytical querying over schemaless data streams. Journal of Big Data, 12(1), 1-32 [10.1186/s40537-025-01251-1].
Self-adaptive analytical querying over schemaless data streams
Forresi C.; Francia M.; Gallinucci E.; Golfarelli M.
2025


