Top-𝑘 queries, in particular those based on a linear scoring function, are a common way to extract relevant results from large datasets. Their major advantage over alternative approaches, such as skyline queries (which return all the undominated objects in a dataset), is that the cardinality of the output can be easily controlled through the 𝑘 parameter and user preferences can be accommodated by appropriately weighing the involved attributes. In this paper we concentrate on two so-far neglected aspects of top-𝑘 queries: first, their general ability to return all the potentially interesting results, i.e., the tuples in the skyline; second, the difficulty that linear top-𝑘 queries might encounter in returning tuples with balanced attribute values that match user preferences more closely than tuples that are extremely good in one dimension but (very) poor in others. In order to quantify these undesirable effects we introduce four novel indicators for skyline tuples, which measure their robustness as well as the difficulty incurred by top-𝑘 queries to retrieve them. After observing that real datasets usually contain many relevant results that are hardly retrievable by linear top-𝑘 queries, and with the aim of favoring balanced results, we extend the queries with a term that accounts for the distance of a tuple from the preference direction established by the attributes’ weights. This novel query, which we call directional query, adds the flexibility needed to allow each skyline tuple to be ranked first for a proper choice of weights, with no extra burden on the user and, in the most adverse scenarios, only a minor computational overhead, as measured through an extensive experimental analysis on real and synthetic data.
Ciaccia, P., Martinenghi, D. (2024). Directional Queries: Making Top-k Queries More Effective in Discovering Relevant Results. PROCEEDINGS OF THE ACM ON MANAGEMENT OF DATA, 2, 1-26 [10.1145/3698807].
Directional Queries: Making Top-k Queries More Effective in Discovering Relevant Results
Paolo CiacciaPrimo
;
2024
Abstract
Top-𝑘 queries, in particular those based on a linear scoring function, are a common way to extract relevant results from large datasets. Their major advantage over alternative approaches, such as skyline queries (which return all the undominated objects in a dataset), is that the cardinality of the output can be easily controlled through the 𝑘 parameter and user preferences can be accommodated by appropriately weighing the involved attributes. In this paper we concentrate on two so-far neglected aspects of top-𝑘 queries: first, their general ability to return all the potentially interesting results, i.e., the tuples in the skyline; second, the difficulty that linear top-𝑘 queries might encounter in returning tuples with balanced attribute values that match user preferences more closely than tuples that are extremely good in one dimension but (very) poor in others. In order to quantify these undesirable effects we introduce four novel indicators for skyline tuples, which measure their robustness as well as the difficulty incurred by top-𝑘 queries to retrieve them. After observing that real datasets usually contain many relevant results that are hardly retrievable by linear top-𝑘 queries, and with the aim of favoring balanced results, we extend the queries with a term that accounts for the distance of a tuple from the preference direction established by the attributes’ weights. This novel query, which we call directional query, adds the flexibility needed to allow each skyline tuple to be ranked first for a proper choice of weights, with no extra burden on the user and, in the most adverse scenarios, only a minor computational overhead, as measured through an extensive experimental analysis on real and synthetic data.File | Dimensione | Formato | |
---|---|---|---|
Directional.pdf
accesso aperto
Tipo:
Postprint
Licenza:
Licenza per accesso libero gratuito
Dimensione
6.45 MB
Formato
Adobe PDF
|
6.45 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.