In this work we propose models for access cost estimation that are suitable in the physical design of a relational database when a set of secondary indexes has to be built on some attributes of the relations. The models are tailored to deal with distinct kinds of queries (partial-match, interval, join, etc.) and are based on a measure of association, the clustering factor, which applies between an attribute and the physical location of records in a file as well as between two (sets of) attributes. The use of clustering factors and the value selectivity of a query (i.e. how many distinct values satisfy a query) allow design time models to be derived without previously needing to estimate the record selectivities (i.e. how many records satisfy a query) or the corresponding access costs of all the query instances that can occur at run time. In practice, unlike previous approaches to the problem, run time models are derived by specializing design time models, rather than vice versa. Estimation of access costs with alternative ordering criteria is also considered, and a model is proposed that allows the primary attribute to be chosen wothout the need to sort the tuples. The proposed models achieve a good tradeoff between accuracy and simplicity, without being based on restrictive assumptions as to data, and easily allow the design process to take advantage of semantic information about the application domain even if the data are not yet loaded in the database. © 1993.

Access cost estimation for physical database design

Ciaccia P.;Maio D.
1993

Abstract

In this work we propose models for access cost estimation that are suitable in the physical design of a relational database when a set of secondary indexes has to be built on some attributes of the relations. The models are tailored to deal with distinct kinds of queries (partial-match, interval, join, etc.) and are based on a measure of association, the clustering factor, which applies between an attribute and the physical location of records in a file as well as between two (sets of) attributes. The use of clustering factors and the value selectivity of a query (i.e. how many distinct values satisfy a query) allow design time models to be derived without previously needing to estimate the record selectivities (i.e. how many records satisfy a query) or the corresponding access costs of all the query instances that can occur at run time. In practice, unlike previous approaches to the problem, run time models are derived by specializing design time models, rather than vice versa. Estimation of access costs with alternative ordering criteria is also considered, and a model is proposed that allows the primary attribute to be chosen wothout the need to sort the tuples. The proposed models achieve a good tradeoff between accuracy and simplicity, without being based on restrictive assumptions as to data, and easily allow the design process to take advantage of semantic information about the application domain even if the data are not yet loaded in the database. © 1993.
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/918328
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? ND
social impact