Hennig, C. (2025). Data Analytic Understanding of Statistics. Gewerbestrasse 11, CH-6330 Cham, Switzerland: Springer International Publishing AG [10.1007/978-3-031-96736-8_2].
Data Analytic Understanding of Statistics
Hennig, Christian
2025
Abstract
The title refers to the pursuit of understanding how the methods of statistics and data analysis process and transform the data, and what this implies. Statistics is predominantly taught and interpreted based on probability models, but, as pioneers of data analysis such as Tukey, Gower, and Benzecri knew, there is far more to the understanding of statistical methods than their performance assuming certain models. This includes some rather elementary considerations with wide-ranging consequences that are rarely taught or discussed, such as the different effects of different standardisation techniques when aggregating variables, implications of loss function choices in cross-validation, or what a hypothesis test actually does when the assumed model is not true (pretty much always, that is). My presentation will make an appeal to look at our methods from the angle of a direct interpretation of what they do to the data. I will also discuss how this is philosophically different from setting up probability models and trying to infer "truths" assuming them, and what the role of models can be for the data analytic understanding of statistics if they are not assumed to be true.


