Before the algorithm: An exemplar case of the necessity of statistical testing for epidemiological consistency in public health data

Roccetti M.
First author
2026

Abstract

The adoption of sophisticated analytical tools, including machine learning and massive data processing, has accelerated health research. However, the rigor of these complex methods depends on the integrity and validity of the underlying statistical design. I posit that advanced analyses, particularly in epidemiology, must follow a rigorous verification of methodological coherence. In this study, I used an exploratory case to demonstrate a crucial cautionary principle: complex models amplify, rather than correct, substantial methodological limitations. To demonstrate this, I applied standard descriptive and inferential statistical methods (Z-tests, confidence intervals, and t-tests), alongside established national epidemiological benchmarks, to a published cohort study on vaccine outcomes and psychiatric events. Through this approach, I identified multiple statistically significant inconsistencies within the source data, including implausible incidence rates and relevant baseline group imbalances. These findings, supported by inferential statistical evidence, demonstrated that the observed effects (e.g., contradictory hazard ratios) are not biological but are mathematical artifacts stemming from uncorrected selection and classification biases in the cohort construction. These paradoxes arise from the exclusion of prevalent psychiatric cases in the vaccinated group and the misclassification of pre-existing conditions as new incident events in the control group. This analysis demonstrates that the validity of any conclusion drawn from subsequent advanced ML or statistical modeling of public health data rests on first passing the test of basic epidemiological consistency.
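
As an illustration of the kind of basic consistency check the abstract describes, the following is a minimal sketch of a one-sample Z-test comparing a cohort's observed incidence proportion with a fixed national benchmark, together with a 95% Wald confidence interval. It is not the paper's code: the function name `one_sample_proportion_z` and every number in the example call are hypothetical and chosen only to show the mechanics.

```python
import math


def one_sample_proportion_z(events, n, benchmark_rate):
    """Compare an observed cohort proportion with a fixed benchmark rate.

    Returns the observed proportion, the Z statistic, the two-sided
    p-value, and a 95% Wald confidence interval for the proportion.
    """
    p_hat = events / n
    # Standard error under the null hypothesis (rate equals the benchmark).
    se_null = math.sqrt(benchmark_rate * (1 - benchmark_rate) / n)
    z = (p_hat - benchmark_rate) / se_null
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Wald interval uses the standard error at the observed proportion.
    se_obs = math.sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - 1.96 * se_obs, p_hat + 1.96 * se_obs)
    return p_hat, z, p_value, ci


# Hypothetical inputs (NOT taken from the paper): 50 events among
# 100,000 people, against an assumed benchmark of 2 per 1,000.
p_hat, z, p_value, ci = one_sample_proportion_z(
    events=50, n=100_000, benchmark_rate=0.002
)
print(f"observed={p_hat:.5f}  z={z:.2f}  p={p_value:.3g}")
print(f"95% CI=({ci[0]:.5f}, {ci[1]:.5f})")
```

With these made-up inputs the benchmark lies well outside the confidence interval, which is the pattern such a check is meant to flag before any more complex model is fitted. For very rare events, an exact or Poisson-based interval may be preferable to the Wald approximation used here.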
Roccetti, M. (2026). Before the algorithm: An exemplar case of the necessity of statistical testing for epidemiological consistency in public health data. AIMS Public Health, 13(1), 121-134. doi:10.3934/publichealth.2026008
Files in this item:

File: 10.3934_publichealth.2026008 (10).pdf
Access: open access
Type: Publisher's version (PDF) / Version of Record
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 558.65 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1040030