Roccetti, M. (2026). Before the algorithm: An exemplar case of the necessity of statistical testing for epidemiological consistency in public health data. AIMS Public Health, 13(1), 121-134. doi:10.3934/publichealth.2026008
Before the algorithm: An exemplar case of the necessity of statistical testing for epidemiological consistency in public health data
Roccetti M.
First author
2026
Abstract
The adoption of sophisticated analytical tools, including Machine Learning and massive data processing, has accelerated health research. However, a foundational principle asserts that the rigor of these complex methods depends on the integrity and validity of the underlying statistical design. I posit that advanced analyses, particularly in epidemiology, must follow the rigorous verification of methodological coherence. In this study, I used an exploratory case to demonstrate a crucial cautionary principle: complex models amplify, rather than correct, substantial methodological limitations. To demonstrate this, I applied standard descriptive and inferential statistical methods (Z-tests, Confidence Intervals, and t-tests) alongside established national epidemiological benchmarks to a published cohort study on vaccine outcomes and psychiatric events. Through this approach, I identified multiple statistically significant inconsistencies within the source data, including implausible incidence rates and relevant baseline group imbalances. These findings, supported by inferential statistical evidence, demonstrated that the observed effects (e.g., contradictory Hazard Ratios) are not biological but mathematical artifacts stemming from uncorrected selection and classification biases in the cohort construction. These paradoxes arise from the exclusion of prevalent psychiatric cases in the vaccinated group and the misclassification of pre-existing conditions as new incident events in the control group. This analysis serves as a robust demonstration that the validity of any conclusion drawn from subsequent advanced ML or statistical modeling sourced from public health data rests on first passing the test of basic epidemiological consistency.

| File | Size | Format |
|---|---|---|
| 10.3934_publichealth.2026008 (10).pdf (open access). Type: Publisher's version (PDF) / Version Of Record. License: Open Access License, Creative Commons Attribution (CC BY). | 558.65 kB | Adobe PDF |
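The abstract names Z-tests and confidence intervals, applied against national benchmarks, as the consistency checks. The sketch below illustrates what such checks could look like in Python. It is not the author's code: the function names and every number in it are hypothetical placeholders chosen only for illustration, and none of them come from the study or from any real benchmark.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z(events, n, benchmark_rate, alpha=0.05):
    """Z-test of an observed incidence proportion against a fixed benchmark.

    Returns the z statistic, a two-sided p-value, and a Wald confidence
    interval for the observed proportion.
    """
    p_hat = events / n
    # Standard error under the null hypothesis (benchmark rate is true).
    se_null = math.sqrt(benchmark_rate * (1 - benchmark_rate) / n)
    z = (p_hat - benchmark_rate) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Wald interval uses the standard error at the observed proportion.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    return z, p_value, ci


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z-test, e.g. for a baseline group imbalance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs (NOT from the study): 50 events among 100,000 people,
# compared with an assumed benchmark incidence of 200 per 100,000.
z_inc, p_inc, ci_inc = one_sample_proportion_z(50, 100_000, 0.002)

# Hypothetical baseline check (NOT from the study): prevalence of a prior
# condition in two groups of 10,000 people each.
z_base, p_base = two_proportion_z(300, 10_000, 500, 10_000)

print(f"incidence vs benchmark: z={z_inc:.2f}, p={p_inc:.3g}, CI={ci_inc}")
print(f"baseline imbalance:     z={z_base:.2f}, p={p_base:.3g}")
```

If the benchmark rate falls outside the confidence interval, or the baseline groups differ significantly before any exposure, the cohort would fail the kind of basic consistency test the abstract argues should precede more complex modeling.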


