Big Data and the ‘Internet of Things’ are transforming the processes of data collection, storage and use. The relationship between data collected first hand (primary data) and data collected by someone else (secondary data) is becoming more fluid. New possibilities for data collection are envisaged. Data integration is emerging as a reliable strategy to overcome data shortage and other challenges such as data coverage, quality, time dis-alignment and representativeness. When we have two (or more) data sources where the units are not (at least partially) overlapping and/or the units’ unique identifiers are unavailable, the different information collected can be integrated by using Micro Statistical Matching (MiSM). MiSM has been used in the social sciences, politics and economics, but there are very few applications that use agricultural and farm data. We present an example of MiSM data integration between primary and secondary farm data on agricultural holdings in the Emilia-Romagna region (Italy). The novelty of the work lies in the fact that integration is carried out with non-parametric MiSM, which is compared to predictive mean matching and Bayesian linear regression. Moreover, the matching validity is assessed with a new strategy. The main issues addressed, the lessons learned and the use in a research field characterised by critical data shortage are discussed.
D’Alberto, R., Raggi, M. (2021). From collection to integration: Non-parametric Statistical Matching between primary and secondary farm data. STATISTICAL JOURNAL OF THE IAOS, 37(2), 579-589 [10.3233/SJI-200644].
From collection to integration: Non-parametric Statistical Matching between primary and secondary farm data
D’Alberto, Riccardo
Primo
;Raggi, MeriSecondo
2021
Abstract
Big Data and the ‘Internet of Things’ are transforming the processes of data collection, storage and use. The relationship between data collected first hand (primary data) and data collected by someone else (secondary data) is becoming more fluid. New possibilities for data collection are envisaged. Data integration is emerging as a reliable strategy to overcome data shortage and other challenges such as data coverage, quality, time dis-alignment and representativeness. When we have two (or more) data sources where the units are not (at least partially) overlapping and/or the units’ unique identifiers are unavailable, the different information collected can be integrated by using Micro Statistical Matching (MiSM). MiSM has been used in the social sciences, politics and economics, but there are very few applications that use agricultural and farm data. We present an example of MiSM data integration between primary and secondary farm data on agricultural holdings in the Emilia-Romagna region (Italy). The novelty of the work lies in the fact that integration is carried out with non-parametric MiSM, which is compared to predictive mean matching and Bayesian linear regression. Moreover, the matching validity is assessed with a new strategy. The main issues addressed, the lessons learned and the use in a research field characterised by critical data shortage are discussed.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.