Redivo, E. (2026). Linking the Comparison and Graphical Approaches to Bipartite Matching. International Statistical Review, 1-26. doi:10.1111/insr.70038

Linking the Comparison and Graphical Approaches to Bipartite Matching

Redivo, Edoardo
2026

Abstract

Bipartite record linkage aims to identify observations that refer to the same individual, called coreferent observations, across two distinct datasets, each free of duplicates. The two main approaches to this task are the Fellegi–Sunter model, which relies on pairwise comparisons of observations, and the graphical record linkage model, which models the data directly and groups coreferent observations together. In this paper, we investigate the similarities between these two methods. We show that both models can be expressed in terms of a latent binary matrix indicating coreferent record pairs, that they can be framed as particular latent class analysis models, and that their parameters are directly related under a common data model. We also propose a unified estimation framework based on a classification expectation–maximization algorithm. The proposed estimation method properly incorporates the problem constraints while still allowing a computationally efficient implementation, and it allows the same distributional assumptions on the linkage distribution to be used interchangeably between the two models. Empirical results obtained with the proposed estimation method show satisfactory and mostly equivalent performance for the two models, both on simulations and on a real dataset commonly used as a benchmark for record linkage.
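
As a rough illustration of the latent binary matrix mentioned in the abstract, the sketch below builds such a matrix from a set of linked record pairs and checks the usual bipartite constraint. It assumes that, because neither dataset contains duplicates, each record can be linked to at most one record in the other dataset; this is the standard reading of the bipartite setting, not a formulation quoted from the paper. The function names `matrix_from_links` and `is_valid_bipartite_matching` are hypothetical and chosen only for this example.

```python
from typing import List, Set, Tuple


def matrix_from_links(links: Set[Tuple[int, int]], n_a: int, n_b: int) -> List[List[int]]:
    """Build an n_a x n_b 0/1 matrix with a 1 at (i, j) for each linked pair."""
    z = [[0] * n_b for _ in range(n_a)]
    for i, j in links:
        z[i][j] = 1
    return z


def is_valid_bipartite_matching(z: List[List[int]]) -> bool:
    """Return True if z is a rectangular 0/1 matrix with at most one 1 per row and per column.

    Assumption (not taken from the paper): duplicate-free datasets imply
    that every row sum and every column sum is at most 1.
    """
    if not z:
        return True
    n_cols = len(z[0])
    # All rows must have the same length.
    if any(len(row) != n_cols for row in z):
        return False
    # Entries must be binary.
    if any(v not in (0, 1) for row in z for v in row):
        return False
    rows_ok = all(sum(row) <= 1 for row in z)
    cols_ok = all(sum(row[j] for row in z) <= 1 for j in range(n_cols))
    return rows_ok and cols_ok


if __name__ == "__main__":
    # Three records in dataset A, four in dataset B, two coreferent pairs.
    z = matrix_from_links({(0, 2), (1, 0)}, n_a=3, n_b=4)
    print(z, is_valid_bipartite_matching(z))
```

A matrix in which one record of the first dataset is linked to two records of the second, such as `[[1, 1], [0, 0]]`, fails the check; this is the kind of configuration an estimation procedure that respects the problem constraints would have to exclude.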

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1062714