Record linkage

Record linkage is the process of associating different records that correspond to the same entity, such as a person. The goal is to identify which records refer to the same entity and which do not. This is done with a record linkage algorithm, which examines pairs of records and predicts whether they correspond to the same underlying entity.

From: The CIO's Guide to Oracle Products and Solutions [2019], Schema on read modeling approach as a basis of big data analytics integration in EIS [2018], Big Data and Social Science [2019]

Published in Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane, Big Data and Social Science, 2020

Joshua Tokle, Stefan Bender

The purpose of a record linkage algorithm is to examine pairs of records and make a prediction as to whether they correspond to the same underlying entity. (There are some sophisticated algorithms that examine sets of more than two records at a time [Steorts et al., 2014], but pairwise comparison remains the standard approach.) At the core of every record linkage algorithm is a function that compares two records and outputs a “score” that quantifies the similarity between those records. Mathematically, the match score is a function of the output from individual field comparisons: agreement in the first name field, agreement in the last name field, etc. Field comparisons may be binary—indicating agreement or disagreement—or they may output a range of values indicating different levels of agreement. There are a variety of methods in the statistical and computer science literature that can be used to generate a match score, including nearest-neighbor matching, regression-based matching, and propensity score matching. The probabilistic approach to record linkage defines the match score in terms of a likelihood ratio (Fellegi and Sunter, 1969).
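As a rough illustration of the likelihood-ratio idea, the following minimal sketch sums per-field log likelihood ratios using binary field comparisons. The field names, the m and u probabilities, and the helper functions are hypothetical values chosen for the example, and the sketch assumes field agreements are conditionally independent; it is not the implementation described in the chapter.

```python
import math

# Hypothetical per-field probabilities (illustrative, not estimated from data):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "first_name": {"m": 0.95, "u": 0.01},
    "last_name": {"m": 0.97, "u": 0.005},
    "birth_year": {"m": 0.99, "u": 0.02},
}


def field_agrees(a, b):
    """Binary field comparison: exact agreement after trimming and lowercasing."""
    return str(a).strip().lower() == str(b).strip().lower()


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields.

    Assumes field agreements are conditionally independent given match status.
    """
    score = 0.0
    for field, p in params.items():
        if field_agrees(rec_a[field], rec_b[field]):
            # Agreement is more likely among matches, so this term is positive.
            score += math.log2(p["m"] / p["u"])
        else:
            # Disagreement is more likely among non-matches, so this term is negative.
            score += math.log2((1 - p["m"]) / (1 - p["u"]))
    return score


a = {"first_name": "Maria", "last_name": "Lopez", "birth_year": 1980}
b = {"first_name": "maria", "last_name": "Lopez ", "birth_year": 1980}
c = {"first_name": "Mario", "last_name": "Lopes", "birth_year": 1975}

score_ab = match_score(a, b)  # all fields agree after normalization
score_ac = match_score(a, c)  # all fields disagree
```

In practice the m and u probabilities would be estimated from data, and a pair would be classified by comparing its score against chosen thresholds; both steps are left out of this sketch.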

Schema on read modeling approach as a basis of big data analytics integration in EIS

Published in Enterprise Information Systems, 2018

Slađana Janković, Snežana Mladenović, Dušan Mladenović, Slavko Vesković, Draženko Glavić

The main task of data integration, whether traditional or Big Data, batch or real-time, is to retrieve the required data from its current warehouse, convert its format to be compatible with the destination warehouse, and place it at the target location (Loshin 2013). What has changed is the set of challenges that data integration has to address. The three main steps in data integration are schema alignment, record linkage, and data fusion. Schema alignment addresses semantic ambiguity by identifying which attributes have the same meaning and which do not. Record linkage determines which records refer to the same entity and which do not. Data fusion identifies the accurate data in an integrated data set when different sources offer conflicting values.
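The three steps can be sketched on toy data as follows. Everything in this example is an assumption made for illustration: the two sources, the attribute mapping, the deterministic linkage rule (normalized name plus date of birth), and the fusion rule (the first source wins on conflict). Real systems would typically use more robust methods at each step.

```python
# Two hypothetical sources describing people under different schemas.
source_a = [
    {"full_name": "Ana Kovac", "dob": "1985-03-02", "city": "Belgrade"},
    {"full_name": "Ivan Petrov", "dob": "1990-11-15", "city": "Novi Sad"},
]
source_b = [
    {"name": "ana kovac", "birth_date": "1985-03-02", "town": "Beograd", "phone": "555-0101"},
]

# Step 1: schema alignment. Map source B attributes onto source A names.
MAPPING_B = {"name": "full_name", "birth_date": "dob", "town": "city", "phone": "phone"}


def align(record, mapping):
    """Rename attributes so both sources share one schema."""
    return {mapping[key]: value for key, value in record.items()}


# Step 2: record linkage. A simple deterministic rule (an assumption):
# records with the same normalized name and date of birth are the same entity.
def link_key(record):
    return (record["full_name"].strip().lower(), record["dob"])


# Step 3: data fusion. Resolve conflicting values; here the earliest
# record in the group (source A) takes priority, which is an assumption.
def fuse(records):
    fused = {}
    for rec in records:
        for key, value in rec.items():
            fused.setdefault(key, value)
    return fused


def integrate(a_records, b_records):
    """Align, link, and fuse the two sources into one record per entity."""
    groups = {}
    for rec in a_records + [align(r, MAPPING_B) for r in b_records]:
        groups.setdefault(link_key(rec), []).append(rec)
    return [fuse(group) for group in groups.values()]


integrated = integrate(source_a, source_b)
```

Here the two records for Ana Kovac are linked, the conflicting city values are resolved in favor of source A, and the phone number that only source B provides is carried over.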
