Record Linkage
Published in Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane, Big Data and Social Science, 2020
The purpose of a record linkage algorithm is to examine pairs of records and make a prediction as to whether they correspond to the same underlying entity. (There are some sophisticated algorithms that examine sets of more than two records at a time [Steorts et al., 2014], but pairwise comparison remains the standard approach.) At the core of every record linkage algorithm is a function that compares two records and outputs a “score” that quantifies the similarity between those records. Mathematically, the match score is a function of the output from individual field comparisons: agreement in the first name field, agreement in the last name field, etc. Field comparisons may be binary—indicating agreement or disagreement—or they may output a range of values indicating different levels of agreement. There are a variety of methods in the statistical and computer science literature that can be used to generate a match score, including nearest-neighbor matching, regression-based matching, and propensity score matching. The probabilistic approach to record linkage defines the match score in terms of a likelihood ratio (Fellegi and Sunter, 1969).
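As a rough illustration of a likelihood-ratio match score, the sketch below sums per-field log likelihood ratios. It assumes binary field comparisons and conditional independence between fields. The field names and the m- and u-probabilities are hypothetical values chosen for illustration, not figures from the chapter.

```python
import math

# Hypothetical per-field parameters (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.97, 0.005),
    "birth_year": (0.90, 0.05),
}


def field_agreement(a, b):
    """Binary comparison: True when both values are present and equal, ignoring case and padding."""
    if a is None or b is None:
        return False
    return str(a).strip().lower() == str(b).strip().lower()


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields, assuming fields are conditionally independent."""
    score = 0.0
    for field, (m, u) in params.items():
        if field_agreement(rec_a.get(field), rec_b.get(field)):
            score += math.log2(m / u)              # agreement is evidence for a match
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement is evidence against
    return score


a = {"first_name": "Ann", "last_name": "Lee", "birth_year": 1980}
b = {"first_name": "ann ", "last_name": "LEE", "birth_year": 1980}
c = {"first_name": "Bob", "last_name": "Ray", "birth_year": 1975}
print(match_score(a, b))  # positive: all fields agree
print(match_score(a, c))  # negative: all fields disagree
```

In practice the pair would be declared a match, a non-match, or a candidate for clerical review by comparing the score with thresholds. The thresholds and the m- and u-values would need to be estimated from data rather than fixed by hand as they are here.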
Political Connections, Ownership Structure and Performance in China’s Mining Sector
Published in Karen Wendt, Sustainable Financial Innovations, 2018
Lei Xu, Ron P. McIver, Shiao-Lan Chou, Harjap Bassan
As a robustness measure, a propensity score matching method, with five matches, was used to select the sample to be used in estimation. Matching was based on FIRMSIZE, DEBTRATIO, OWNCONC, STATEOWN, and LEGALOWN. FOREIGNOWN was not included as a matching variable because there is no foreign ownership of shares in the listed mining sector. The average treatment effect on TobinsQ for mining firms is positive and statistically significant, supporting further analysis of differences between the mining and other sectors (Table 1).
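Matching each treated unit to its five nearest controls can be sketched as follows. This is not the authors' implementation. It assumes the propensity scores have already been estimated (for example from a logit model on the matching variables), that matching is done with replacement, and that the quantity of interest is the average effect on the treated. The function name and the toy inputs are hypothetical.

```python
def att_knn_matching(scores, treated, outcomes, k=5):
    """Average treatment effect on the treated via k-nearest-neighbor
    matching on the propensity score, with replacement."""
    treated_idx = [i for i, t in enumerate(treated) if t == 1]
    control_idx = [i for i, t in enumerate(treated) if t == 0]
    if not treated_idx or len(control_idx) < k:
        raise ValueError("need at least one treated unit and k control units")

    effects = []
    for i in treated_idx:
        # The k controls whose propensity scores are closest to this treated unit's score.
        nearest = sorted(control_idx, key=lambda j: abs(scores[j] - scores[i]))[:k]
        counterfactual = sum(outcomes[j] for j in nearest) / k
        effects.append(outcomes[i] - counterfactual)
    return sum(effects) / len(effects)


# Toy data: two treated units and six controls, matched to k=2 neighbors each.
scores = [0.80, 0.30, 0.79, 0.81, 0.31, 0.29, 0.50, 0.10]
treated = [1, 1, 0, 0, 0, 0, 0, 0]
outcomes = [10, 5, 8, 8, 4, 4, 100, 100]
print(att_knn_matching(scores, treated, outcomes, k=2))
```

Controls far from every treated unit (the last two in the toy data) never enter the estimate, which is how matching selects the estimation sample.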
Classical Statistics and Modern Machine Learning
Published in Mark Chang, Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare, 2020
In observational studies without randomization, the baseline characteristics of the two groups can differ substantially, and analyzing such data without adjustment can lead to biased results. In classical statistical analysis, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of an intervention, such as a medical treatment, by accounting for the covariates that predict receiving the treatment. Rosenbaum and Rubin (1983) introduced the technique.
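In symbols, the idea can be written as follows. The notation is an assumption introduced here, not taken from the excerpt: T is the treatment indicator, X the vector of baseline covariates, and Y(1), Y(0) the potential outcomes with and without treatment.

```latex
% Propensity score: probability of receiving treatment given the covariates
e(x) = \Pr(T = 1 \mid X = x)

% If treatment assignment is ignorable given X and every unit has a chance
% of receiving either treatment, it is also ignorable given e(X) alone:
\{Y(1), Y(0)\} \perp T \mid X, \quad 0 < e(X) < 1
\;\Longrightarrow\;
\{Y(1), Y(0)\} \perp T \mid e(X)
```

Matching treated and untreated subjects with similar values of e(X) therefore compares groups whose observed covariates are balanced. It does not adjust for covariates that were never measured.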
Small scale biogas technology adoption behaviours of rural households and its effect on major crop yields in East Gojjam Zone of Ethiopia: propensity score matching approach
Published in Cogent Engineering, 2022
Fasika Chekol, Ashebir Tsegaye, Teshager Mazengia, Minas Hiruy
The analysis was based on descriptive statistics and propensity score matching techniques. The PSMATCH2 routine in Stata 14 was used to analyze the collected data. A t-test was used to determine whether there was a statistically significant difference between households using and not using biogas. Using treatment and control observations together with their determinant variables, econometric analysis was used to assess the determinants of small-scale biogas technology adoption and the impact of this technology on agricultural yield. Because participation in the residential biogas program is not random, propensity score matching (PSM) was used to analyze the effect of biogas technology on agricultural productivity and to minimize potential sample selection bias (Liu et al., 2020; Becker & Ichino, 2002; Wu et al., 2010; Caliendo & Kopeinig, 2008).
Incorporating survival analysis into the safety effectiveness evaluation of treatments: Jointly modeling crash counts and time intervals between crashes
Published in Journal of Transportation Safety & Security, 2022
Lingtao Wu, Yi Meng, Xiaoqiang Kong, Yajie Zou
Safety analysts have proposed various methods for developing crash modification factors (CMFs), such as before-after methods, cross-sectional studies (e.g., regression models and case-control), and expert panel studies (Gooch, Gayah, & Donnell, 2016; Gross et al., 2010; Wood, Gooch, & Donnell, 2015; Wu et al., 2015). The before-after and the cross-sectional (typically regression models) methods are the two main approaches. The before-after study estimates CMFs from the change in the number of crashes between the periods before and after the improvement (Gross et al., 2010; Shen & Gan, 2003). The empirical Bayes (EB) before-after study has been the most popular method for estimating CMFs (Schumaker, Ahmed, & Ksaibati, 2017; Wu, Geedipally, & Pike, 2018). CMFs derived from cross-sectional studies either compare the safety performance of sites that have a specific feature with that of sites that do not, or are estimated simultaneously from datasets that contain a mixture of sites with different characteristics. Usually, safety analysts first develop a count regression model with the cross-sectional crash data, then derive CMFs from the parameter estimates of the model. Recently, researchers proposed propensity score matching approaches for evaluating the effectiveness of treatments (Dadashova, Wu, & Dixon, 2018; Rosenbaum & Rubin, 1984; Sasidharan & Donnell, 2013, 2014; Wood et al., 2015).
Examining the effects of residential self-selection on internal and external validity: an interaction moderation analysis using structural equation modeling
Published in Transportation Letters, 2019
The best-known defect of propensity score matching is that it promotes internal validity at the cost of external validity, insomuch as only part of the original sample is used for the analysis (Cao, Xu, and Fan 2010): the better one specifies the determinants of treatment assignment using propensity scores, the smaller the score-matched sample becomes (Connelly, Sackett, and Waters 2013). This indicates that, to ensure sufficient overlap between the treatment and no-treatment groups, studies using propensity score matching should either increase the sample size (França et al. 2006), which is often technically and financially infeasible, or truncate the original sample (Imbens and Rubin 2015). In the latter case, however, external validity is harmed even further if the removal of unmatched cases causes systematic differences between the matched (truncated) sample and the full, original sample (Little and Rubin 2000; Marcus 1997). (Furthermore, sample truncation per se cannot ensure sufficient overlap between the treatment and control groups; there may be too little overlap to begin with.)
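The trade-off can be made concrete with a small sketch of sample truncation. It assumes a simple min-max common-support rule, which is one of several trimming rules in use and is not necessarily the one applied in the cited studies. The function name and the toy scores are hypothetical.

```python
def trim_to_common_support(scores, treated):
    """Keep only units whose propensity score lies in the range covered by
    both groups (min-max rule). Returns the kept indices and the share of
    the original sample that survives truncation."""
    t_scores = [s for s, t in zip(scores, treated) if t == 1]
    c_scores = [s for s, t in zip(scores, treated) if t == 0]
    if not t_scores or not c_scores:
        raise ValueError("both treated and control units are required")

    low = max(min(t_scores), min(c_scores))    # larger of the two group minima
    high = min(max(t_scores), max(c_scores))   # smaller of the two group maxima
    kept = [i for i, s in enumerate(scores) if low <= s <= high]
    return kept, len(kept) / len(scores)


# Partial overlap: part of the sample is dropped.
kept, share = trim_to_common_support(
    [0.10, 0.40, 0.60, 0.90, 0.30, 0.50, 0.70, 0.95],
    [0, 0, 0, 0, 1, 1, 1, 1],
)
print(kept, share)

# No overlap to begin with: truncation leaves nothing to match.
print(trim_to_common_support([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```

The share of the sample that survives is a rough indicator of how far the matched sample may have drifted from the original one. The second call shows the case noted in the parenthetical, where trimming cannot create overlap that was never there.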