Explore chapters and articles related to this topic
What If the Data Is Not All There?
Published in Mitchell G. Maltenfort, Camilo Restrepo, Antonia F. Chen, Statistical Reasoning for Surgeons, 2020
Mitchell G. Maltenfort, Camilo Restrepo, Antonia F. Chen
Multiple imputation is preferable to single imputation because it accounts for potential variability in the replacement. Single imputation runs the risk of biasing the results. For example, if you replace a missing continuous value with its overall mean, then you have a value that does not reflect the relationships between variables in the dataset, nor does it include the random error that affects other measurements. This is also a problem with “last observation carried forward” in longitudinal studies.
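The effect of mean imputation described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an example from the book: the variables `x` and `y`, the 40% missingness rate, and the helper `pearson` are all assumptions made for illustration. It shows that replacing missing values with the observed mean shrinks the spread of the variable and weakens its correlation with a related variable.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Synthetic data (assumed): y depends linearly on x plus random error.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]

# Delete about 40% of the y values completely at random.
missing = [rng.random() < 0.4 for _ in range(n)]
observed_y = [yi for yi, m in zip(y, missing) if not m]

# Single imputation: every missing y becomes the mean of the observed y.
mean_y = statistics.fmean(observed_y)
y_mean_imputed = [mean_y if m else yi for yi, m in zip(y, missing)]

sd_full = statistics.stdev(y)
sd_imputed = statistics.stdev(y_mean_imputed)
r_full = pearson(x, y)
r_imputed = pearson(x, y_mean_imputed)

print(f"SD of y: complete data {sd_full:.2f}, mean-imputed {sd_imputed:.2f}")
print(f"corr(x, y): complete data {r_full:.2f}, mean-imputed {r_imputed:.2f}")
```

The imputed values all sit at a single point, so they carry neither the relationship with `x` nor any random error. That is the sense in which the filled-in values do not reflect the rest of the dataset.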
Missing Data Analysis
Published in Atanu Bhattacharjee, Bayesian Approaches in Oncology Using R and OpenBUGS, 2020
We consider data from a head and neck cancer study conducted by the Tata Memorial Centre. The results of this study were reported by Noronha [58]. Patients receiving palliative chemotherapy were selected for this study. Cetuximab is a costly chemotherapeutic drug, and only 30 patients in total could afford it. The quality of life (QOL) data are presented for those patients. Similarly, another group of the cohort () was treated with cisplatin therapy. Data were collected with the European Organisation for Research and Treatment of Cancer (EORTC) QOL questionnaire. The primary objective of this study was to compare QOL between the cisplatin and cetuximab arms. Missing data are handled with Bayesian methodology, and an imputation technique is used to fill in the missing values. The comparison of treatment effects on repeated measurements is presented in Chapter 9. This chapter is restricted to missing data handling techniques. The data are presented below.
Multiple Imputation
Published in Craig Mallinckrodt, Geert Molenberghs, Ilya Lipkovich, Bohdana Ratitch, Estimands, Estimators and Sensitivity Analysis in Clinical Trials, 2019
Craig Mallinckrodt, Geert Molenberghs, Ilya Lipkovich, Bohdana Ratitch
Whether this bias would translate into biased estimates of treatment contrasts is not a clear-cut issue. While including future outcomes in the imputation model may protect against the aforementioned bias, it may induce another type of bias. Note that in an RCT, treatment assignment is independent of baseline covariates due to randomization, in which case bias in estimates of covariate effects would not bias treatment contrasts. However, an incorrect imputation model (provided it includes post-baseline measures) may induce correlation between the treatment variable and covariates. Treatment contrasts could then be biased indirectly through the biased estimates of covariate effects (see also Sullivan et al., 2018).
The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156
Published in Medical Teacher, 2023
Martin G. Tolsgaard, Martin V. Pusic, Stefanie S. Sebok-Syer, Brian Gin, Morten Bo Svendsen, Mark D. Syer, Ryan Brydges, Monica M. Cuddy, Christy K. Boscardin
When developing an AI model, one must describe the data sources and how they were sampled (e.g. assessment data, open-source data). Describing all case types provides information to assess not only the completeness and representativeness of the data, but also an understanding of the characteristics and conditions under which the model was developed and trained. The data description should provide evidence that the data are representative of the larger population who may be subject to the model predictions (e.g. age, gender and race). As with other procedures, sampling can greatly impact model development and selection, requiring efforts to ensure that systemic biases do not exist in the data. Data descriptions should also specifically reference any missing data, which has the potential to contribute bias in the development of an algorithm. If missing data are imputed, the imputation approach must be described. It is also crucial to describe predictors and any ‘feature engineering’ that was performed (e.g. scaling, standardizing or categorizing the predictors or creating new predictors using approaches such as principal component analysis). Finally, the outcome measure must be specified, with details about how the outcome was assessed.
Missing data: current practice in football research and recommendations for improvement
Published in Science and Medicine in Football, 2022
David N. Borg, Robert Nguyen, Nicholas J. Tierney
Just as there is no single ‘best’ statistical method, there is no perfect, one-size-fits-all approach for imputing data. The goal is to generate values similar to those that might otherwise have been recorded. Sometimes this means using a neighborhood approach of finding similar values, or it could mean predicting responses using linear or tree models. At other times the most likely value might be ‘0’, or the last (or first) value carried forward. For detailed descriptions and a summary of these methods, we suggest Schafer and Graham (2002), European Medicines Agency (2010), Van Buuren (2012), Little et al. (2012), and Cheema (2014). Irrespective of the imputation method used, it is essential to compare results across different missing data handling approaches to understand how they may bias the results. For example, one can compare study results from listwise deletion, mean imputation, and linear regression imputation, paying particular attention to whether the substantive conclusions of the study change between methods, and to the impact on any effect sizes. This can reveal bias introduced by an imputation method. An example of this approach is described in the Case Study in Tierney and Cook (2018).
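The comparison recommended above (listwise deletion versus mean imputation versus linear regression imputation) can be sketched as follows. This is an illustrative simulation only, not the authors' case study: the synthetic variables `x` and `y`, the missingness mechanism (values of `y` are more likely to be missing when `x` is positive), and the helper `ols_fit` are all assumptions. The quantity being estimated is the mean of `y`.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Simple least-squares line; returns (intercept, slope). Hypothetical helper."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000

# Synthetic data (assumed): x is always observed, y is sometimes missing.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]

# Assumed mechanism: y is missing more often when x is positive.
missing = [rng.random() < (0.6 if xi > 0 else 0.1) for xi in x]

obs_x = [xi for xi, m in zip(x, missing) if not m]
obs_y = [yi for yi, m in zip(y, missing) if not m]

estimates = {}

# 1. Listwise deletion: use only the rows where y was observed.
estimates["listwise deletion"] = statistics.fmean(obs_y)

# 2. Mean imputation: fill every missing y with the observed mean.
mean_y = statistics.fmean(obs_y)
estimates["mean imputation"] = statistics.fmean(
    [mean_y if m else yi for yi, m in zip(y, missing)]
)

# 3. Regression imputation: predict missing y from x using complete cases.
intercept, slope = ols_fit(obs_x, obs_y)
estimates["regression imputation"] = statistics.fmean(
    [intercept + slope * xi if m else yi for xi, yi, m in zip(x, y, missing)]
)

# Known only because the data are simulated; unavailable in a real study.
full_mean = statistics.fmean(y)

print(f"mean of y with no missing data: {full_mean:.3f}")
for name, est in estimates.items():
    print(f"{name:>22}: {est:.3f} (difference {est - full_mean:+.3f})")
```

In a real analysis the complete-data value is unknown, so the informative signal is how much the estimates disagree with one another and whether the substantive conclusion changes between methods.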
Associations of alcohol consumption status with activities of daily living among older adults in China
Published in Journal of Ethnicity in Substance Abuse, 2021
Yen-Han Lee, Peiyi Lu, Yen-Chang Chang, Mack Shelley, Yi-Ting Lee, Ching-Ti Liu
A few limitations of this research should also be highlighted. First, the study used secondary self-reported datasets to conduct statistical analyses. Self-report bias could affect results in general; nevertheless, this is a common limitation in survey-based research using secondary and observational data. Moreover, a previous study found that self-reported or self-rated health measurements are consistent with objective health status (Wu et al., 2013). Thus, the self-reported measurements in this research should provide sufficient information regarding ADLs among older adults. Second, a longitudinal study design with secondary data, rather than an experimental design, was used for this study. Third, we did not conduct survival analysis for this research. Although CLHLS is a longitudinal database, the unequal duration between rounds of data collection and the much smaller sample sizes in the recent rounds could make estimates from survival analysis less accurate. A total of 16,540 participants were included for data collection in the 2009 wave, but the study population shrank drastically to just 7,192 in 2014. Further research could use that approach with more consistent follow-up strategies. Lastly, we included only alcohol consumption status in the present research because of the large amount of missing information and/or skipped responses for the amount of consumption. For example, nearly 75% of the consumption information was missing and/or skipped in the 2014 wave. Data imputation may lead to inaccurate results and induce study bias.