Introduction
Published in William M. Mendenhall, Terry L. Sincich, Statistics for Engineering and the Sciences, 2016
William M. Mendenhall, Terry L. Sincich
No matter what type of sampling design you employ to collect the data for your study, be careful to avoid selection bias. Selection bias occurs when some experimental units in the population have less chance of being included in the sample than others. This results in samples that are not representative of the population. Consider an opinion poll on whether a device to prevent cell phone use while driving should be installed in all cars. Suppose the poll employs either a telephone survey or a mail survey. After collecting a random sample of phone numbers or mailing addresses, each person in the sample is contacted by telephone or by mail and a survey is conducted. Unfortunately, these types of surveys often suffer from selection bias due to nonresponse. Some individuals may not be home when the phone rings, and others may refuse to answer the questions or to mail back the questionnaire. As a consequence, no data are obtained for the nonrespondents in the sample. If the nonrespondents and respondents differ greatly on an issue, then nonresponse bias exists. For example, those who choose to answer the question on cell phone usage while driving may have a vested interest in the outcome of the survey—say, parents of teenagers with cell phones, or employees of a company that produces cell phones. Others with no vested interest may have an opinion on the issue but might not take the time to respond. Finally, we caution that you may encounter a sample that was intentionally biased, with the sole purpose of misleading the public. A researcher who produces such a sample is guilty of unethical statistical practice.
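As a concrete illustration of the nonresponse mechanism described above (not part of the original text), the short simulation below uses invented support levels and response rates: if people with a vested interest respond more often than others, the poll estimate drifts away from the true level of support.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: 30% support the device. Supporters (e.g., those
# with a vested interest) respond at a higher rate than non-supporters.
# All numbers here are invented for illustration.
N = 100_000
supports = rng.random(N) < 0.30
response_rate = np.where(supports, 0.60, 0.30)  # assumed differential rates
responds = rng.random(N) < response_rate

true_support = supports.mean()
observed_support = supports[responds].mean()  # estimate from respondents only

print(f"True support:  {true_support:.3f}")      # ~0.30
print(f"Poll estimate: {observed_support:.3f}")  # biased upward, ~0.46
```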
Assessing Human Risk
Published in Gary S. Moore, Kathleen A. Bell, Living with the Earth, 2018
Gary S. Moore, Kathleen A. Bell
Epidemiology involves the study of humans. Consequently, it avoids the extrapolation problem seen in animal studies. However, it is not without drawbacks. A significant problem in epidemiology is bias. Bias is a type of error introduced into epidemiological studies through differential treatment of the cases and controls, either during the selection process or during information gathering. Selection bias, information bias, and non-differential misclassification bias may all lead to inaccurate study results. Researchers can address the problem of bias through careful study design. Despite this and other limitations, epidemiology remains an important tool for evaluating the link between exposure and disease.26,27
Inference from probability and nonprobability samples
Published in Uwe Engel, Anabel Quan-Haase, Sunny Xun Liu, Lars Lyberg, Handbook of Computational Social Science, Volume 2, 2021
Rebecca Andridge, Richard Valliant
Selection bias occurs if the seen part of the population (the sample) differs from the unseen part (the nonsample) in such a way that the sample cannot be projected to the full population. Whether a nonprobability sample covers the desired population is a major concern. For example, in a volunteer web panel, only persons with access to the Internet can join the panel. To describe three components of survey coverage bias, Valliant and Dever (2011) defined three populations, illustrated in Figure 11.1: (1) the target population of interest for the study, U; (2) the potentially covered population given the way that data are collected, Fpc; and (3) the actual covered population, Fc, the portion of the target population that is recruited for the study through the essential survey conditions. For example, consider an opt-in web survey for a smoking cessation study. The target population U may be defined as adults aged 18–29 who currently use cigarettes. The potentially covered population Fpc would be those study-eligible individuals with Internet access who visit the sites where study recruitment occurs; those actually covered, Fc, would be the subset of the potentially covered population who participate in the study. Selecting a sample only from Fc results in selection bias. The sample s consists of those persons who are invited to participate in the survey and who actually do. The U − Fpc region in the figure represents the many persons who have Internet access but never visit the recruiting websites, or who do not have Internet access at all. In many situations, U − Fpc is vastly larger than either Fc or Fpc. Although we use a volunteer panel as an example, the decomposition in Figure 11.1 remains pertinent in other applications, with appropriate definitions of the components.
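To make the decomposition concrete, here is a toy sketch (with invented set sizes, not the figures from the chapter) of the nesting s ⊆ Fc ⊆ Fpc ⊆ U implied by the definitions above:

```python
# Toy illustration of the coverage decomposition in Valliant and Dever (2011).
# All set sizes are invented placeholders.
U = set(range(10_000))   # target population
F_pc = set(range(2_000)) # has Internet access AND visits the recruiting sites
F_c = set(range(500))    # actually recruited under the essential survey conditions
s = set(range(120))      # invited and actually participate

# The definitions imply a strict nesting of the four sets.
assert s <= F_c <= F_pc <= U

print("Never reachable (U - F_pc):", len(U - F_pc))       # 8,000 of 10,000
print("Coverage rate |F_c| / |U|: ", len(F_c) / len(U))   # 0.05
```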
A Worker’s Fitness-for-Duty Status Identification Based on Biosignals to Reduce Human Error in Nuclear Power Plants
Published in Nuclear Technology, 2020
First, the grouping of subjects for the experiments has limitations. Subjects were categorized by self-evaluation, and the use of self-evaluation to classify the subjects’ FFD status with respect to stress, depression, and anxiety leaves the issue of subjectivity. Kendall and Watson98 emphasized the limitations associated with existing self-report scales for both anxiety and depression. A more objective way to select these groups needs to be considered in future work. The biosignal data in this study were collected from volunteer subjects, all of whom were male college students. While the use of male college students reduces potential confounding effects, it introduces potential selection bias and limits the applicability of the findings to the general population. Selection bias can be defined as an experimental error that occurs when the participant pool, or the subsequent data, is not representative of the target population. To address the representativeness issue, our sample size was 114, which is relatively large. Based on a post hoc power analysis,99 the study had 79.9% power, where power represents the ability of a trial to detect a difference between two groups. This suggests that our sample size was large enough to represent the populations. However, workers in NPPs are expected to be older, so possible effects of age on the findings may not be captured in this study. Using older people as experimental subjects needs to be considered in future studies.
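For readers unfamiliar with post hoc power analysis, the sketch below shows one way to reproduce this kind of calculation with statsmodels. The effect size is an assumed placeholder, since the excerpt does not report the value used in the study.

```python
# Post hoc power for a two-group comparison, in the spirit of the analysis
# described above (n = 114 subjects in total). The effect size below is a
# hypothetical placeholder, not the study's reported value.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = 114 / 2   # assuming an equal split of the 114 subjects
effect_size = 0.52      # hypothetical Cohen's d

power = analysis.power(effect_size=effect_size,
                       nobs1=n_per_group,
                       alpha=0.05,
                       ratio=1.0,
                       alternative='two-sided')
print(f"Post hoc power: {power:.3f}")  # ~0.79 with these assumed inputs
```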
Exploring the diversity of creative prototyping in a global online learning environment
Published in International Journal of Design Creativity and Innovation, 2020
K. W. Jablokow, X. Zhu, J. V. Matson
Two possible selection biases exist due to our data collection process: (1) self-selection bias; and (2) attrition bias caused by the loss of participants. First, as shown in Table 1, the relative percentages of each subgroup in terms of gender, country of origin, and occupational category are similar between our study sample and all CIC MOOC students who completed Coursera’s demographic survey. On the one hand, this partially supports the claim that people of different genders, cultures, and occupations do not choose this course and its exercises differently. On the other hand, although this course focuses on learning about and improving one’s creativity, and the creative talents of the students may vary from person to person, the students did not know the specific course content and tasks in advance. Therefore, the propensity of students to participate in the study is not correlated with their resulting performance or their personal reflections on beauty and creativity. From this point of view, self-selection should not generate a strong bias in this study. One possible caveat would apply if the primary reason students did not take this course is that they believed they were already highly creative (and so did not need to learn about it). If they are correct and their creativity is significantly higher than that of the students in our study sample, then self-selection bias would occur.
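One common way to carry out the kind of sample-versus-population comparison summarized in Table 1 is a chi-square goodness-of-fit test. The sketch below uses invented counts and proportions, not the study’s actual data.

```python
# Chi-square goodness-of-fit test of subgroup counts in a study sample
# against the subgroup proportions in the full population of survey
# completers. All numbers are invented placeholders, not Table 1.
from scipy.stats import chisquare

sample_counts = [220, 180, 100]        # e.g., three occupational categories
population_props = [0.45, 0.35, 0.20]  # proportions among all MOOC completers

n = sum(sample_counts)
expected = [p * n for p in population_props]

stat, pvalue = chisquare(f_obs=sample_counts, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {pvalue:.3f}")
# A large p-value is consistent with the sample mirroring the population mix.
```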
The representativeness and spatial bias of volunteered geographic information: a review
Published in Annals of GIS, 2018
Sample selection bias is also well studied in the machine learning community, under names such as sample selection bias, covariate shift, and transfer learning (Cortes et al. 2008; Zadrozny 2004; Pan and Wang 2010). Sample selection bias arises when the underlying distributions from which the training and test data are drawn are different. In other words, the distribution of the training data in feature space differs from the distribution of the test data. The standard approach to correcting for sample selection bias is importance weighting, where, in learning classifiers (e.g. decision trees, support vector machines, logistic regression), training examples are weighted by an importance weighting function when computing the loss (Shimodaira 2000). Asymptotically, the optimal weighting function proves to be the ratio of the probability density function of the features on the test data to the density function on the training data (Zadrozny 2004; Cortes et al. 2008). In practice, the weighting function is estimated from empirical estimates of the two density functions.
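A common practical estimator of this density ratio (one option among several, and not necessarily the method used in the works cited above) trains a probabilistic classifier to discriminate test examples from training examples and converts its predicted odds into importance weights:

```python
# Sketch of importance weighting for covariate shift: estimate
# w(x) = p_test(x) / p_train(x) with a classifier that discriminates
# test from training examples, then reweight the training loss.
# Data here is synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))  # training distribution
X_test = rng.normal(loc=0.7, scale=1.0, size=(1000, 2))   # shifted test distribution

# Label 0 = training example, 1 = test example.
X = np.vstack([X_train, X_test])
z = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

clf = LogisticRegression().fit(X, z)
p_test_given_x = clf.predict_proba(X_train)[:, 1]

# With equal pool sizes, p_test(x) / p_train(x) equals the classifier odds.
weights = p_test_given_x / (1.0 - p_test_given_x)

# These weights would multiply the per-example loss when fitting the
# final classifier on the training data.
print(weights[:5])
```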