Multivariate techniques in context
Pat Dugard, John Todman, Harry Staines in Approaching Multivariate Analysis, 2010
The power of a test is the probability that we do reject a false null hypothesis (so it is 1 – the probability of a Type II error). Whatever our chosen level of significance, we would like the power to be as high as possible. One approach is to set the required power for the smallest effect size that would be of practical interest. We might perhaps say, ‘If the means differ by at least 5 units, we want the power to be at least .8’. If we can specify the direction of the change (e.g., an increase in the average test score or a decrease in the time to undertake a task) then we perform a one-tailed test and, other things being equal, the power will be increased. We describe this and alternative approaches to estimating the effect size required, and show how, given the effect size, it is possible to decide on the sample size needed using a dedicated power analysis package such as SPSS SamplePower. It is also possible to use tables to decide on sample sizes: Clark-Carter (2009) provides an excellent introduction to these.
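The sample-size logic described above can be sketched with the usual normal (z) approximation to the two-sample t-test; this is an illustrative stand-in, not SPSS SamplePower, and the within-group SD of 10 below is an assumption chosen so that a 5-unit difference corresponds to a standardized effect of d = 0.5.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.8, one_tailed=True):
    """Approximate per-group sample size for a two-sample comparison of
    means, using the normal (z) approximation to the t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha) if one_tailed else z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    d = delta / sigma  # standardized effect size
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# 'If the means differ by at least 5 units, we want the power to be at
# least .8' (assuming, for illustration, a within-group SD of 10):
print(n_per_group(delta=5, sigma=10))                    # one-tailed: 50
print(n_per_group(delta=5, sigma=10, one_tailed=False))  # two-tailed: 63
```

Note how specifying the direction of the change (the one-tailed case) reduces the required sample size, which is the power gain the passage describes; the exact t-based answer from a dedicated package will be slightly larger.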
Endpoint Selection
Shein-Chung Chow in Innovative Statistics in Regulatory Science, 2019
In clinical trials, it is not uncommon for a study to be powered on the expected absolute change from baseline of a primary study endpoint while the collected data are analyzed on the relative change from baseline (e.g., percent change from baseline), or on the percentage of patients who show some improvement (i.e., a responder analysis). The definition of a responder could be based on either the absolute or the relative change from baseline of the primary study endpoint. The interpretation of the analysis results is very controversial, especially when a significant result is observed for one study endpoint (e.g., absolute change from baseline) but not for another (e.g., relative change from baseline or the responder analysis). Based on the numerical results of this study, it is evident that the power of the test can decrease drastically when the study endpoint is changed. However, when switching from an endpoint based on absolute difference to one based on relative difference, one possible way to maintain the power level is to modify the corresponding non-inferiority margin, as suggested by the results given in Section 4.4.
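The power loss from switching endpoints can be seen in a small Monte Carlo sketch (not from the chapter): the same simulated data are tested once as a mean change from baseline and once as a responder analysis. All numbers below (n = 100, true mean change 3, SD 10, responder defined as any improvement) are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
N_SIMS, n = 2000, 100
z_crit = NormalDist().inv_cdf(0.95)  # one-sided alpha = .05

hits_mean, hits_resp = 0, 0
for _ in range(N_SIMS):
    change = [random.gauss(3.0, 10.0) for _ in range(n)]  # true mean change 3
    # endpoint 1: mean absolute change from baseline > 0 (z-test)
    if mean(change) / (stdev(change) / math.sqrt(n)) > z_crit:
        hits_mean += 1
    # endpoint 2: responder = any improvement; test proportion > 0.5
    p_hat = sum(c > 0 for c in change) / n
    if (p_hat - 0.5) / math.sqrt(0.25 / n) > z_crit:
        hits_resp += 1

print(f"power (absolute change): {hits_mean / N_SIMS:.2f}")
print(f"power (responder):       {hits_resp / N_SIMS:.2f}")
```

Under these assumptions the absolute-change analysis has power around .9 while the responder analysis drops to roughly .75, because dichotomizing the change discards information; this mirrors the drastic power change the passage describes.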
Excess Success for a Study on Visual Search and Autism
Elizabeth B. Torres, Caroline Whyatt in Autism, 2017
The really curious finding is for the relatively small values of SDp. Many of the values between 0 and 1 lead to poor reproducibility (as low as 30%) of the findings in the original data set, and this dip is present regardless of sample size. The frequent failure to replicate occurs because the population means created for these SDp values tend to be different enough to produce experiments with moderate power for one or more of the tests. Which tests have moderate power varies with the sample size. For example, if the original data set with n = 10 generates a significant main effect, there is a decent chance that a replication study will not produce the same significant main effect. At the same time, an initial data set with such a small sample is unlikely to produce a significant interaction effect (the power is lower), and replication studies will likewise usually produce the same nonsignificant outcome. In contrast, for a larger sample size, the power for the test of a main effect is high, and both the original and replication studies are likely to show it. However, power for an interaction is lower, and if the original data set happens to generate a nonsignificant interaction, there is a decent chance that a replication study will produce a significant interaction, and vice versa.
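The core of this argument can be made concrete with a simplified sketch (not from the article): if a single test has power p, and the original and replication studies are independent with the same power, they disagree (one significant, one not) with probability 2p(1 - p), which peaks at moderate power and shrinks when power is very low or very high.

```python
# Simplified model: two independent studies of the same true effect,
# each with the same power p for a given test.
def disagreement(power):
    """Probability that exactly one of two independent studies is significant."""
    return 2 * power * (1 - power)

for p in (0.2, 0.5, 0.8, 0.95):
    print(f"power {p:.2f}: P(original and replication disagree) = "
          f"{disagreement(p):.2f}")
```

Disagreement is maximal (probability .5) at power .5, which is why data sets whose tests land in the moderate-power range reproduce so poorly, while very low power (both studies nonsignificant) and very high power (both significant) tend to agree.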
Significance test for linear regression: how to test without P-values?
Published in Journal of Applied Statistics, 2021
Paravee Maneejuk, Woraphon Yamaka
Considering cases 4–6, in which the null hypothesis is false, there is little variation in the probability values across all methods. When the sample size is greater than 10, all methods show evidence supporting the alternative hypothesis. However, with a small sample size, say N = 10, our testing methods lead to misinterpretation a number of times. Across the 1,000 simulated datasets, the results favor the p-value and plausibility approaches, except for N = 10. This indicates that the power of any test depends on the sample size: if the sample size is large enough, the test will be more reliable, especially when the null hypothesis is false.
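The dependence of power on sample size can be sketched analytically (assumed numbers, not the authors' simulation): for a one-sided z-test of a standardized effect, power is Φ(d·√N − z₁₋α) and grows steadily with N.

```python
import math
from statistics import NormalDist

def power(n, effect=0.5, alpha=0.05):
    """Approximate one-sided z-test power for standardized effect size
    `effect` at sample size n (illustrative assumption: effect = 0.5)."""
    z = NormalDist()
    return z.cdf(effect * math.sqrt(n) - z.inv_cdf(1 - alpha))

for n in (10, 30, 100):
    print(f"N = {n:3d}: power = {power(n):.2f}")
```

At N = 10 power is below one half, so frequent misinterpretations of the kind reported above are exactly what this curve predicts, while by N = 100 the test rejects a false null almost every time.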
Nomogram prediction of vulnerable periodontal condition before orthodontic treatment in the anterior teeth of Chinese patients with skeletal Class III malocclusion
Published in Acta Odontologica Scandinavica, 2021
Jian Jiao, Wu-Di Jing, Jian-Xia Hou, Xiao-Tong Li, Xiao-Xia Wang, Xiao Xu, Ming-Xin Mao, Li Xu
This study also had limitations. One limitation relates to its retrospective design, which is associated with a risk of selection bias. The prevalence of thin periodontal phenotype and alveolar fenestration/dehiscence may be overestimated because the patients were candidates for periodontal surgery for soft and hard tissue augmentation. However, this potential bias did not affect the accuracy of the nomogram models because its effect was minimized by multivariate regression. In addition, no sample size calculation was carried out before enrolment. Therefore, a power simulation was performed after the statistical analyses to evaluate the power of the test statistics; the results showed that the sample size was sufficient to reach a conclusion. Only patients with skeletal Class III malocclusion were included, and other parameters, e.g. cephalometric parameters, may be associated with GT/periodontal phenotype and alveolar dehiscence/fenestration. Therefore, further studies that include patients with other types of malocclusion or healthy individuals, as well as more parameters, are needed to assess the generalizability of the models and improve their accuracy.
Exploring attitudes about evidence-based practice among speech-language pathologists: A survey of Japan and Malaysia
Published in International Journal of Speech-Language Pathology, 2021
Shin Ying Chu, Yuki Hara, Chiew Hock Wong, Mari Higashikawa, Grace E. McConnell, Annette Lim
Data were scored, and descriptive statistics were computed to examine the frequency of responses for categorical variables (N and %). Normality was assessed with the Shapiro–Wilk test. When the Shapiro–Wilk test was non-significant (p > .05), an independent t-test and a one-way ANOVA were performed; when it was significant (p < .05), the Kruskal–Wallis and Mann–Whitney tests were employed instead. Following the one-way ANOVA, a Tukey post-hoc test was used to determine whether any given demographic variable was associated with the SLPs’ perception and knowledge. The total score of each section was used to compute bivariate Pearson correlations between sections and between demographic data and sections. Based on G*Power, a t-test power analysis indicated that a sample size of 84 (42 from each site) was needed to detect meaningful differences with a large effect size of 0.80 (Faul, Erdfelder, Buchner, & Lang, 2009). For the ANOVA power analysis, a total sample size of 97 was required for a large effect size of 0.45 (G*Power; Faul et al., 2009). All data were analysed using the Statistical Package for the Social Sciences (SPSS) version 24.0 for Windows.
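A rough check of the reported t-test sample size (42 per site at d = 0.80) can be made with a normal approximation; the two-tailed α of .05 below is an assumption, as the survey does not state it, and G*Power's exact t-based result may differ slightly.

```python
import math
from statistics import NormalDist

def two_sample_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-tailed, two-sample comparison of means
    with n_per_group subjects per group and standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * math.sqrt(n_per_group / 2) - z_crit)

print(f"approx. power at n = 42 per group, d = 0.80: "
      f"{two_sample_power(42, 0.80):.2f}")
```

Under these assumptions, 42 per group corresponds to power of roughly .95 rather than the conventional .80, which suggests the authors targeted a stricter power level; the passage itself does not state which.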
Related Knowledge Centers
- Nonparametric Statistics
- Null Hypothesis
- Sensitivity & Specificity
- Statistical Hypothesis Testing
- Type I & Type II Errors
- Alternative Hypothesis
- False Positives & False Negatives
- Sample Size Determination
- Effect Size
- Parametric Statistics