Critical appraisal of systematic reviews
Published in O. Ajetunmobi, Making Sense of Critical Appraisal, 2021
Usually, two or three assessors independently judge whether studies are eligible for inclusion in a systematic review. The report of a systematic review should state the level of inter-rater agreement across all the studies considered, and details should also be provided about how disagreements were resolved. (See ‘Inter-rater reliability’ and ‘Kappa statistic’ in the Glossary of terms at the end of the book.)
Stage 2: Measuring performance
Published in Robin Burgess, New Principles of Best Practice in Clinical Audit, 2020
Stephen Ashmore, Tracy Ruthven, Louise Hazelwood
Before starting an audit, the reliability of data collection should be checked by asking data collectors to independently extract data from the same sample of records and then compare their findings. The percentage of items that are the same, or the kappa statistic, is calculated in order to estimate inter-rater reliability [8]. If reliability is low, the data collection procedures must be reviewed.
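The check described above, two collectors independently extracting the same items and then comparing percent agreement and kappa, can be sketched in Python. The collector data and function names below are illustrative, not from the chapter:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items where the two data collectors recorded the same value."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: agreement between two raters, corrected for chance.
    Assumes the raters do not agree perfectly by chance alone (expected
    agreement < 1), otherwise the denominator is zero."""
    n = len(r1)
    p_observed = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected chance agreement from each rater's marginal frequencies
    p_expected = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical audit: two collectors extract a yes/no item from 10 records
collector_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
collector_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(percent_agreement(collector_a, collector_b))        # 0.8
print(round(cohens_kappa(collector_a, collector_b), 3))   # 0.583
```

Note that the two measures can diverge: here raw agreement is 80%, but kappa is lower because some of that agreement would be expected by chance.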
How Will the Data be Analyzed?
Published in Trena M. Paulus, Alyssa Friend Wise, Looking for Insight, Transformation, and Learning in Online Talk, 2019
Cohen’s kappa (κ; Cohen, 1960) is relatively straightforward to calculate and produces an accurate estimate of reliability when rating data are complete and rater error is evenly distributed. While the original kappa statistic is calculated for two raters and a single coding category, various adapted versions that allow for additional raters and categories have been developed. Kappa can range from −1 to 1, with higher values indicating higher inter-rater reliability (values at or below 0 indicate agreement no better than chance). There is no generally accepted threshold for an acceptable level of kappa, though benchmarks of 0.61–0.80 as “good” and 0.81–1.00 as “very good” have been suggested (Landis & Koch, 1977; Altman, 1991). Alternatively, Fleiss, Levin, and Paik (2003) describe kappa of 0.40–0.75 as “intermediate to good” and greater than 0.75 as “excellent.” A good rule of thumb is that κ > 0.70 is reasonable for supporting inferences about the content of online talk, results based on 0.60 < κ < 0.70 should be interpreted with caution, and content analysis yielding κ < 0.60 should not be interpreted.
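The chapter's rule of thumb can be encoded as a small helper. This is a minimal sketch of that interpretation scale only; the function name is mine, and the convention of placing κ exactly equal to a threshold in the lower band is an assumption, since the quoted ranges leave the boundaries ambiguous:

```python
def interpret_kappa(kappa):
    """Map a kappa value to the rule of thumb quoted above
    (thresholds at 0.60 and 0.70; exact boundary values fall
    into the lower band by convention here)."""
    if kappa > 0.70:
        return "reasonable for supporting inferences"
    if kappa > 0.60:
        return "interpret with caution"
    return "do not interpret"

print(interpret_kappa(0.82))  # reasonable for supporting inferences
print(interpret_kappa(0.65))  # interpret with caution
print(interpret_kappa(0.45))  # do not interpret
```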
Does bacterial colonization influence ureteral stent-associated morbidity? A prospective study
Published in Arab Journal of Urology, 2023
Mohamed Samir, Mahmoud Ahmed Mahmoud, Ahmed Tawfick
The Kolmogorov–Smirnov test was used to assess the normal distribution of continuous variables. All results were presented as mean and SD values or as median and interquartile range, according to the data distribution. Categorical results were presented as numbers of cases and percentages. Continuous variables were compared using Student’s t-test or the Mann–Whitney U-test, according to the data distribution. Categorical variables were compared using the chi-square test, Fisher’s exact test or the Monte Carlo test, depending on the data. The kappa statistic was used to compute the measure of agreement between two investigational methods: a kappa score of over 0.75 is excellent; a score of 0.40 to 0.75 is fair to good; and a score below 0.40 is poor. All statistical calculations were done using SPSS version 20 for Windows (SPSS Inc, Chicago, IL, USA).
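For two investigational methods with binary (e.g. positive/negative) results, agreement data take the form of a 2×2 table, and kappa can be computed directly from its cell counts. The counts below are hypothetical and the function names are mine; the classification bands are the ones stated in the passage:

```python
def kappa_from_2x2(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table for two methods:
    a = both positive, b = method 1 positive only,
    c = method 2 positive only, d = both negative."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Expected chance agreement, from each method's marginal totals
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

def classify_agreement(kappa):
    """The scale quoted above: >0.75 excellent, 0.40-0.75 fair to good, <0.40 poor."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"

# Hypothetical counts for two methods tested on 100 patients
k = kappa_from_2x2(a=30, b=5, c=5, d=60)
print(round(k, 2), classify_agreement(k))  # 0.78 excellent
```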
Applying recommended definition of aggressive prostate cancer: a validation study using high-quality data from the Cancer Registry of Norway
Published in Acta Oncologica, 2023
T. E. Robsahm, K. M. Tsuruda, H. H. Hektoen, A. H. Storås, M. B. Cook, L. M. Hurwitz, H. Langseth
Overall, Approach 1 classified 24.7% of all PC cases as aggressive and 13.6% as unspecified. A lower proportion (19.6%) were classified as aggressive using Approach 2, but 29.0% were unspecified. This highlights the benefit of using detailed information on cTNM and Gleason score to define aggressive PC (Approach 1). We observed moderate agreement between Approaches 1 and 2 using Landis and Koch’s qualitative guideline for the kappa statistic [21]. Although this guideline is commonly used in medical research, the kappa statistic is affected by the marginal rates of both approaches, which can limit the utility of a pre-defined scale for qualifying the level of observed agreement [26]. Further research is needed to identify whether the aggressive prostate cancers identified by both approaches have similar aetiologies, but our results for sensitivity and PPV indicate that cTNM and Gleason score (Approach 1) predict aggressiveness better than CRN stage at diagnosis alone (Approach 2). However, when such information is lacking, as was the case in Norway prior to the establishment of the NPCR in 2004, Approach 2 may provide a useful alternative for classifying aggressiveness. We observed that the sensitivity of both approaches improved slightly over time, though this trend was most pronounced for Approach 2. The PPVs decreased between 2005 and 2007, likely due to a slight increase in the number of aggressive cancers and a concomitant decrease in the number of fatal cancers. It may also be partially due to the increase in clinical data completeness over time.
Association between recorded medication reviews in primary care and adequate drug treatment management – a cross-sectional study
Published in Scandinavian Journal of Primary Health Care, 2021
Naldy Parodi López, Staffan A. Svensson, Susanna M. Wallerstedt
The inter-rater agreement regarding the overall medical assessment was evaluated with the kappa statistic. Logistic regression was performed to obtain crude and adjusted odds ratios with 95% confidence intervals for the association between patient characteristics and ≥1 recorded medication review over the last year. In accordance with Swedish regulations and the local remuneration policy, age and number of drugs were dichotomized in these models (<75 versus ≥75 years of age; fewer than five versus five or more regular drugs in the medication list). Other patient characteristics included in the models were sex (female versus male), nursing home residence (yes versus no), and multi-dose drug dispensing (yes versus no). We also included morbidities where the prevalence differed significantly between the comparison groups in the univariate analyses. To further explore the association between patient conditions and recorded medication reviews, these morbidities were included both in a separate model and in a model where all patient characteristics were considered. The independent variables used in the logistic regression models were checked for intercorrelation, that is, multicollinearity, using tolerance levels.
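The multicollinearity check mentioned last, tolerance levels, regresses each independent variable on all the others: tolerance = 1 − R², so values near 0 flag a variable that is nearly a linear combination of the rest. A minimal NumPy sketch with hypothetical dichotomized predictors (the variable names are mine, not the study's dataset):

```python
import numpy as np

def tolerances(X):
    """Tolerance (= 1 - R^2) of each predictor regressed on the others.
    X has shape (n_observations, n_predictors); values near 0 signal
    multicollinearity."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on the remaining predictors plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        ss_res = np.sum((y - others @ beta) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        out.append(ss_res / ss_tot)  # equals 1 - R^2
    return out

rng = np.random.default_rng(0)
age75 = rng.integers(0, 2, 200).astype(float)         # hypothetical: age >= 75 (0/1)
polypharm = rng.integers(0, 2, 200).astype(float)     # hypothetical: >= 5 drugs (0/1)
redundant = age75 + polypharm                         # deliberately collinear
print(tolerances(np.column_stack([age75, polypharm, redundant])))
```

With the collinear third column, all three tolerances collapse toward 0; dropping it restores tolerances near 1 for the two independent predictors.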