Monitoring of Quality
Published in A.F. Al-Assaf, Managed Care Quality, 2020
Finally, the process of pilot testing allows for the initial measurement of inter-rater reliability. Inter-rater reliability is a measure of agreement on data collected from the same records by two or more independent abstractors. During the initial phases of testing the indicator, “gold standard” case review can be utilized in the training and reliability testing of independent record abstractors. Typically, a 5% random sample of cases is selected for re-review by a second abstractor, and the abstraction results of the two reviewers are compared. Data items with high degrees of inconsistency between reviewers may need to be modified, abandoned, or redefined to improve the reliability of data collection.
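A minimal sketch of this re-review workflow is shown below: draw a 5% random sample of abstracted records, have a second abstractor recode them, and flag data items whose agreement falls below a chosen threshold. The record identifiers, item names, and the 0.80 threshold are illustrative assumptions, not values from the text.

```python
# Sketch: 5% rereview sample and per-item agreement between two abstractors.
import random

def percent_agreement(first, second, items):
    """Per-item proportion of rereviewed records where both abstractors agree."""
    agreement = {}
    for item in items:
        matches = sum(first[rid][item] == second[rid][item] for rid in second)
        agreement[item] = matches / len(second)
    return agreement

# first_pass: {record_id: {data_item: value}} from the primary abstractor (toy data).
first_pass = {rid: {"diagnosis_code": "I21", "discharge_date": f"2020-01-0{rid % 9 + 1}"}
              for rid in range(200)}

# 5% random sample of cases selected for rereview by a second abstractor.
sample_ids = random.sample(list(first_pass), k=max(1, round(0.05 * len(first_pass))))
# In practice the second abstractor recodes these records independently;
# here the first pass is copied so the example is self-contained.
second_pass = {rid: dict(first_pass[rid]) for rid in sample_ids}

items = ["diagnosis_code", "discharge_date"]
for item, agree in percent_agreement(first_pass, second_pass, items).items():
    flag = "review definition" if agree < 0.80 else "ok"   # 0.80 is an assumed threshold
    print(f"{item}: {agree:.0%} agreement ({flag})")
```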
Designing and Implementing Research on the Development Editions of the Test
Published in Lucy Jane Miller, Developing Norm-Referenced Standardized Tests, 2020
Interrater reliability is defined as the amount of, or degree of, agreement between observers.13 A test should be consistent so that two or more examiners using the test to assess the same behaviors will reach the same conclusions. Factors that affect interrater reliability include the clarity of the concepts being measured, the level of measurement used in the items, and the complexity of the testing procedures and scoring instructions.
Medical Error as a Collaborative Learning Tool
Published in Fritz Allhoff, Sandra L. Borden, Ethics and Error in Medicine, 2019
With these three types of errors outlined, it would be beneficial to consider the extent to which multiple people might achieve the same result from similar evidence pools. One way this can happen is when there is a problem with inter-rater reliability, or the likelihood that two (or more) agents will come to the same decision given identical sets of evidence. While there are several kinds of inter-rater reliability tests, I will be focusing on Cohen’s kappa and Fleiss’s kappa, which correct for the possibility that the individuals agree merely by chance.9 Jacob Stegenga uses the following toy example to demonstrate how a kappa score shows inter-rater reliability. Two teaching assistants are grading the same class of 100 students, and their only decision is whether to pass or fail each student. Beth passes half the class, while Sara passes 60 students. Independently, they agree to pass 40 students and to fail 30. Calculating kappa from these figures gives a value of 0.4: the graders agree on 70 percent of the students, but 50 percent agreement would be expected by chance alone, so their agreement beyond chance is only modest (Stegenga 2018, 101).
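The arithmetic behind this toy example can be reproduced directly. The sketch below fills in the 2×2 agreement table implied by the counts in the text (both pass = 40, both fail = 30, with the remaining cells following from each grader's totals) and computes Cohen's kappa.

```python
# Cohen's kappa for the two-grader toy example (Stegenga 2018).
table = {
    ("pass", "pass"): 40,   # both graders pass the student
    ("pass", "fail"): 10,   # Beth passes, Sara fails (50 - 40)
    ("fail", "pass"): 20,   # Beth fails, Sara passes (60 - 40)
    ("fail", "fail"): 30,   # both graders fail the student
}
n = sum(table.values())

# Observed agreement: proportion of students rated identically.
p_o = (table[("pass", "pass")] + table[("fail", "fail")]) / n

# Chance agreement: product of each grader's marginal pass/fail rates.
beth_pass = (table[("pass", "pass")] + table[("pass", "fail")]) / n
sara_pass = (table[("pass", "pass")] + table[("fail", "pass")]) / n
p_e = beth_pass * sara_pass + (1 - beth_pass) * (1 - sara_pass)

kappa = (p_o - p_e) / (1 - p_e)
print(f"observed = {p_o:.2f}, chance = {p_e:.2f}, kappa = {kappa:.2f}")
# -> observed = 0.70, chance = 0.50, kappa = 0.40
```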
Epinephrine treatment of food-induced and other cause anaphylaxis in United States and Canadian Emergency Departments: a systematic review and meta-analysis
Published in Expert Review of Clinical Immunology, 2023
Geneva D. Mehta, Joumane El Zein, Isis Felippe Baroni, Myrha Qadir, Carol Mita, Rebecca E. Cash, Carlos A. Camargo
Interrater reliability was assessed using Cohen’s kappa statistic. For studies that collected data over multiple years, the median year of the study period was used for data analysis. To assess for change in epinephrine treatment of anaphylaxis over time, we first examined the data qualitatively using scatter plots (x-axis = median year, y-axis = percent of patients with anaphylaxis treated with epinephrine in the ED). If there was a sufficient number of studies, we used Spearman correlation and meta-analysis stratified by time period to quantitatively assess the relationship. We performed meta-analysis using a user-written Stata command, metaprop, to calculate pooled proportions and 95% CIs overall and for two time periods, 2013–2022 (the last 10 years) and prior to 2013 [21,22]. Pooled proportions are presented as percentages for clarity. Heterogeneity was assessed using I² values. Given concern for bias, we excluded from the Spearman correlation and meta-analysis any studies with overlapping cohorts or studies in which the number treated with epinephrine was not stated. Among overlapping cohorts, we retained those with shorter time frames, those that included multiple time frames within the study, or those that increased the total number of studies in the meta-analysis, and excluded the others. Secondary outcomes were not meta-analyzed because of concern for incomplete capture of the relevant literature, as our systematic search was optimized to capture our primary outcome. Analyses were performed using Stata 15.1 (StataCorp, College Station, TX, USA).
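As a loose Python analogue of the pooled-proportion analysis described (the authors used Stata's metaprop), the sketch below pools logit-transformed study proportions with a DerSimonian–Laird random-effects model and reports I²; the study counts are invented for illustration and do not come from the review.

```python
# Random-effects pooled proportion (DerSimonian-Laird) with I^2, on toy data.
import numpy as np

# (patients treated with epinephrine, total anaphylaxis patients) per study -- hypothetical.
studies = [(55, 120), (30, 90), (150, 400), (20, 35)]
events = np.array([e for e, n in studies], dtype=float)
totals = np.array([n for e, n in studies], dtype=float)

# Logit-transform each study proportion; approximate variance on the logit scale.
p = events / totals
effect = np.log(p / (1 - p))
var = 1 / events + 1 / (totals - events)

# Fixed-effect weights, Cochran's Q, and I^2 for heterogeneity.
w = 1 / var
fixed = np.sum(w * effect) / np.sum(w)
q = np.sum(w * (effect - fixed) ** 2)
df = len(studies) - 1
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# DerSimonian-Laird between-study variance, then random-effects pooling.
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
w_re = 1 / (var + tau2)
pooled = np.sum(w_re * effect) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se

back = lambda x: 1 / (1 + np.exp(-x))   # back-transform logits to proportions
print(f"pooled = {back(pooled):.1%} (95% CI {back(lo):.1%} to {back(hi):.1%}), I^2 = {i2:.0f}%")
```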
Reliability and validity of the public transportation use assessment form for individuals after stroke
Published in Disability and Rehabilitation, 2023
Shin Kitamura, Yohei Otaka, Kazuki Ushizawa, Seigo Inoue, Sachiko Sakata, Kunitsugu Kondo, Masahiko Mukaino, Eiji Shimizu
However, few established assessment tools are focused on the use of public transportation by individuals after stroke. Therefore, we developed the public transportation use assessment form (PTAF), a tool consisting of four categories and 15 subtasks necessary for individuals post-stroke to use public transportation, which assesses the degree of independence on each subtask at three levels [15]. Before using the PTAF in clinical settings and developing an intervention program, the reliability and validity of the assessment form need to be examined [16]. For example, confirmed inter-rater reliability will allow reliable assessment of patient performance regardless of which rater performs the assessment. Moreover, demonstrated validity will verify that PTAF results appropriately reflect the performance of individuals post-stroke when using public transportation. Confirming reliability and validity will therefore facilitate the interpretation of PTAF results in clinical practice and support the planning of interventions based on more accurate results. In a previous study, we confirmed the content validity of the PTAF, but its reliability and other aspects of validity were not verified [15]. The international standard for scale development, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), defines nine measurement properties [17], of which the PTAF has so far been evaluated for only one [15]. The purpose of this study is to examine the inter-rater reliability, construct validity, and internal consistency of the PTAF in clinical use for individuals after stroke.
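Although this excerpt does not describe the statistical analysis, inter-rater agreement on a three-level ordinal rating such as the PTAF's is often quantified with a weighted kappa; the sketch below is a hypothetical illustration with invented ratings, not the PTAF study's actual method.

```python
# Hypothetical: weighted kappa for two raters scoring 15 subtasks on a 0-2 ordinal scale
# (e.g., 2 = independent, 1 = partially assisted, 0 = fully assisted).
from sklearn.metrics import cohen_kappa_score

rater_a = [2, 2, 1, 0, 1, 2, 2, 1, 0, 0, 2, 1, 1, 2, 0]
rater_b = [2, 1, 1, 0, 1, 2, 2, 2, 0, 1, 2, 1, 0, 2, 0]

# Quadratic weights penalize disagreements more the further apart the rating levels are.
kappa_w = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted kappa = {kappa_w:.2f}")
```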
Inter-rater agreement when linking stroke interventions to the extended International Classification of Functioning, Disability and Health core set for stroke
Published in Disability and Rehabilitation, 2022
Melissa Evans, Catherine Sykes, Clare Hocking, Richard Siegert, Nick Garratt
The study reported here involved two independent linkers, who used the linking rules developed by Cieza et al. [1] to extract interventions from 10 digital patient records provided by community stroke rehabilitation services. The linkers used their knowledge of the ICF and of interventions for stroke to link the target function of each intervention to one or more of the 166 categories in the Extended International Core Set for Stroke (EICSS). The EICSS was validated by physicians, occupational therapists, and physiotherapists, and represents the problems most commonly addressed by these professions [9–11]. The primary aim of this research was to investigate the inter-rater agreement between two independent linkers when extracting interventions from patient digital records and when linking the target of each intervention to an ICF code. The secondary aims were to analyse factors that reduce inter-rater reliability and to make recommendations for improving inter-rater reliability in similar studies.