Patient-Reported Outcomes: Development and Validation
Demissie Alemayehu, Joseph C. Cappelleri, Birol Emir, Kelly H. Zou in Statistical Topics in Health Economics and Outcomes Research, 2017
Classical test theory (CTT) is a traditional quantitative approach to testing the reliability and validity of a scale based on its items, and is the basis for all of the psychometric methods described in this chapter (except for the person-item maps discussed in Section 2.6). In the context of PRO measures, CTT assumes that each observed score (X) on a PRO instrument is the sum of an underlying true score (T) on the concept of interest and unsystematic (i.e., random) error (E): X = T + E. A person’s true score is the score that would be obtained if there were no errors in measurement, defined formally as the expected score over an infinite number of independent administrations of the scale. Scale users never observe a person’s true score, only an observed score.
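The decomposition above can be illustrated with a small simulation (not from the source; the distributions and variances are arbitrary choices for illustration). Under CTT, reliability is the ratio of true-score variance to observed-score variance, var(T)/var(X):

```python
import random
import statistics

random.seed(42)

# Simulate the CTT decomposition X = T + E for 10,000 persons.
n_persons = 10_000
true_scores = [random.gauss(50, 10) for _ in range(n_persons)]  # T, var = 100
errors = [random.gauss(0, 5) for _ in range(n_persons)]         # E, mean 0, var = 25
observed = [t + e for t, e in zip(true_scores, errors)]         # X = T + E

# Reliability = var(T) / var(X); here it should be near 100 / (100 + 25) = 0.80.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))
```

Because the error is random with mean zero, averaging over many administrations would recover each person's true score, which is exactly the expectation-based definition given above.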
Practical Considerations for Interpreting Change Following Brain Injury
Mark R. Lovell, Ruben J. Echemendia, Jeffrey T. Barth, Michael W. Collins in Traumatic Brain Injury in Sports, 2020
Measurement error is closely related to test reliability. Reliability refers to the consistency or stability of test scores. According to classical test theory, reliability has been viewed in terms of the relationship between “true” scores and obtained scores. Obtained scores are believed to contain an error component, which influences the consistency or stability of a particular score. Thus, reliability may be viewed as the ability of an instrument to reflect an individual score that is minimally influenced by error. Reliability should not be considered a dichotomous concept; rather, it falls on a continuum. One cannot say that an instrument is reliable or unreliable; more accurately, it possesses a high or low degree of reliability for a specific purpose, with a specific population (Franzen, 1989, 2000).
Assessment
Jane Doe in Teaching Made Easy, 2017
The subject of psychometrics is well beyond the scope of this book. However, you should be aware that requests for reliability to be demonstrated in terms of statistical measurements are common.9 For any assessment, there is a true score and error. Classical test theory groups all of the errors together, and provides a measure of the level of internal consistency (such as Cronbach's alpha). A value of 0.8 or above is regarded as acceptable for a high-stakes assessment. Generalisability theory has developed this concept further, and separates out the different sources of error (such as that of the candidates, the examiners and the different questions). Again, a generalisability coefficient of 0.8 or above is regarded as acceptable. In addition, generalisability theory can predict what the reliability would be if we increased or decreased the number of questions or examiners in an assessment. For example, a five-station OSCE with a low reliability of only 0.5 can be improved by increasing the number of stations to 20, which would improve the reliability to 0.8.
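The station-count prediction in the OSCE example matches the classical Spearman–Brown prophecy formula, which estimates reliability when a test is lengthened by a factor k. A minimal sketch (the function name is illustrative, not from the source):

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by factor k:
    r_new = k * r / (1 + (k - 1) * r)."""
    return k * r / (1 + (k - 1) * r)

# Five-station OSCE with reliability 0.5, lengthened to 20 stations (k = 4):
print(spearman_brown(0.5, 20 / 5))  # 0.8
```

Quadrupling the stations here gives 4 × 0.5 / (1 + 3 × 0.5) = 2 / 2.5 = 0.8, the acceptable threshold the text cites.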
A Rasch analysis of the lumbar spine instability questionnaire
Published in Physiotherapy Theory and Practice, 2021
Luciana Gazzi Macedo, Ayse Kuspinar, Mary Roduta Roberts, Chris G. Maher
Classical test theory (CTT) has been the common measurement theory used to direct the development and evaluation of outcome measures and questionnaires. However, there are important limitations associated with CTT. For example, scores produced by the scale are ordinal rather than continuous, scores for persons and samples are scale dependent, and measurement properties (e.g. reliability and validity) are sample dependent. In other words, interpretation of scores within CTT cannot generalize beyond the characteristics of the sample from which they were derived. In contrast, Rasch Measurement Theory (RMT) offers several advantages over CTT. For example, one is able to construct linear measurements from ordinal-level data, item estimates are free from the sample distribution, and person estimates are free from the scale distribution. Furthermore, RMT provides information on Differential Item Functioning and the spread of response categories across a linear continuum (Baylor et al., 2011; Hambleton and Jones, 1993). The purpose of this study was to continue validation of the LSI questionnaire by evaluating its psychometric properties. Although a study using Rasch analysis was recently published (Saragiotto et al., 2018), it is important to validate its results using a different and larger sample of patients, with a more homogeneous clinical presentation such as chronic back pain. Thus, the objective of this study was to evaluate whether the LSI questionnaire matched the theoretically expected pattern of the Rasch model (e.g. unidimensionality) and to assess whether responses were independent of age, gender, pain and function.
Identification and Evaluation of Items for Vitreoretinal Diseases Quality of Life Item Banks
Published in Ophthalmic Epidemiology, 2019
Mallika Prem Senthil, Eva K Fenwick, Ecosse Lamoureux, Jyoti Khadka, Konrad Pesudovs
The impact of major blinding retinal diseases (age-related macular degeneration and diabetic retinopathy) on quality of life (QoL) has been extensively studied.1–7 However, the impact of other retinal and vitreoretinal diseases (e.g. hereditary degenerations and dystrophies, vascular occlusions, macular hole, epiretinal membrane, and other vitreoretinopathies) on people’s QoL is poorly understood due to the lack of appropriate patient reported outcome (PRO) instruments. There are currently 29 PRO instruments available for retinal diseases, 17 of which were developed for other retinal and vitreoretinal diseases.8 Of these 17 PRO instruments, 11 relate to hereditary retinal disorders (nine to retinitis pigmentosa, one to congenital stationary night blindness, one to Stargardt’s macular dystrophy), three relate to macular hole, and one relates to cytomegalovirus retinitis.9–23 These PRO instruments were mostly developed using traditional methods of psychometric assessment (i.e. Classical Test Theory). Classical Test Theory assumes that every item on the questionnaire has the same difficulty level and therefore weights each item equally in scoring. In addition, the ordinal integer response used for each item assumes equal separation and uniform changes between the response categories.24 Both of these assumptions undermine the ability of Classical Test Theory-scored instruments to measure precisely and accurately.25
Detecting mental health problems after paediatric acquired brain injury: A pilot Rasch analysis of the strengths and difficulties questionnaire
Published in Neuropsychological Rehabilitation, 2021
Robyn Henrietta McCarron, Fergus Gracey, Andrew Bateman
Traditionally, the psychometric properties of assessment measures in terms of validity and reliability have been investigated using methods based on Classical Test Theory (CTT). The Rasch Measurement Model (Rasch, 1960) is a modern psychometric technique that falls within the parameters of Item Response Theory (IRT) (Hambleton et al., 1991). Unlike CTT, the Rasch model has the advantage of not assuming equivalence between ordinal and interval scales (Hobart & Cano, 2009). Nor does it rely on the assumption that observed scores are composed of a true score and an error (neither of which can be determined) in order to estimate the reliability of the observed score. Instead, the Rasch model is based on assumptions that readily make sense within a real-world context. It tests the assumption that people respond in a probabilistic but ordered manner based on both their underlying trait (be it ability or disease severity) and the level of difficulty assessed by an item/question. It maintains that an assessment measure should not be biased towards individuals with certain characteristics or previous responses, and it argues that for a total score to be meaningful it needs to reflect a single unidimensional construct. Rasch analysis has been demonstrated to be an insightful method for examining the psychometric properties of rating scales in different populations, including in people with ABI (Bateman et al., 2009; Simblett et al., 2015).
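The probabilistic, ordered response pattern the Rasch model assumes can be sketched for the dichotomous case (an illustrative function, not from the cited works): the probability of endorsing an item depends only on the difference between the person's trait level (theta) and the item's difficulty (b).

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model:
    P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose trait level exactly matches the item difficulty has a
# 50% chance of endorsing the item:
print(rasch_probability(theta=0.0, b=0.0))  # 0.5

# A higher trait level relative to item difficulty raises that probability:
print(round(rasch_probability(theta=2.0, b=0.0), 2))  # 0.88
```

Because theta and b enter only through their difference on a single logit scale, item difficulty estimates do not depend on the particular sample of persons, which is the sample-independence property contrasted with CTT above.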
Related Knowledge Centers
- Psychometrics
- Reliability
- Item Response Theory
- Kuder–Richardson Formulas
- Cronbach's Alpha
- Item Analysis
- Multiple Choice
- Psychometric Software
- Educational Psychology
- Standardized Test