Explore chapters and articles related to this topic
Linear Regression
Published in Gary L. Rosner, Purushottam W. Laud, Wesley O. Johnson, Bayesian Thinking in Biostatistics, 2021
Gary L. Rosner, Purushottam W. Laud, Wesley O. Johnson
To construct an informative prior for τ, we ask an expert to think about a percentile for the response values in the population of individuals corresponding to a particular predictor vector, say . To be specific, the quantile equals . We elicit a best guess for this quantile, conditional on the best guess for from the elicitation in Section 7.3.3. Setting this elicited value to leads to a best guess for σ, and subsequently to one for τ. While this can be used to specify a central value for the prior distribution, we need additional information to express uncertainty about the best guess for τ. We illustrate using the FEV data.
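The arithmetic behind this step can be sketched as follows. The sketch assumes a normal response at the chosen predictor vector, so that the α quantile equals the mean plus z_α σ, and it assumes τ is the precision 1/σ². Both are assumptions about the chapter's notation, which is lost in this excerpt. The function name and the elicited numbers are hypothetical and are not the FEV values.

```python
from statistics import NormalDist


def sigma_tau_from_quantile(mean_guess, quantile_guess, alpha):
    """Back out sigma and tau = 1 / sigma^2 from an elicited alpha-quantile.

    Assumes a normal response: quantile = mean + z_alpha * sigma.
    """
    if alpha == 0.5:
        raise ValueError("the median carries no information about sigma")
    z_alpha = NormalDist().inv_cdf(alpha)
    sigma = (quantile_guess - mean_guess) / z_alpha
    if sigma <= 0:
        raise ValueError("quantile and mean guesses are inconsistent")
    return sigma, 1.0 / sigma ** 2


# Hypothetical elicited values: best-guess mean 3.0, best-guess 90th percentile 4.0
sigma0, tau0 = sigma_tau_from_quantile(mean_guess=3.0, quantile_guess=4.0, alpha=0.90)
print(f"sigma = {sigma0:.3f}, tau = {tau0:.3f}")
```

This yields only a central value for τ. Expressing uncertainty around it needs a second elicited quantity, as the excerpt notes.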
Working with continuous outcome variables
Published in Ewen Harrison, Riinu Pius, R for Health Data Science, 2020
Quantile-quantile sounds more complicated than it really is. It is a graphical method for comparing the distribution (think shape) of our own data to a theoretical distribution, such as the normal distribution. In this context, quantiles are just cut points which divide our data into bins each containing the same number of observations. For example, if we have the life expectancy for 100 countries, then quartiles (note the quar-) for life expectancy are the three ages which split the observations into 4 groups each containing 25 countries. A Q-Q plot simply plots the quantiles for our data against the theoretical quantiles for a particular distribution (the default shown below is the normal distribution). If our data follow that distribution (e.g., normal), then our data points fall on the theoretical straight line.
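A minimal sketch of both ideas, using only the Python standard library: quartile cut points for 100 values, and the pairs of points a normal Q-Q plot would draw. The data are made up (evenly spaced, not real life expectancies), and the plotting positions (i − 0.5)/n are one common convention among several, chosen here as an assumption.

```python
from statistics import NormalDist, quantiles


def normal_qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical life expectancies for 100 countries (not real data)
life_exp = [50 + 0.3 * i for i in range(100)]

# Three quartile cut points split the 100 values into 4 groups of 25
q1, q2, q3 = quantiles(life_exp, n=4)
groups = [
    sum(x < q1 for x in life_exp),
    sum(q1 <= x < q2 for x in life_exp),
    sum(q2 <= x < q3 for x in life_exp),
    sum(x >= q3 for x in life_exp),
]
print(groups)  # [25, 25, 25, 25]

# Each pair is one point on the Q-Q plot: x = theoretical, y = observed
qq_points = normal_qq_points(life_exp)
print(qq_points[0], qq_points[-1])
```

Because the made-up data are evenly spaced rather than bell-shaped, the resulting points would bend away from a straight line at both ends, which is the kind of departure a Q-Q plot is meant to reveal.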
Perception, Planning, and Scoping, Problem Formulation, and Hazard Identification
Published in Ted W. Simon, Environmental Risk Assessment, 2019
There are a number of problems associated with the use of quantiles. Obviously, risk will vary between individuals within a given quantile to an unknown degree. On top of this is the ever-present potential for exposure misclassification. What happens in many cases is that the number of individuals in each quantile becomes so low that the study no longer has the statistical power needed to justify its conclusions. The last problem with quantiles is that the cut points selected are most often based on the continuous variable and chosen for statistical convenience as opposed to biological relevance. Implicitly, individuals within a single quantile are assumed to be homogeneous, and the choice of cut points has the potential to produce both false positives and false negatives [125].
Association between participation in the Northern Finland Birth Cohorts and cardiometabolic disorders
Published in Annals of Medicine, 2023
Martta Kerkelä, Mika Gissler, Tanja Nordström, Olavi Ukkola, Juha Veijola
The cumulative incidence rates of cardiometabolic disorders in all hospital-treated cardiovascular disorders (including inpatient and specialized outpatient visits) were calculated for the study and comparison cohorts covering the full follow-up (age 7 to 50 years in NFBC1966; age 0 to 29 years in NFBC1986). Different types of diabetes mellitus were examined from 1987 onwards (age 2 to 29 years in NFBC1986 and age 22 to 50 years in NFBC1966). Due to the small number of cases of hyperlipidaemia and coronary artery disorders in the younger population (follow-up ends at age 29 years), the separate diagnosis classes are not included in the analysis. Risk ratios (RRs) with 95% confidence intervals (CIs) were calculated separately by sex in each diagnosis group. The age at first onset of a cardiometabolic diagnosis (median with IQR) is reported in each diagnosis group. The difference between the medians was estimated using quantile estimation (QE), and Q with p-values is reported [27]. The age at first onset of cardiometabolic disorders was plotted over the full follow-up period in both NFBCs, separated by sex. Cumulative incidences of cardiometabolic-related causes of death were calculated for the NFBC1966 and comparison cohorts at age 0 to 50 years. The age at death caused by any cardiometabolic disorder (median with IQR) by sex is also reported. Analysis was performed using R version 1.4.1106.
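For readers unfamiliar with the calculation, a risk ratio and its 95% confidence interval can be sketched as below. The excerpt does not say which interval method the authors used. This sketch assumes the common Wald interval on the log scale, and the counts are hypothetical rather than taken from the NFBC data.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(cases_study, n_study, cases_comparison, n_comparison, level=0.95):
    """Risk ratio with a log-scale Wald confidence interval (assumed method)."""
    risk_study = cases_study / n_study
    risk_comparison = cases_comparison / n_comparison
    rr = risk_study / risk_comparison
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(
        1 / cases_study - 1 / n_study + 1 / cases_comparison - 1 / n_comparison
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 120 cases among 10,000 cohort members
# versus 150 cases among 10,000 in the comparison cohort
rr, lower, upper = risk_ratio_ci(120, 10_000, 150, 10_000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that includes 1 indicates that the data are compatible with no difference in risk between the two cohorts at the chosen level.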
Distance-based outlier detection for high dimension, low sample size data
Published in Journal of Applied Statistics, 2019
Jeongyoun Ahn, Myung Hee Lee, Jung Ae Lee
A classical definition of an outlier is a data point that lies far away from the rest of the data mass [4]. How to effectively measure the outlyingness is often the most important consideration in the outlier detection problem. For traditional low-dimensional data, Mahalanobis distance [16] is commonly used for this purpose. For uncontaminated normally-distributed data, the estimated squared Mahalanobis distance approximately follows a chi-square distribution, assuming that the sample size is large enough so that the mean and covariance matrix are accurately estimated. A chi-square quantile–quantile (QQ) plot is then used to detect outliers as well as to check normality for low-dimensional data [10]. See [21] for an overview of recent developments on multivariate outlier detection methods.
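The classical low-dimensional procedure described here can be sketched for two-dimensional data as follows. This illustrates the textbook approach only, not the method the article goes on to propose for high-dimensional data. The simulated points, the planted outlier, the 0.975 cutoff, and the function names are all assumptions made for the example. With 2 degrees of freedom the chi-square quantile has the closed form −2 ln(1 − p), which keeps the sketch within the standard library.

```python
import math
import random


def squared_mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix entries (divisor n - 1)
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] S^{-1} [dx dy]^T using the closed-form 2x2 inverse
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
points.append((8.0, 8.0))  # planted outlier at index 50

d2 = squared_mahalanobis_2d(points)
cutoff = chi2_quantile_2df(0.975)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print("flagged indices:", flagged)

# Pairs for a chi-square QQ plot: x = theoretical quantile, y = ordered distance
n = len(d2)
qq_pairs = [
    (chi2_quantile_2df((i - 0.5) / n), d) for i, d in enumerate(sorted(d2), start=1)
]
```

Because the mean and covariance are estimated from data that include the outlier, the outlier inflates the covariance and shrinks its own distance. This masking effect is one reason the approach depends on having many more observations than dimensions.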
From good to excellent: Improving clinical departments’ learning climate in residency training
Published in Medical Teacher, 2018
Milou E. W. M. Silkens, Saad Chahine, Kiki M. J. M. H. Lombarts, Onyebuchi A. Arah
The identified learning climate groups seem to distinguish themselves by their scores on the total set of D-RECT domains, rather than by a varying representation of individual domains. This implies that a department scoring low on one domain is likely to score low on all D-RECT domains, and thus will classify as a substandard performer. Similarly, a high score on one domain more likely means overall high learning climate scores and an excellent performer classification. Classification of departments into four performance categories seems above all an intuitive categorization of performance data. In research fields such as economics and econometrics, statistical methods like quantile regression are applied to performance data to obtain information on specific data points in the distribution, instead of solely relying on the conditional mean (Buchinsky 1994; Eide and Showalter 1998). The benefit of LPA is that, instead of using an outcome variable to classify data into previously designed categories, we could use the whole range of D-RECT domain scores to determine the statistically underlying structure of the data. The underlying idea is that individual quantiles might behave differently from each other and provide different information in analyses. Therefore, knowing that departments naturally split into four comprehensive and individual groups is meaningful for analyses linking learning climate to educational and patient outcomes.