Hierarchical Models and Longitudinal Data
Published in Gary L. Rosner, Purushottam W. Laud, Wesley O. Johnson, Bayesian Thinking in Biostatistics, 2021
This chapter considers models with additional complexity beyond those we have already considered. In all of the chapters on regression modeling, there was a presumption that the important predictor variables were taken into account. Indeed, when designing a study to assess the effect of various predictor variables on a response variable of interest, scientists will do their best to include those variables that they deem important and are able to measure. There will be situations, however, where this has been done and yet there remain other factors that (i) were not taken into account and perhaps should have been, (ii) are difficult to measure and are consequently left out of the model, or (iii) are perhaps even impossible to measure. With a slight abuse of the term, we refer to such variables as latent. Strictly, the word “latent” implies that such variables are unobservable. Here we extend this interpretation to mean that they are unobservable or unobserved (perhaps due to some form of scientific oversight). This chapter focuses on extending previous models to allow for latent variables.
Statistics for Genomics
Published in Altuna Akalin, Computational Genomics with R, 2020
If the variances are the same, the ratio will be 1, and when H0 is true, it can be shown that the expected value of the numerator will be σ2, which is estimated by the RSE. So, if the variances are significantly different, the ratio will need to be significantly bigger than 1; if the ratio is large enough, we can reject the null hypothesis. To assess this, we need to use software or look up tables of the F statistic with the calculated parameters. In R, the function qf() can be used to calculate the critical value of the ratio. A benefit of the F-test over examining the significance of the coefficients one by one is that we circumvent the multiple testing problem. If there are many explanatory variables, then at least 5% of the coefficient t-tests will appear significant by chance alone (assuming we use 0.05 as the P-value significance cutoff), even when the null hypothesis holds. In summary, the F-test is a better choice for testing whether there is any association between the explanatory variables and the response variable.
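The multiple-testing point can be made concrete with a little arithmetic. As a minimal sketch (our own illustration, not from the book), the probability of at least one false positive among m independent coefficient t-tests at a 0.05 cutoff, when no coefficient truly matters, is 1 − (1 − 0.05)^m:

```python
# Illustrative sketch (not from the book): family-wise error rate when
# testing each of m coefficients separately at alpha = 0.05.
def fwer(m, alpha=0.05):
    """P(at least one false positive) among m independent t-tests
    when all true coefficients are zero."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 14):
    print(m, round(fwer(m), 3))
# With 14 covariates, a spurious "significant" coefficient appears
# in roughly half of all such analyses.
```

A single F-test of the full model against the intercept-only model keeps the overall error rate at the nominal 0.05.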
Endpoint Selection
Published in Shein-Chung Chow, Innovative Statistics in Regulatory Science, 2019
In clinical trials, for a given primary response variable, commonly considered study endpoints include: (i) measurements based on absolute change (e.g., change from baseline at the endpoint); (ii) measurements based on relative change; (iii) the proportion of responders based on absolute change; and (iv) the proportion of responders based on relative change. We refer to these study endpoints as derived study endpoints because they are derived from the original data collected from the same patient population. In practice, matters are more complicated if the intended trial is to establish non-inferiority of a test treatment to an active control (reference) treatment. In this case, the sample size calculation will also depend on the size of the non-inferiority margin, which may be based on either absolute or relative change of the derived study endpoint. For example, based on a responder analysis, we may want to detect a 30% difference in response rate or a 50% relative improvement in response rate. Thus, in addition to the four types of derived study endpoints, there are two different ways to define a non-inferiority margin, yielding many possible clinical strategies that combine a derived study endpoint with a choice of non-inferiority margin for assessing the treatment effect. These clinical strategies are summarized in Table 4.3.
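The four derived endpoints can be sketched for baseline/post-treatment measurement pairs. This is purely illustrative; the function names, data, and thresholds are ours, not from the text:

```python
# Illustrative sketch: the four derived study endpoints computed from
# hypothetical (baseline, post-treatment) measurement pairs.
def absolute_change(baseline, post):
    return post - baseline

def relative_change(baseline, post):
    return (post - baseline) / baseline

def responder_rate_absolute(pairs, threshold):
    """Proportion of patients whose absolute change meets the threshold."""
    return sum(absolute_change(b, p) >= threshold for b, p in pairs) / len(pairs)

def responder_rate_relative(pairs, threshold):
    """Proportion of patients whose relative change meets the threshold."""
    return sum(relative_change(b, p) >= threshold for b, p in pairs) / len(pairs)

pairs = [(10, 14), (20, 22), (8, 12), (15, 15)]       # made-up data
print(responder_rate_absolute(pairs, 4))    # responder: change >= 4 units
print(responder_rate_relative(pairs, 0.5))  # responder: >= 50% improvement
```

Note how the same raw data yield different responder rates under the absolute and relative definitions, which is exactly why the choice among these strategies matters for sample size and margin selection.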
A random effect regression based on the odd log-logistic generalized inverse Gaussian distribution
Published in Journal of Applied Statistics, 2023
J. C. S. Vasconcelos, G. M. Cordeiro, E. M. M. Ortega, G. O. Silva
Many studies in public health, economics, agronomy, medicine, biology and the social sciences, among other fields, involve repeated observations of a response variable. The expression ‘repeated measures’ designates measurements of the same variable, or on the same experimental unit, on more than one occasion; see [2,3]. Various experimental designs with repeated measures are common, such as split-plot, crossover and longitudinal designs. These investigations are referred to as correlated data studies, and they play a fundamental role in analyses where changes in an individual’s characteristics can be associated with a set of covariables. Due to their nature, repeated measures have a correlation structure that plays an important role in the analysis of such data. Moreover, the distribution of the response variable can present asymmetry or bimodality.
A new alternative quantile regression model for the bounded response with educational measurements applications of OECD countries
Published in Journal of Applied Statistics, 2023
Mustafa Ç. Korkmaz, Christophe Chesneau, Zehra Sedef Korkmaz
In view of the foregoing, the purpose of this paper is to suggest a new unit distribution, as well as quantile regression modeling for it, for scenarios where the response variable is described as rates or proportions, i.e. with support on the unit interval (see, e.g., [15]). This approach is almost unexplored: to our knowledge, in this framework, only the arsech normal distribution by Korkmaz et al. [29] and the logitSHASHo distribution by Nakamura et al. [40] use such a hyperbolic transformation approach. Here, we show that the considered hyperbolic transformation carries the applicability of the Weibull distribution over to the unit interval. Hence, we obtain probability density function (pdf) and hazard rate function (hrf) characteristics that the Weibull distribution does not possess over the unit interval. A further motivation of the study is a new quantile regression model: since the quantile function (qf) of the proposed distribution has a closed form, its pdf and cumulative distribution function (cdf) can easily be re-parameterized in terms of any of its quantiles. This re-parameterization is applied in the manner of Refs. [6,36,39]. We also demonstrate the model’s flexibility through applications to proportions from educational measurements.
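To illustrate the closed-form quantile idea, consider a hypothetical construction in the same spirit: take X = sech(W) with W Weibull, which maps (0, ∞) onto (0, 1). This is our own toy example under stated assumptions, not the authors' exact distribution, but it shows how a closed-form quantile function lets the scale be re-parameterized in terms of the median:

```python
import math

# Toy construction (NOT the paper's distribution): X = sech(W) with
# W ~ Weibull(shape=k, scale=lam), so X lives on (0, 1) and its
# quantile function has a closed form.

def arsech(x):
    """Inverse of sech on (0, 1]."""
    return math.log((1 + math.sqrt(1 - x * x)) / x)

def qf(u, k, lam):
    """Quantile function of X = sech(W).
    Since sech is decreasing, P(X <= x) = P(W >= arsech(x))
                                        = exp(-(arsech(x)/lam)**k),
    which inverts to x = sech(lam * (-log(u)) ** (1/k))."""
    return 1 / math.cosh(lam * (-math.log(u)) ** (1 / k))

def scale_from_median(m, k):
    """Re-parameterize: solve qf(0.5) = m for the scale lam."""
    return arsech(m) / math.log(2) ** (1 / k)

k, m = 2.0, 0.7                 # shape, target median
lam = scale_from_median(m, k)
print(round(qf(0.5, k, lam), 6))   # recovers the median 0.7
```

With the scale expressed through the median, the median itself can be modeled directly as a function of covariates, which is the essence of the quantile regression re-parameterization described above.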
A robust and efficient variable selection method for linear regression
Published in Journal of Applied Statistics, 2022
Zhuoran Yang, Liya Fu, You-Gan Wang, Zhixiong Dong, Yunlu Jiang
In this section, we analyze air pollution data collected on 60 U.S. Standard Metropolitan Statistical Areas (SMSAs), previously analyzed by Gijbels and Vrinssen [6]. The dataset includes 14 covariates: mean January temperature (JanT), mean July temperature (JulT), relative humidity (RH), annual rainfall (Rain), median education (Edu), population density (PD), percentage of non-whites (%NW), percentage of white-collar workers (%WC), population (Pop), population per household (P/H), median income (Income), hydrocarbon pollution potential (HP), nitrous oxide pollution potential (NP), and sulfur dioxide pollution potential (SP). The response variable is age-adjusted mortality. We assume the data are missing completely at random and therefore remove the 21st observation because of its two missing values. Because HP, NP, and SP are highly skewed, a logarithmic transformation was applied to them; the transformed covariates are denoted LogHP, LogNP, and LogSP, respectively.
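As a small sketch of the preprocessing steps (with made-up numbers, not the actual SMSA data), the missing-value removal and logarithmic transformation look like:

```python
import math

# Sketch with hypothetical values (not the SMSA dataset): drop records
# with missing values, then log-transform a right-skewed covariate.
hp = [3, 5, 7, 648, None, 12]                  # hypothetical HP-like column

complete = [v for v in hp if v is not None]    # remove the missing record
log_hp = [math.log(v) for v in complete]       # LogHP

print(max(complete) / min(complete))           # raw spread: 216x
print(round(max(log_hp) - min(log_hp), 2))     # log-scale spread is modest
```

The log transform compresses the extreme right tail, so a few very large pollution-potential values no longer dominate the regression fit.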