Explore chapters and articles related to this topic
Addressing the Utilization of Popular Regression Models in Business Applications
Published in K. Hemachandran, Sayantan Khanra, Raul V. Rodriguez, Juan R. Jaramillo, Machine Learning for Business Analytics, 2023
Meganathan Kumar Satheesh, Korupalli V. Rajesh Kumar
Logistic regression, which is an extension of the logit transformation (Omay, 2010), differs from linear regression in the outcome of the dependent variable (Hosmer Jr et al., 2013; Omay, 2010), which is binary or dichotomous (Bolton, 2009). In linear regression, the dependent variable is continuous; in logistic regression, it is discrete or categorical (Hosmer Jr et al., 2013; Omay, 2010). Binomial logistic regression is used when the dependent variable has two categorical outcomes, whereas multinomial logistic regression handles dependent variables with more than two categorical outcomes (Bayaga, 2010). The independent variables need not be normally distributed, and the relationship between the dependent and independent variables need not be linear, which makes the method more flexible than multiple linear regression (Bayaga, 2010; Omay, 2010). An ordinal logistic regression model is preferred when the dependent variable is ordinal (rank-ordered), although it does not help in dealing with multicollinearity between independent variables (Larasati et al., 2011).
Machine Learning Classifiers
Published in Rashmi Agrawal, Marcin Paprzycki, Neha Gupta, Big Data, IoT, and Machine Learning, 2020
Logistic regression, also called the logistic model or logit model, examines the relationship between a set of predictor variables and a categorical response variable, and determines the probability of occurrence of an event by modeling the response in terms of the predictors using a logistic (sigmoid) curve (DeGregory, Kuiper et al. 2018). Logistic regression models are classified as binary or multinomial depending on whether the dependent variable is binary. If the dependent variable is binary, taking one of two values (true or false), and the independent variables are either continuous or categorical, binary logistic regression is applied. Multinomial logistic regression is applied when the response variable has more than two categorical values. The relationship between independent and dependent variables is represented as: Y = b0 + b1X1 + b2X2 + … + bnXn
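As a minimal sketch of how the linear combination above relates to the sigmoid curve, the following Python snippet assumes that Y is the log-odds (logit) of the event, so that the probability is obtained by passing Y through the logistic function. The function name and the coefficient values are hypothetical and chosen only for illustration.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Map Y = b0 + b1*X1 + ... + bn*Xn to a probability with the sigmoid curve."""
    y = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable form of 1 / (1 + exp(-y)).
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

# Hypothetical coefficients (assumptions, not taken from the source).
b0 = -1.5
b = [0.8, 0.3]
x = [2.0, 1.0]

p = predict_probability(b0, b, x)
log_odds = math.log(p / (1.0 - p))  # recovers Y from the probability
print(f"probability = {p:.4f}, log-odds = {log_odds:.4f}")
```

Under this reading, the coefficients act linearly on the log-odds scale, while the predicted probability always stays strictly between 0 and 1.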
Analyzing Variability
Published in Erick C. Jones, Supply Chain Engineering and Logistics Handbook, 2020
Logistic regression is a technique for modeling and analyzing the y = f(x) relationship when y is an attribute (usually binary, but multiple ordinal responses are also acceptable) and the x(s) are continuous. Logistic regression relates a single dependent variable to one or more independent variables. It is similar to regular linear regression in that it has regression coefficients, residuals, and predicted values. In linear regression, the response variable is assumed to be continuous, whereas in logistic regression the response variable is discrete. Logistic regression is not used as often as ANOVA or regression, because the emphasis in Six Sigma is on finding y(s) that are continuous variables. Since the response is generally binary, a logistic transformation is performed to prevent impossible outcomes such as negative probabilities. Statistical conclusions are drawn from logistic regression in the usual way: P-values are interpreted no differently than with the other analytical methods. The ordinary least squares approach is used to determine the regression coefficients for linear regression, and maximum-likelihood estimation is used to determine the regression coefficients for logistic regression.
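To illustrate why the logistic transformation is applied, the short Python sketch below contrasts a straight-line model of a probability with a logistic one. Both sets of coefficients are hypothetical values picked for the demonstration, not estimates from any data.

```python
import math

def linear_probability(x, intercept=0.5, slope=0.2):
    """Straight-line model of a probability; nothing bounds its output."""
    return intercept + slope * x

def logistic_probability(x, intercept=0.0, slope=0.8):
    """Logistic model; the output always lies strictly between 0 and 1."""
    z = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-z))

# Compare the two models at a low, a middle, and a high predictor value.
for x in (-5.0, 0.0, 5.0):
    print(f"x = {x:5.1f}  linear = {linear_probability(x):6.3f}  "
          f"logistic = {logistic_probability(x):6.3f}")
```

With these assumed coefficients, the straight-line model returns a negative value at x = -5 and a value above 1 at x = 5, while the logistic model stays inside the valid range at every point.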
Influence of residential indoor environment on self-rated health in China: a cross-sectional study
Published in Science and Technology for the Built Environment, 2022
The characteristics of the respondents were described as percentages, average values, and ranges (minimum–maximum). Associations between the indoor environmental components and self-reported health were analyzed using Pearson's correlation coefficient. Moreover, relations between these variables were tested using the t-test because the data were normally distributed (Kolmogorov-Smirnov test, P value > 0.05). To analyze the relationship between overall indoor environment quality and health, the indoor environment quality score was divided equally into 10 groups. For each group, the average subjective health score and the percentages of respondents without specific symptoms and diseases were calculated. The differences between groups were examined by the t-test because the data were normally distributed (Kolmogorov-Smirnov test, P value > 0.05). Furthermore, to examine whether individual characteristics may influence the interaction between indoor environment quality and health, multivariate logistic regression analysis was used. The goodness of fit of the logistic regression models was tested using the Hosmer–Lemeshow test (P > 0.05). Collinearity among the independent variables in the model was examined based on the variance inflation factor (VIF < 5) (Rencher and Christensen 2012). The odds ratios (OR) or adjusted odds ratios (AOR) with their 95% confidence intervals (CI) were summarized.
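As a rough illustration of how an odds ratio and its 95% confidence interval can be derived from a fitted logistic regression coefficient, the Python sketch below exponentiates the coefficient and the bounds of a Wald-type interval. The excerpt does not state how the intervals were computed, so the Wald approach is an assumption, and the coefficient and standard error are hypothetical values, not results from the study.

```python
import math

def odds_ratio_with_ci(coefficient, standard_error, z_critical=1.96):
    """Return (odds ratio, lower bound, upper bound) for one coefficient.

    Uses a Wald-type interval: exp(b - z*SE) to exp(b + z*SE).
    """
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z_critical * standard_error)
    upper = math.exp(coefficient + z_critical * standard_error)
    return odds_ratio, lower, upper

# Hypothetical coefficient and standard error (assumptions for illustration).
or_value, ci_lower, ci_upper = odds_ratio_with_ci(0.47, 0.15)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

An interval that excludes 1 corresponds to a coefficient whose Wald interval excludes 0 on the log-odds scale.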
Impact of the pre-simulation process of occupant behaviour modelling for residential energy demand simulations
Published in Journal of Building Performance Simulation, 2022
Yuanmeng Li, Yohei Yamaguchi, Yoshiyuki Shimoda
The weakness of MSD is that it does not quantify the goodness-of-fit with the observations. To overcome this weakness, we considered a second indicator based on the Hosmer–Lemeshow test, which is often used to evaluate the goodness-of-fit of logistic regression models. In that test, the samples are sorted by estimated probability from lowest to highest and divided into several groups. The statistical difference between the estimated and observed probability of each group is then tested. However, this method is not effective when the occurrence of the model objectives is low (Paul, Pennell, and Lemeshow 2013), and it is inapplicable to activities with a low starting probability. Therefore, we designed an indicator, named RMSE_GA, that measures the RMSE between the averaged estimated probability and the averaged observed probability of subgroups formed as in the Hosmer–Lemeshow test, where the subgroups are created based on the estimated probability of the test set. For activity m at t, we sorted the test set according to the estimated probability and then divided it equally into 10 subgroups (10 being the number often used in the Hosmer–Lemeshow test) for all methods. This indicator quantified the difference between the distributions of observation and estimation, thereby assessing inter-occupant diversity.
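The grouping procedure behind RMSE_GA can be sketched in Python as follows. The exact formula does not appear in this excerpt, so the sketch assumes one plausible reading: sort the test set by estimated probability, split it into equal subgroups, and take the root mean square of the per-group differences between the mean estimated probability and the mean observed outcome. The function name and the toy data are hypothetical.

```python
import math

def rmse_group_average(estimated, observed, n_groups=10):
    """RMSE between group-averaged estimated and observed probabilities.

    estimated: estimated probabilities; observed: 0/1 outcomes (same length).
    """
    if not estimated or len(estimated) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Sort samples from lowest to highest estimated probability.
    pairs = sorted(zip(estimated, observed), key=lambda pair: pair[0])
    n = len(pairs)
    squared_errors = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # fewer samples than groups: skip empty subgroups
        mean_estimated = sum(p for p, _ in group) / len(group)
        mean_observed = sum(y for _, y in group) / len(group)
        squared_errors.append((mean_estimated - mean_observed) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy data (assumed): two subgroups, each off by 0.2 on average.
toy_score = rmse_group_average([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1], n_groups=2)
print(f"RMSE_GA (toy data) = {toy_score:.3f}")
```

Because the comparison is made on group averages rather than on a significance test, the indicator remains defined even when the event is rare, which is the limitation of the Hosmer–Lemeshow test that the passage raises.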
Logistic regression analysis for confidence level estimates of additive percentage at high service temperatures
Published in International Journal of Pavement Engineering, 2021
Abbas Babazadeh, Mohammad Jafari
In linear regression analysis, the least squares method is used to estimate the coefficients of the models. In this method, the parameters are estimated in such a way that the sum of squared errors between the observed and the predicted values of the response variable is minimised (Draper and Smith 2014). In logistic regression, the method used to estimate the parameters is maximum likelihood, which maximises the probability of obtaining the observed data set. To apply this method, a function of the unknown parameters, called the likelihood function, is constructed, and the parameters are chosen to be the values that maximise this function (Hosmer et al. 2013). A brief review of estimating the parameters is given below. Further details may be found elsewhere (Myers et al. 2010, Hosmer et al. 2013).
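As a minimal sketch of maximum likelihood estimation for a logistic regression with one predictor, the Python code below maximises the log-likelihood by plain gradient ascent. This is an illustrative choice and not necessarily the procedure used by the authors; statistical packages typically rely on Newton-type iterations instead. The data, learning rate, and iteration count are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(b0, b1, xs, ys):
    """Log of the likelihood function for binary outcomes ys given xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_by_gradient_ascent(xs, ys, learning_rate=0.05, n_iterations=20000):
    """Choose b0, b1 to maximise the log-likelihood by gradient ascent."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iterations):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals)  # d(logL)/d(b0)
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs))  # d(logL)/d(b1)
    return b0, b1

# Small made-up data set (not linearly separable, so the estimate is finite).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_by_gradient_ascent(xs, ys)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
print(f"log-likelihood at estimate = {log_likelihood(b0_hat, b1_hat, xs, ys):.3f}")
```

At the maximum, the gradient of the log-likelihood is zero, so the sum of the residuals (observed outcome minus predicted probability) vanishes, which provides a simple check that the iteration has converged.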