Explore chapters and articles related to this topic
Logistic Regression: The Workhorse of Response Modeling
Published in Bruce Ratner, Statistical and Machine-Learning Data Mining, 2017
The classic approach to including a categorical variable in the modeling process involves dummy variable coding. A categorical variable with k classes of qualitative information is uniquely equivalent to a set of k − 1 quantitative dummy variables. The set of dummy variables replaces the categorical variable in the modeling process. Each dummy variable assumes the value 1 or 0 for the presence or absence, respectively, of its class value. The class left out is called the reference class. The reference class is the baseline for comparing the other classes when interpreting the effects of dummy variables on the response variable. The classic approach instructs that the complete set of k − 1 dummy variables is included in the model regardless of the number of dummy variables that are declared nonsignificant. This approach is problematic when the number of classes is large, which is typically the case in big data applications. By chance alone, as the number of class values increases, the probability that one or more dummy variables are declared nonsignificant increases. Putting all the dummy variables in the model effectively adds noise or unreliability to the model, as nonsignificant variables are known to be noisy. Intuitively, a large set of inseparable dummy variables poses difficulty in model building in that they quickly “fill up” the model, not allowing room for other variables.
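The k − 1 coding described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `dummy_code` and the `regions` data are hypothetical and do not come from the book.

```python
def dummy_code(values, reference):
    """Code a categorical variable with k classes as k - 1 dummy (0/1) columns.

    The reference class gets no column of its own, so its rows are all zeros.
    Returns the kept class names (one per column) and the coded rows.
    """
    classes = sorted(set(values))
    if reference not in classes:
        raise ValueError(f"reference class {reference!r} not found in the data")
    kept = [c for c in classes if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical categorical variable with k = 4 classes.
regions = ["north", "south", "east", "west", "south"]
names, rows = dummy_code(regions, reference="north")

for value, row in zip(regions, rows):
    print(value, dict(zip(names, row)))
```

With four classes the function yields three columns, and every "north" row is coded as all zeros, which makes it the baseline against which the other classes are compared.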
Regression
Published in Richard L. Shell, Ernest L. Hall, Handbook of Industrial Automation, 2000
In this section we show how qualitative variables can be introduced into a regression in order to test for differences between groups. This is done by defining the qualitative (group) variable in terms of dummy variables. Dummy variables are binary variables in that they can take only the values of 0 and 1. The number of dummy variables required in order to define a single qualitative variable is always one less than the number of groups. As illustrated below, if there are only two groups then a single dummy variable is required to define the groups, but if there are three groups then two dummy variables are required. For a three-group qualitative variable DUM1 = 1 for Group 1 only, DUM2 = 1 for Group 2 only and, for Group 3, DUM1 = DUM2 = 0. Note that a third dummy variable, set equal to 1 for Group 3 and 0 otherwise, is therefore redundant.
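The three-group coding and the redundancy of a third dummy can be checked with a small sketch. The names DUM1 and DUM2 follow the text; the function `encode_group`, the name `dum3`, and the sample `groups` list are illustrative assumptions.

```python
def encode_group(group):
    """Return (DUM1, DUM2) for a three-group qualitative variable.

    Group 1 -> (1, 0), Group 2 -> (0, 1), Group 3 -> (0, 0).
    """
    dum1 = 1 if group == 1 else 0
    dum2 = 1 if group == 2 else 0
    return dum1, dum2


# Hypothetical group memberships for five observations.
groups = [1, 2, 3, 3, 1]
coded = [encode_group(g) for g in groups]

# A third dummy (1 for Group 3, 0 otherwise) carries no new information:
# it is fully determined by the first two as 1 - DUM1 - DUM2.
dum3 = [1 - d1 - d2 for d1, d2 in coded]

for g, (d1, d2), d3 in zip(groups, coded, dum3):
    print(f"group={g} DUM1={d1} DUM2={d2} implied DUM3={d3}")
```

Because the third dummy is an exact linear function of the other two and the constant, including all three alongside an intercept would make the regressors perfectly collinear.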
Prediction of Heart Disease Using Machine Learning
Published in Monika Mangla, Subhash K. Shinde, Vaishali Mehta, Nonita Sharma, Sachi Nandan Mohanty, Handbook of Research on Machine Learning, 2022
Subasish Mohapatra, Jijnasee Dash, Subhadarshini Mohanty, Arunima Hota
Dummy variables are created to deal with categorical values. In steps three and four, we train and test the models. Model selection is the process of combining data and prior information to choose among a set of statistical models. We have analyzed our data with different combinations. In step five, we look for improvements. One way to improve the model is to reduce the number of features in the data matrix by selecting those with the highest predictive value. The number of significant features is smaller than the total number of features, so the irrelevant features are eliminated.
Parameterized environmental impacts of ready-mixed concrete in Spain
Published in Journal of Sustainable Cement-Based Materials, 2023
Núria Sánchez-Pantoja, Cecilia Lázaro, Rosario Vidal
A dummy variable is a binary, nominal, dichotomous, categorical variable that can only take the values 0 or 1, indicating the absence/presence of the measured attribute. Here, the letter ‘D’ at the beginning of the variable name identifies dummy variables. Each categorical variable must be transformed into dummy variables, as many as there are possible alternatives minus one. The categorical variable consistency is therefore transformed into three dummy variables, DSOFT, DFLUID and DLIQUID, represented by the triads 1,0,0; 0,1,0 and 0,0,1 respectively, and plastic consistency is represented by the triad 0,0,0. The categorical variable environment-cement is transformed into another three dummy variables, DCEM1, DCEM2 and DCEM3, without considering differences for the subcategories a, b or c. The categorical variable exposure, with 7 levels, presented more complexity. A univariate analysis of variance for exposure and the dependent variable GWP was performed. Figure 10a) shows how the exposure variable is grouped in three levels (observed-predicted, predicted-Std. residual and observed-Std. residual). Therefore, the seven initial categories were grouped in three levels and two dummy variables were created based on the results shown in Figure 10b): QHSEX grouping exposures H and s/and QcEX for the exposure Qc. The rest of the exposures (Qa, Qb, F, E) are represented by the dummy pair 0,0.
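The triad coding for consistency can be written as a lookup table. In this sketch the dummy names and the triads are taken from the text, while the category labels "soft", "fluid" and "liquid" are assumptions inferred from the dummy names, and `encode_consistency` is a hypothetical helper.

```python
# Dummy variable names as given in the text.
DUMMY_NAMES = ("DSOFT", "DFLUID", "DLIQUID")

# Triads from the text; plastic is the reference category (all zeros).
# The labels "soft", "fluid" and "liquid" are assumed from the dummy names.
CONSISTENCY_TRIADS = {
    "plastic": (0, 0, 0),
    "soft": (1, 0, 0),
    "fluid": (0, 1, 0),
    "liquid": (0, 0, 1),
}


def encode_consistency(level):
    """Map a consistency level to its three dummy variables."""
    return dict(zip(DUMMY_NAMES, CONSISTENCY_TRIADS[level]))


print(encode_consistency("fluid"))
print(encode_consistency("plastic"))
```

Four alternatives need only three dummies, which matches the rule of one dummy fewer than the number of alternatives.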
True Gaussian mixture regression and genetic algorithm-based optimization with constraints for direct inverse analysis
Published in Science and Technology of Advanced Materials: Methods, 2022
In (1), mole fractions take values between 0 and 1, whereas upper and lower limits are necessary for synthesis conditions, such as temperature and pressure, depending upon the constraints of the equipment. Examples of variables that take only the values 0 or 1 in (2) are dummy variables, such as the presence or absence of an additive, washing, and pretreatment of raw materials. All categorical variables can be represented using dummy variables. Variables with values other than 1 can be converted to 0 or 1 by scaling. In (3), when considering mole fractions of raw materials while mixing several raw materials, their total must equal 1. Some sets of variables can have different total values. As given in (4), the total value of several variables is required to be in a certain range.
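The four kinds of constraint listed above can be expressed as a simple feasibility check on a candidate solution. This is a minimal sketch, not the article's implementation: the function `satisfies_constraints`, its parameters, and all variable names and limits in the example are hypothetical.

```python
import math


def satisfies_constraints(x, bounds, binary, sum_equal, sum_range):
    """Check one candidate (a dict of variable values) against four constraint types."""
    # (1) Each bounded variable must lie within its lower and upper limits.
    for name, (low, high) in bounds.items():
        if not low <= x[name] <= high:
            return False
    # (2) Dummy variables may only be 0 or 1.
    for name in binary:
        if x[name] not in (0, 1):
            return False
    # (3) Each listed set of variables must add up to a fixed total.
    for names, total in sum_equal:
        if not math.isclose(sum(x[n] for n in names), total, abs_tol=1e-9):
            return False
    # (4) Each listed set of variables must add up to a value within a range.
    for names, (low, high) in sum_range:
        if not low <= sum(x[n] for n in names) <= high:
            return False
    return True


# Hypothetical variables and limits, for illustration only.
bounds = {"frac_a": (0.0, 1.0), "frac_b": (0.0, 1.0), "temperature": (300.0, 500.0)}
binary = ["additive"]
sum_equal = [(("frac_a", "frac_b"), 1.0)]
sum_range = [(("time_1", "time_2"), (4.0, 6.0))]

feasible = {"frac_a": 0.6, "frac_b": 0.4, "temperature": 350.0,
            "additive": 1, "time_1": 2.0, "time_2": 3.0}
infeasible = dict(feasible, frac_b=0.5)  # mole fractions no longer total 1

print(satisfies_constraints(feasible, bounds, binary, sum_equal, sum_range))
print(satisfies_constraints(infeasible, bounds, binary, sum_equal, sum_range))
```

In a genetic algorithm, a check of this kind could be used to reject or penalize candidates, though how the article enforces its constraints is not stated in this excerpt.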
What is the role of carsharing toward a more sustainable transport behavior? Analysis of data from 80 major German cities
Published in International Journal of Sustainable Transportation, 2022
Daniel Göddeke, Konstantin Krauss, Till Gnann
The explanatory variables used can be assigned to three areas and are summarized in Table 2. Sociodemographic characteristics comprise age, gender, income, and education. Individual mobility resources include the possession of a driver’s license as well as a public transit pass, the number of cars in the household, bicycle ownership, and membership in a carsharing organization. Furthermore, supply and city characteristics cover the city’s supply density with carsharing vehicles, binary variables indicating the existence of public transit stations within walking distance, and a binary variable indicating whether the respective city is a metropolis according to the RegioStaR7 classification. The RegioStaR7 categorization integrates different settlement structures, includes the central importance of cities and enables a population distribution that is suitable for sampling (BMVI, 2020c). Here, due to the urban focus, we select all cities that are metropolises (RegioStaR7-71: e.g., Berlin or Hamburg), or regiopolises or large cities (RegioStaR7-72: e.g., Karlsruhe or Wiesbaden). Categorical input variables are encoded as dummy variables, with the middle category serving as the reference.