Categorical variables – Knowledge and References

Introduction

Published in Brandon M. Greenwell, Tree-Based Methods for Statistical Learning in R, 2022

As far as variable types go, this book is mainly concerned with three:

- Nominal categorical (i.e., categorical where the order of the categories doesn't matter). Examples include gender, eye color, zip code, or blood type.
- Ordered categorical (i.e., categorical where the order of the categories matters). Examples include socioeconomic status (e.g., low < middle < high), age range (e.g., [0-10yrs.] < [11-20yrs.] < …), or satisfaction rating (e.g., not satisfied < somewhat satisfied < very satisfied). Ordered categorical variables are sometimes referred to as ordinal.
- Ordered numeric. Examples include age or temperature measured on a continuum, height, weight, or concentration.

Introduction to Data, Data Patterns, and Data Mining

Published in Nong Ye, Data Mining, 2013

Nong Ye

Categorical variables have two subtypes: nominal variables and ordinal variables (Tan et al., 2006). The values of an ordinal variable can be sorted in order, whereas the values of a nominal variable can only be compared as the same or different. For example, the three values of Age (child, adult, and senior) make Age an ordinal variable, since child, adult, and senior can be sorted in order of increasing age. However, we cannot state whether the age difference between child and adult is bigger or smaller than the age difference between adult and senior, since child, adult, and senior are categorical rather than numeric values. That is, although the values of an ordinal variable can be sorted, they remain categorical and their quantitative differences are not available. Color is a nominal variable: yellow and purple are two different colors, but there is no meaningful order between them.

Numeric variables have two subtypes: interval variables and ratio variables (Tan et al., 2006). Differences between the values of an interval variable (e.g., Launch Temperature in °F) are meaningful, whereas for a ratio variable (e.g., Number of O-rings with Stress) both differences and ratios between values are meaningful.
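The distinctions above can be illustrated with a minimal, standard-library Python sketch. `OrdinalValue`, `AGE_LEVELS`, and `fahrenheit_to_celsius` are illustrative names, not from the source, and the temperatures 30 °F and 60 °F are made-up values. An ordinal value supports sorting but deliberately defines no subtraction (a nominal value such as a color would need only equality). The temperature lines show why ratios of an interval variable are not meaningful: converting °F to °C preserves differences up to the factor 5/9, but not ratios.

```python
from functools import total_ordering

# Order of levels taken from the Age example: child < adult < senior.
AGE_LEVELS = ("child", "adult", "senior")


@total_ordering
class OrdinalValue:
    """A categorical value whose levels can be sorted but not subtracted."""

    def __init__(self, value, levels):
        if value not in levels:
            raise ValueError(f"{value!r} is not one of {levels}")
        self.value = value
        self.levels = tuple(levels)

    def _rank(self):
        return self.levels.index(self.value)

    def __eq__(self, other):
        if not isinstance(other, OrdinalValue):
            return NotImplemented
        return self.levels == other.levels and self.value == other.value

    def __lt__(self, other):
        # Only values defined on the same ordered set of levels are comparable.
        if not isinstance(other, OrdinalValue) or self.levels != other.levels:
            return NotImplemented
        return self._rank() < other._rank()

    def __hash__(self):
        return hash((self.value, self.levels))

    # No __sub__ is defined: "senior - adult" has no quantitative meaning.


def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


ages = [OrdinalValue(v, AGE_LEVELS) for v in ("senior", "child", "adult")]
print([a.value for a in sorted(ages)])  # ['child', 'adult', 'senior']

try:
    OrdinalValue("senior", AGE_LEVELS) - OrdinalValue("adult", AGE_LEVELS)
except TypeError:
    print("ordinal levels have no arithmetic difference")

# Interval variable: 60 °F is not "twice as warm" as 30 °F (illustrative values).
low_f, high_f = 30.0, 60.0
print(high_f / low_f)  # 2.0
print(fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f))  # about -14
# Differences survive the unit change, scaled by 5/9 (30 °F apart is about 16.67 °C apart).
print(fahrenheit_to_celsius(high_f) - fahrenheit_to_celsius(low_f))
```

A ratio variable such as a count of stressed O-rings has a true zero, so "2 is twice as many as 1" stays true under any change of units.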

Analyticity

Published in Vivek Kale, Digital Transformation of Enterprise Architecture, 2019

Vivek Kale

Data science involves a search for meaningful relationships between variables. We look for relationships between pairs of continuous variables using scatter plots and correlation coefficients. We look for relationships between categorical variables using contingency tables and the methods of categorical data analysis. We use multivariate methods and multi-way contingency tables to examine relationships among many variables. There are two main types of predictive models: regression and classification. Regression is prediction of a response of meaningful magnitude. Classification involves prediction of a class or category.
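As a sketch of the contingency-table approach mentioned above (not Kale's own code), the following cross-tabulates two categorical variables and computes Pearson's chi-square statistic for independence: the sum over cells of (observed - expected)^2 / expected, where expected = row total × column total / N. The function names and the 100 observations are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_level, col_level) observations of two categorical variables."""
    pairs = list(pairs)
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def pearson_chi_square(table):
    """Pearson's chi-square statistic for independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical observations: (group, outcome) for 100 individuals.
observations = (
    [("A", "no")] * 10 + [("A", "yes")] * 20
    + [("B", "no")] * 30 + [("B", "yes")] * 40
)
rows, cols, table = contingency_table(observations)
print(rows, cols, table)  # ['A', 'B'] ['no', 'yes'] [[10, 20], [30, 40]]
chi2 = pearson_chi_square(table)
dof = (len(rows) - 1) * (len(cols) - 1)
print(round(chi2, 4), dof)  # 0.7937 1
```

The statistic is compared with a chi-square distribution on (r - 1)(c - 1) degrees of freedom. In practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would also supply the p-value; note that it applies Yates' continuity correction to 2×2 tables by default.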

Can you feel the rhythm? Comparing vibrotactile and auditory stimuli in the rhythm video game Jump'n'Rhythm

Published in Behaviour & Information Technology, 2023

Katya Alvarez-Molina, Anke V. Reinschluessel, Tim Kratky, Martin Scharpenberg, Rainer Malaka

Additionally, the data were divided not only by level but also by lives. Therefore, an analysis was performed to find the correlation between the number of lives (per subject and level) and Condition, Level, Age, and Sex. The analysis used a mixed-effects cumulative logit model. Such logistic random-effects models are a popular tool for analysing multilevel data (also called hierarchical data) with a binary or ordinal outcome. An ordinal variable is a categorical variable whose levels have a natural ordering (e.g. light, medium, heavy) (Liu and Hedeker 2006). Furthermore, data can have a hierarchical or clustered structure (e.g. schools, families) or be measured repeatedly over time. Mixed-effects regression models are therefore helpful for relating all the variables; in particular, a logistic-based model is helpful for analysing ordinal, multilevel data (Liu and Hedeker 2006). Accordingly, the target variable was the number of lives (per participant and level), and the covariates were condition (auditory or vibrotactile), level (Level 1 and Level 2), age (twenty to thirty-one years old), and sex (female, male). A random effect for participant was included because the taps (absolute accuracy) from the same participant may be correlated. The results of the model are shown in Table 2.
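To make the model concrete, here is a minimal sketch of how a cumulative logit model with a random participant intercept turns covariates into probabilities for ordered outcome categories. It assumes the parameterization logit P(Y ≤ k) = θ_k − η with η = x'β + u_i; software packages differ in this sign convention. All thresholds, coefficients, and intercepts below are hypothetical, not estimates from the study, and the sketch only evaluates the model; it does not fit it.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_logit_probs(thresholds, eta):
    """P(Y = k) for K ordered categories, assuming logit P(Y <= k) = theta_k - eta."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y <= 1), ..., P(Y <= K-1), then P(Y <= K) = 1
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    # Category probabilities are successive differences of the cumulative ones
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


def linear_predictor(beta, covariates, participant_intercept):
    """Fixed effects x'beta plus the participant's random intercept u_i."""
    return sum(b * x for b, x in zip(beta, covariates)) + participant_intercept


# Hypothetical values for illustration only; not estimates from the study.
THRESHOLDS = [-1.0, 0.5, 2.0]  # K = 4 ordered outcome categories
BETA = [0.8, -0.4]  # condition (1 = vibrotactile), level (1 = Level 2)

for u_i in (-1.0, 0.0, 1.0):  # three hypothetical participants
    eta = linear_predictor(BETA, [1, 0], u_i)
    probs = cumulative_logit_probs(THRESHOLDS, eta)
    print(u_i, [round(p, 3) for p in probs])
```

Under this convention a participant with a larger u_i is shifted toward higher categories in every condition and level, which is how the random intercept captures the correlation among one participant's repeated observations.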

Combined high-intensity interval training as an obesity-management strategy for adolescents

Published in European Journal of Sport Science, 2023

António Videira-Silva, Megan Hetherington-Rauth, Luís B. Sardinha, Helena Fonseca

Data were analysed using IBM SPSS Statistics (version 26.0, IBM, New York, USA). Descriptive characteristics, mean ± SD and median (interquartile range), were calculated for normally distributed and skewed outcomes, respectively. Independent-samples t-tests, and the non-parametric alternative Mann–Whitney U test, were used for continuous variables to analyse baseline differences between participants who completed the 6-month exercise intervention and those who dropped out, and baseline differences between the TT and HIIT groups. The chi-square test was used for categorical variables. Within-group changes over time were analysed with paired-samples t-tests and Wilcoxon signed-rank tests. Generalized estimating equations (GEE) were used to estimate between-group effects of the training protocols from baseline to six months on body composition, PA, and clinical outcomes, while controlling for potential confounders (i.e. sex, age, Tanner stage, and energy intake). A per-protocol analysis (PPA) was used for all outcomes except the biochemical outcomes, for which an intention-to-treat analysis (ITTA) was performed because of missing values at the 6-month assessment (missing values: TT n = 6 vs. HIIT n = 7, p = .672). A p-value ≤ 0.05 was considered statistically significant.
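For a 2×2 comparison, such as a binary baseline characteristic compared between two groups, the chi-square test reduces to a closed form. The sketch below computes the Pearson statistic (without continuity correction) and its 1-degree-of-freedom p-value using only the standard library, then applies the p ≤ 0.05 threshold stated above. The counts are hypothetical, not the study's data, and the sketch is not meant to reproduce SPSS output or the authors' exact test options.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / denominator


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # A chi-square variable with 1 df is Z**2 for a standard normal Z, so
    # P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical baseline counts (not the study's data):
#                female  male
# group 1          10     20
# group 2          30     40
stat = chi_square_2x2(10, 20, 30, 40)
p = chi_square_p_value_df1(stat)
print(f"chi2 = {stat:.3f}, p = {p:.3f}, significant: {p <= 0.05}")
```

The closed form is algebraically identical to summing (observed - expected)^2 / expected over the four cells.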

True Gaussian mixture regression and genetic algorithm-based optimization with constraints for direct inverse analysis

Published in Science and Technology of Advanced Materials: Methods, 2022

Hiromasa Kaneko

In (1), mole fractions take values between 0 and 1, and upper and lower limits are necessary for synthesis conditions, such as temperature and pressure, depending on the constraints of the equipment. Examples of variables that take only 0 or 1, as in (2), are dummy variables, such as the presence or absence of an additive, washing, or pretreatment of raw materials. All categorical variables can be represented using dummy variables. Variables with values other than 1 can be converted to 0 or 1 by scaling. In (3), when several raw materials are mixed, their mole fractions must total 1. Some sets of variables can have different total values; as given in (4), the total of several variables is required to lie within a certain range.
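The following Python sketch illustrates dummy encoding together with the four kinds of constraints described above: (1) lower and upper bounds, (2) variables restricted to 0 or 1, (3) groups that must total 1, and (4) group totals within a range. It only checks a candidate point; it is not Kaneko's Gaussian mixture regression or genetic-algorithm optimization. All variable names, bounds, and values are hypothetical.

```python
def dummy_encode(value, levels):
    """Represent one categorical value as a list of 0/1 dummy variables, one per level."""
    if value not in levels:
        raise ValueError(f"{value!r} is not one of {levels}")
    return [1 if value == level else 0 for level in levels]


def check_constraints(x, bounds=None, binary=(), sum_to_one=(), sum_ranges=(), tol=1e-9):
    """Return descriptions of the constraints violated by candidate x (a dict name -> value)."""
    violations = []
    # (1) lower and upper limits, e.g. mole fractions in [0, 1] or an equipment temperature range
    for name, (lo, hi) in (bounds or {}).items():
        if not lo - tol <= x[name] <= hi + tol:
            violations.append(f"{name} outside [{lo}, {hi}]")
    # (2) variables that may only be 0 or 1, e.g. dummy variables
    for name in binary:
        if x[name] not in (0, 1):
            violations.append(f"{name} is not 0 or 1")
    # (3) groups that must total exactly 1, e.g. mole fractions of mixed raw materials
    for group in sum_to_one:
        total = sum(x[name] for name in group)
        if abs(total - 1.0) > tol:
            violations.append(f"sum of {group} is {total}, not 1")
    # (4) groups whose total must lie within a range
    for group, lo, hi in sum_ranges:
        total = sum(x[name] for name in group)
        if not lo - tol <= total <= hi + tol:
            violations.append(f"sum of {group} is {total}, outside [{lo}, {hi}]")
    return violations


# Hypothetical categorical variable: which additive (if any) is used.
ADDITIVE_LEVELS = ("none", "A", "B")
additive_names = [f"additive_{level}" for level in ADDITIVE_LEVELS]

candidate = {"frac_1": 0.5, "frac_2": 0.3, "frac_3": 0.2, "temperature": 350.0}
candidate.update(zip(additive_names, dummy_encode("A", ADDITIVE_LEVELS)))

constraints = {
    "bounds": {"frac_1": (0, 1), "frac_2": (0, 1), "frac_3": (0, 1), "temperature": (300, 400)},
    "binary": additive_names,
    # The dummies of one categorical variable also sum to 1, so they form a type (3) group.
    "sum_to_one": [("frac_1", "frac_2", "frac_3"), tuple(additive_names)],
    "sum_ranges": [(("frac_1", "frac_2"), 0.5, 0.9)],
}
print(check_constraints(candidate, **constraints))  # []
```

Returning a list of violations, rather than a single boolean, makes it easy to see which constraint type a rejected candidate breaks.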