Regression – Knowledge and References

Statistics

Published in Paul L. Goethals, Natalie M. Scala, Daniel T. Bennett, Mathematics in Cyber Research, 2022

An important application of statistics is estimating the relationship between two or more quantitative variables. This is fundamentally achieved through correlation and regression analysis. The strength of the relationship between two quantitative variables can be estimated through correlation analysis: a high correlation (in absolute value) indicates a strong relationship between the two variables, whereas a low correlation indicates a weak one. Correlation is often assumed to describe a linear relationship; thus, correlation analysis is closely related to linear regression analysis (Franzese & Iuliano, 2019). Regression analysis is a statistical approach for predicting the value of a dependent variable (response) from the known values of one or more independent variables. Both correlation and regression analysis are among the fundamental methods behind modern machine learning algorithms.
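As a minimal sketch (in Python, with a small hypothetical dataset), the Pearson correlation coefficient below measures the strength of the linear relationship between two variables, and an ordinary least-squares line predicts the response from the independent variable. The function names `pearson_r` and `fit_line` are illustrative and do not come from the source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept: the fitted line passes through the means
    return b0, b1

# Hypothetical data: independent variable x, dependent variable (response) y.
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]
r = pearson_r(x, y)
b0, b1 = fit_line(x, y)
print(f"r = {r:.3f}, fitted line: y = {b0:.2f} + {b1:.2f} x")
print(f"predicted response at x = 6: {b0 + b1 * 6:.1f}")
```

For simple linear regression the fitted slope equals r multiplied by the ratio of the standard deviations of y and x, which is one way to see the close link between the two analyses.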

Deep Learning and Multimodal Artificial Neural Network Architectures for Disease Diagnosis and Clinical Applications

Published in Om Prakash Jena, Bharat Bhushan, Nitin Rakesh, Parma Nand Astya, Yousef Farhaoui, Machine Learning and Deep Learning in Efficacy Improvement of Healthcare Systems, 2022

Jeena Thomas, Ebin Deni Raj

Machine learning (ML) is an approach to AI in which a computer system performs specific tasks without explicit instructions, relying instead on patterns and inference. Learning can be of three types: supervised, unsupervised, and reinforcement learning.

In supervised learning, the machine learns under guidance from labeled data, in which each input is paired with its output. A target output is available, and the obtained output is tested against it. Regression and classification problems fall under supervised learning: regression is used to predict a continuous quantity, whereas classification is used to predict a class label.

Discovering patterns is the primary goal of the unsupervised approach. Here, the input is provided and then explored for hidden patterns to produce the output; the data is not labeled and there is no target output. Unsupervised learning mainly deals with association and clustering problems, discovering patterns and co-occurrences in the data. In clustering, the data is grouped based on similarity.

Reinforcement learning follows a trial-and-error concept. An agent is placed in an unknown environment and must explore it by taking actions and transitioning from one state to another to obtain the maximum reward.
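The clustering idea from the unsupervised-learning description (grouping unlabeled data by similarity) can be illustrated with k-means, one common clustering algorithm that the text itself does not name. The sketch below is a hypothetical one-dimensional, pure-Python version with hand-picked starting centres; `kmeans_1d` is an illustrative name, not an established API.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Group unlabeled 1-D points by similarity with Lloyd's k-means.

    `centroids` gives the initial cluster centres (one per cluster).
    Returns the final centres and the points assigned to each cluster.
    """
    centroids = [float(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centre.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points
        # (an empty cluster keeps its old centre).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # no centre moved: converged
            break
        centroids = updated
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups; no target output is given.
points = [1, 2, 3, 10, 11, 12]
centres, groups = kmeans_1d(points, centroids=[1, 10])
print(centres, groups)
```

No labels are used anywhere: the grouping emerges only from distances between the points, which is what distinguishes this from the supervised setting.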

Decoding Common Machine Learning Methods

Published in Himansu Das, Jitendra Kumar Rout, Suresh Chandra Moharana, Nilanjan Dey, Applied Intelligent Decision Making in Machine Learning, 2020

Srinivasagan N. Subhashree, S. Sunoj, Oveis Hassanijalilian, C. Igathinathane

The two major categories of ML algorithms are unsupervised and supervised. Unsupervised ML works on unlabeled data (only input variables, with no labeled output variables, are passed to the algorithm). The unsupervised algorithm learns the underlying structure and distribution of the data and groups the observations accordingly (e.g., clustering and association), whereas supervised ML performs classification and regression on labeled data (input variables and labeled output variables are passed to the algorithm), which helps the models learn the relation between the input and output variables (Das et al., 2015, 2018). Classification models are employed if the outcome variable is categorical, and regression models are used if it is continuous. Practical machine learning applications predominantly use supervised classification. The agricultural application datasets used in this chapter (soybean, aphids, and weed species) are labeled and have a categorical output variable. Therefore, common supervised classification ML algorithms, such as linear discriminant analysis (LDA) and k-nearest neighbors (kNN), were considered and decoded; methods were then developed based on their algorithms and discussed.
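kNN is the simpler of the two supervised classifiers named above. A minimal pure-Python sketch follows, assuming numeric feature vectors, Euclidean distance, and a plain majority vote; this is our illustration, not necessarily the chapter's implementation, and the toy data are hypothetical rather than taken from the soybean, aphid, or weed datasets. `knn_predict` is an illustrative name, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled
    training points, using Euclidean distance."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: two numeric features per sample, two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["class A", "class A", "class A", "class B", "class B", "class B"]

print(knn_predict(train_X, train_y, (1.2, 1.4)))
print(knn_predict(train_X, train_y, (6.8, 6.1)))
```

An odd `k` avoids voting ties in two-class problems; with more classes, `Counter.most_common` orders equal counts by first occurrence, so a tie goes to the label of the nearer neighbour.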

In pursuit of humanised order picking planning: methodological review, literature classification and input from practice

Published in International Journal of Production Research, 2023

Thomas De Lombaert, Kris Braekers, René De Koster, Katrien Ramaekers

Most research methodologies pertain to experimental studies. In this research methodology, subjects are typically assigned to treatments, either in an industrial or a lab environment, so that the researcher can make inferences about the relationship between independent and dependent variables (McClave and Sincich 2007). Experimental studies are highly suitable for investigating the impact of human behaviour on operational outcomes and are often supplemented with surveys regarding subjects' retrospectively perceived experiences (Bendoly et al. 2010). There are various statistical analysis procedures (SAPs) for converting collected data into usable information. Table 5 presents an overview of these commonly used procedures and the papers in which they are applied. Correlation analysis is defined as an in-depth analysis and discussion of correlations between two or more variables in the data set; frequently used metrics are the Pearson, Spearman's rank, and point-biserial correlation coefficients. Regression analysis is the statistical procedure of estimating the relationship between independent and dependent variables. Hypothesis testing is the most frequently used SAP and is primarily used to establish (non-)significant differences in outcome variables among treatments. Lastly, "other SAP" refers to procedures such as principal component analysis, effect-size evaluation, and comparisons of descriptive statistics.
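The three correlation metrics named above are closely related: Spearman's rank coefficient is Pearson's coefficient applied to the ranks of the data, and the point-biserial coefficient is Pearson's coefficient when one variable is dichotomous (coded 0/1). A minimal pure-Python sketch under those standard definitions follows; the function names and data are illustrative, and averaging the ranks of tied values is the usual convention, assumed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for idx in order[i:j + 1]:
            result[idx] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def point_biserial(binary, continuous):
    """Point-biserial correlation: Pearson's r with one 0/1 variable."""
    return pearson(binary, continuous)

# Hypothetical data: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson {pearson(x, y):.3f}, Spearman {spearman(x, y):.3f}")
# Hypothetical data: treatment membership (0/1) versus a continuous outcome.
print(f"point-biserial {point_biserial([0, 0, 1, 1], [1, 2, 3, 4]):.3f}")
```

The first pair shows why a rank-based coefficient can be preferable: y increases whenever x does, so Spearman's coefficient is exactly 1, while Pearson's is below 1 because the relationship is not linear.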

A comparative study between PCR, PLSR, and LW-PLS on the predictive performance at different data splitting ratios

Published in Chemical Engineering Communications, 2022

Teck Fu Thien, Wan Sieng Yeo

Parametric and non-parametric algorithms differ in that the latter do not require the estimation of distribution parameters, such as the mean and standard deviation, to obtain an algorithm (Scheff 2016; King and Eckersley 2019). Non-parametric models are generally less powerful because they have less supporting evidence when drawing conclusions about the target function (Scheff 2016). Building a model depends not only on the assumptions placed upon it but also on the type of learning method used. Types of machine learning include supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning, the machine learns the target function that relates known input variables to known output variables. Supervised learning algorithms can be subdivided into regression and classification tasks, depending on the objective (Haimi et al. 2013; Vieira et al. 2020). The difference between the two is that regression predicts continuous values, whereas classification assigns input data to categories. Examples of supervised learning techniques include multiple linear regression, principal component regression (PCR), partial least squares regression (PLSR), and locally weighted PLS (LW-PLS) (Ambika 2019; Chiplunkar and Huang 2019; Thomas 2019; Ibrahim et al. 2020).
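Multiple linear regression is the simplest of the supervised techniques listed above. The sketch below is a pure-Python illustration of ours, not the authors' code: it solves the ordinary least-squares normal equations by Gauss-Jordan elimination, and it does not cover PCR, PLSR, or LW-PLS. It assumes the predictors are not collinear; otherwise the normal-equation matrix is singular, which is one of the situations in which PCR and PLSR are commonly preferred. `fit_mlr` is a hypothetical name.

```python
def fit_mlr(X, y):
    """Ordinary least-squares multiple linear regression.

    X is a list of observations (rows of predictor values); an intercept
    column of ones is prepended. Returns coefficients [b0, b1, ..., bp].
    """
    A = [[1.0] + list(row) for row in X]
    n, p = len(A), len(A[0])
    # Augmented normal equations: (A^T A) b = A^T y.
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # M is now diagonal; divide each right-hand side by its pivot.
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefs = fit_mlr(X, y)
print([round(c, 6) for c in coefs])
```

Because the hypothetical data contain no noise, the fitted coefficients recover the generating values; with real data they would be least-squares estimates.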

Seismic damage assessment of highway bridges by means of soft computing techniques

Published in Structure and Infrastructure Engineering, 2022

Panagiotis K. Tsikas, Athanasios P. Chassiakos, Vasileios C. Papadimitropoulos

Supervised learning is generally applied to classification and regression problems. Classification relates to prediction systems that output discrete values (e.g., the Hazus methodology), whereas regression relates to systems that output continuous values. The proposed seismic risk assessment methodology is classified as a regression problem: whereas Hazus uses a number of discrete potential seismic damage levels, i.e., none (1), slight/minor (2), moderate (3), extensive (4), and complete (5), the present methodology employs a continuous assessment scale from 1 to 5, which can differentiate more accurately among bridges within a specific damage level. Thus, the present study treats the problem as one of regression and implements artificial neural networks (ANNs) and genetic algorithms (GAs) to build risk assessment models. In this supervised learning approach, the input and output data are generated by means of the Hazus methodology for a large and representative set of bridges as part of the study. After training, the ANN models can recall the stored knowledge to assess the risk in practically any case.
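To illustrate why a continuous scale can differentiate bridges within one damage level, the hypothetical sketch below maps a continuous damage index back onto the five discrete Hazus levels listed above. Rounding to the nearest level is purely an assumption for illustration, not the study's procedure; the index values and the name `to_discrete_level` are invented.

```python
# Discrete damage levels as listed in the text.
HAZUS_LEVELS = {1: "none", 2: "slight/minor", 3: "moderate",
                4: "extensive", 5: "complete"}

def to_discrete_level(index):
    """Map a continuous damage index on [1, 5] to the nearest discrete level.

    Halves are rounded up; this rounding rule is an illustrative assumption.
    """
    if not 1.0 <= index <= 5.0:
        raise ValueError("damage index must lie in [1, 5]")
    level = int(index + 0.5)  # index is positive, so truncation rounds half up
    return level, HAZUS_LEVELS[level]

# Two hypothetical bridges whose continuous (regression) outputs differ.
for name, index in [("bridge A", 2.6), ("bridge B", 3.4)]:
    level, label = to_discrete_level(index)
    print(f"{name}: continuous index {index:.1f} -> discrete level {level} ({label})")
```

Both bridges fall into the same discrete level (moderate), so a purely discrete scale would not separate them, whereas the continuous indices still rank bridge B as more severely damaged.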