Detecting Interaction Effects
Published in Max Kuhn, Kjell Johnson, Feature Engineering and Selection, 2019
Let’s review the basic concepts of recursive partitioning to understand why this technique has the ability to uncover interactions among predictors. The objective of recursive partitioning is to find the optimal split point of a predictor, the one that separates the samples into more homogeneous groups with respect to the response. Once a split is made, the same procedure is applied to the subset of samples at each of the subsequent nodes. This process then continues recursively until a minimum number of samples in a node is reached or until the process meets an optimization criterion. When data are split recursively, each subsequent node can be thought of as representing a localized interaction between the previous node and the current node. In addition, the more frequently subsequent nodes involve the same pair of predictors, the more likely it is that the interaction between those predictors is a global interaction across the observed range of the predictors.
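To make this concrete, here is a toy sketch (not the authors’ implementation; the data-generating step and all names are invented for the example). It fits a CART regression tree to synthetic data containing a known x1·x2 interaction and tallies how often a parent node and its child split on a pair of different predictors:

```python
# Toy sketch: a tree fit to data with a known x1*x2 interaction tends to
# produce nested splits on that pair of predictors.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=1000)  # pure x1:x2 interaction

t = DecisionTreeRegressor(max_depth=4).fit(X, y).tree_

pairs = Counter()
def walk(node, parent_feature=None):
    f = t.feature[node]
    if f < 0:  # leaf: no split feature
        return
    if parent_feature is not None and parent_feature != f:
        pairs[tuple(sorted((parent_feature, f)))] += 1
    walk(t.children_left[node], f)
    walk(t.children_right[node], f)

walk(0)
print(pairs)  # frequent (0, 1) parent-child pairs hint at an x1:x2 interaction
```

If the (x1, x2) pair dominates the tally throughout the tree, that repeated co-occurrence is the kind of evidence the passage above treats as pointing to a global interaction.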
Feature optimization approach to improve performance for big data
Published in Jimmy C.M. Kao, Wen-Pei Sung, Civil, Architecture and Environmental Engineering, 2017
A decision tree is constructed by recursive partitioning, splitting the training records into successively purer subsets, with Information Gain (IG) used to select the best features. The recursion is complete when all records at a node share the same value of the target variable, or when further splits no longer add value to the prediction. Let Dt be the set of training records that are associated with node t and y = {y1, y2, …, yn} be the class labels. The DT process is as follows:
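A minimal Python sketch of this splitting criterion, assuming the standard entropy-based definition of information gain (the function names and toy labels are illustrative only):

```python
# Minimal sketch of entropy-based information gain for a binary split;
# function names and the toy labels are illustrative only.
import numpy as np

def entropy(labels):
    """Shannon entropy of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y_parent, y_left, y_right):
    """IG = entropy at node t minus the weighted entropy of its children."""
    n = len(y_parent)
    children = (len(y_left) / n) * entropy(y_left) \
             + (len(y_right) / n) * entropy(y_right)
    return entropy(y_parent) - children

y = np.array([0, 0, 0, 1, 1, 1])          # class labels at node t
print(information_gain(y, y[:3], y[3:]))  # perfect split: IG = 1.0
```

The split (or feature) with the highest IG is chosen at each node, and the same computation is repeated on each resulting subset.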
May the best-sighted win? The relationship between visual function and performance in Para judo
Published in Journal of Sports Sciences, 2021
Kai Krabben, Evgeny Mashkovskiy, H. J. C. (Rianne) Ravensbergen, David L. Mann
The relationship between visual function and performance was analysed through calculation of Pearson’s correlation coefficient. Decision tree analyses were used to determine whether or not the data supported splitting VI judo into more than one sport class, and if so, what the ideal cut-off point(s) between these classes should be. We applied the unbiased recursive partitioning algorithm (Hothorn et al., 2006), which recursively performs univariate splits on the input variables for as long as these are significantly associated with the response variable. The results of recursive partitioning are known to be potentially unstable, as small changes in the data sample may lead to substantially different decision trees being built (Strobl et al., 2009). To assess the stability of the decision tree, we examined the variability in cut-off selection by bootstrapping 10,000 random resamples of our data, using the toolkit for stability assessment of tree-based learners (Philipp et al., 2016). Each of the 10,000 samples was drawn randomly with replacement from the original dataset, had the same size as the original dataset, and was used to build a separate decision tree. We summarised the number of splits and the values of the split points over all decision trees to estimate the optimal number of classes and cut-off point(s).
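The study itself used R’s conditional inference trees and the stablelearner toolkit; as a rough Python analogue (sklearn’s CART standing in for the unbiased algorithm, with an illustrative shallow depth), the bootstrap procedure might be sketched as:

```python
# Python analogue of the bootstrap stability check; sklearn's CART is a
# stand-in for the unbiased ctree algorithm, and max_depth=2 is illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def split_stability(X, y, n_boot=10_000, seed=0):
    """Refit a tree on bootstrap resamples; collect split counts and cut-offs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_splits, cutoffs = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement, same size
        tree = DecisionTreeClassifier(max_depth=2).fit(X[idx], y[idx])
        internal = tree.tree_.feature >= 0  # internal (split) nodes only
        n_splits.append(int(internal.sum()))
        cutoffs.extend(tree.tree_.threshold[internal])
    return np.array(n_splits), np.array(cutoffs)

# Summarising n_splits suggests how many classes the data support; the
# spread of cutoffs shows how stable the chosen cut-off point(s) are.
```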
Multi-Regional landslide detection using combined unsupervised and supervised machine learning
Published in Geomatics, Natural Hazards and Risk, 2021
Faraz S. Tehrani, Giorgio Santinelli, Meylin Herrera Herrera
As shown in Figure 1, an RF is a set of DTs (3 in this example). Each DT is a set of internal nodes and leaves. At each internal node, a randomly selected subset of features is considered for deciding how to divide the dataset into two separate sets with similar responses within each (Class 0 or Class 1 here). The final feature for an internal node is selected with some criterion, which for classification tasks can be Gini impurity or information gain. One can measure how much each feature decreases the impurity of the split; the feature that yields the largest decrease is selected for the internal node. The process of splitting the dataset and its subsets into smaller sets is repeated on each derived subset at the internal nodes in a recursive manner called recursive partitioning. The recursion is complete when the subset at a node (a leaf node) contains only one target class, when splitting no longer adds value to the predictions (information gain), or when the maximum depth of the tree is reached. Once this procedure ends, the majority vote over the classes predicted by all DTs is the final result of the trained RF.
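A minimal sketch of such a forest in Python (the data and parameter values are invented for illustration; only the 3-tree, Gini-criterion setup mirrors the description above):

```python
# Minimal sketch of the forest described above; the data and most
# parameter values are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # binary target: Class 0 / Class 1

rf = RandomForestClassifier(
    n_estimators=3,       # a set of 3 DTs, as in the example of Figure 1
    criterion="gini",     # Gini impurity to choose the split feature
    max_features="sqrt",  # random feature subset considered at each node
).fit(X, y)

# predict() returns, for each sample, the majority vote over the classes
# predicted by the individual DTs.
print(rf.predict(X[:5]))
```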
Prediction of the crack condition of highway pavements using machine learning models
Published in Structure and Infrastructure Engineering, 2019
Sylvester Inkoom, John Sobanjo, Adrian Barbu, Xufeng Niu
The recursive partitioning methods implemented in this paper are the classification and regression tree algorithms. Two sets of predictors are chosen for the prediction of the pavement condition. The first model (Model 1) consists of a group of predictors including the Average Daily Traffic and Truck Traffic, Age of Pavement, Asphalt Thickness, the Roadway Functional Class, and a set of time series data of the previous five years of crack condition state ratings of the pavement. The second model includes all the above predictors with the exception of the time series condition rating data. The two sets were chosen to assess the best- and worst-case conditions for modelling the condition rating. Whereas the first set of predictors includes the past pavement condition data, the second has no prior knowledge of the condition of the pavement. The development of the second model was motivated by the fact that the past pavement crack condition rating may be unknown or not readily available for some roadway segments. In such situations, Departments of Transportation would still be able to predict the condition rating of the pavement with a certain level of confidence, even without the previous crack condition ratings.
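The two predictor sets can be expressed as a short sketch (the column names, random data, and use of a classification tree here are hypothetical stand-ins for the paper’s dataset and models):

```python
# Sketch of the two predictor sets; column names, random data, and the
# tree type are hypothetical stand-ins for the paper's dataset and models.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
base = ["adt", "truck_traffic", "pavement_age",
        "asphalt_thickness", "functional_class"]
history = [f"crack_rating_lag{k}" for k in range(1, 6)]  # previous 5 years

df = pd.DataFrame(rng.random((200, len(base) + len(history))),
                  columns=base + history)
target = rng.integers(0, 3, size=200)  # hypothetical condition states

model_1 = DecisionTreeClassifier().fit(df[base + history], target)  # with history
model_2 = DecisionTreeClassifier().fit(df[base], target)            # without history
```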