Classification Techniques
Published in Harry G. Perros, An Introduction to IoT Analytics, 2021
The information gain metric is used in the ID3 and C4.5 algorithms, and it is based on the entropy. Let X be a discrete random variable, and let p_i = p(X = i), i = 1, 2, …, n, be its probability distribution. Then the entropy of X is defined as follows:

H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i.

Logarithmic bases other than 2, such as e and 10, can also be used.
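As an illustration, a minimal Python sketch of this entropy formula (the function name and example are ours, not from the chapter) computes H(X) from the empirical distribution of a list of labels:

import math
from collections import Counter

def entropy(labels):
    # H(X) = -sum_i p_i * log2(p_i) over the empirical distribution of labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["heads", "tails", "heads", "tails"]))  # 1.0: a fair coin carries one bit

A uniform distribution over two values yields the maximum entropy of 1 bit, while a constant variable has entropy 0.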
Decision Tree and Ensemble Methods
Published in Mark Chang, Artificial Intelligence for Drug Development, Precision Medicine, and Healthcare, 2020
ID3 does not guarantee an optimal solution; it can converge on local optima. The ID3 algorithm is easy to interpret, handles mixed discrete and continuous inputs, and performs automatic variable selection. ID3 is robust to outliers and insensitive to monotone transformations of the inputs, because the split points are based on ranking the data points. ID3 scales well to large data sets and can be modified to handle missing features. However, it does not predict very accurately compared to other kinds of models, owing to the greedy nature of the tree-construction algorithm. A single big tree is also not stable, because a single classification error can propagate down to the leaves. Different methods have been proposed to remedy this, such as bagging, boosting, and random forests. Boosted trees can be used for both regression-type and classification-type problems.
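To make the ensemble remedy concrete, here is a minimal sketch, assuming scikit-learn is available (it is not used in the chapter), that contrasts a single decision tree with a bagged ensemble (a random forest) on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# The ensemble typically scores higher and varies less across folds,
# illustrating the stability gain described above.
print("tree:  ", cross_val_score(tree, X, y, cv=5).mean())
print("forest:", cross_val_score(forest, X, y, cv=5).mean())

Averaging many trees trained on resampled data dampens the effect of any single unstable split, which is exactly the instability the paragraph above describes.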
Comparative Evaluation of the Discovered Knowledge
Published in Don Potter, Manton Matthews, Moonis Ali, Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 2020
ID3 is a well-known machine learning method in which the knowledge is represented by a non-binary decision tree. It derives from the work of Quinlan in the late seventies [DK91]. It has had many successful industrial applications. Let O be the set of objects belonging to various classes. The algorithm chooses the most predictive attribute [HN96] and partitions the objects of O into a set of disjoint subsets C1, C2, …, Cn, where Ci contains the objects of O having the ith value of the chosen attribute. Each subset Ci is then partitioned using the same strategy, unless all the objects it contains belong to the same class. In the resulting decision tree, each node represents a tested attribute and each leaf is assigned a class. To classify a new incoming object, we traverse the decision tree: at each node, we look up the object's value for the associated attribute and branch to the corresponding child node. We repeat this test from the root node down to a terminal node. On reaching the terminal node, we assign its associated class to the new object. ID3 is also an incremental method.
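The classification step described above can be sketched in a few lines of Python. The node structure and names here are illustrative, not taken from the paper:

class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at this node
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class label if this is a leaf

def classify(node, obj):
    # Walk from the root, branching on the object's value for each
    # tested attribute, until a terminal node (leaf) is reached.
    while node.label is None:
        node = node.children[obj[node.attribute]]
    return node.label

# Example: a one-level tree testing a single attribute "outlook".
tree = Node("outlook", {"sunny": Node(label="play"),
                        "rainy": Node(label="stay")})
print(classify(tree, {"outlook": "sunny"}))  # play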
Implementing Artificial Intelligence in H-BIM Using the J48 Algorithm to Manage Historic Buildings
Published in International Journal of Architectural Heritage, 2020
David Bienvenido-Huertas, Juan Enrique Nieto-Julián, Juan José Moyano, Juan Manuel Macías-Bernal, José Castro
This classification technique belongs to the Top-Down Induction of Decision Trees (TDIDT) family, in which two algorithms for developing decision trees stand out: the ID3 and C4.5 algorithms. The ID3 algorithm was first developed by Quinlan (1986) and allows decision trees to be developed from a training sample. The C4.5 algorithm, published by Quinlan (1993), is a development of the ID3 algorithm. This newer algorithm introduces some changes, such as the pruning of the models, which yields less complex and therefore easier-to-understand systems. An implementation of the C4.5 algorithm is found in the Waikato Environment for Knowledge Analysis (WEKA) software as the J48 algorithm, which improves on the functionality of C4.5 by including the method of reduced-error pruning. In this study, the J48 algorithm was used.
Evaluation of the effect of learning disabilities and accommodations on the prediction of the stability of academic behaviour of undergraduate engineering students using decision trees
Published in European Journal of Engineering Education, 2020
Gonen Singer, Maya Golan, Neta Rabin, Dvir Kleper
The attribute with the highest ratio of information gain is selected as the splitting attribute (Han, Kamber, and Pei 2012). By using the ratio of the information gain, as opposed to simply the information gain (Equation 5) used by ID3, the C4.5 algorithm overcomes the specific shortcoming of ID3 whereby attributes with large numbers of values are selected in preference to attributes with small numbers of values. Throughout the paper, all log bases are 2.
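A minimal Python sketch of this correction (our names, not from the article) divides the information gain by the split information, which penalises attributes with many distinct values:

import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    # values[i] is the attribute value of example i, labels[i] its class.
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    # Information gain: entropy reduction achieved by splitting on the attribute.
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

print(gain_ratio(["sunny", "sunny", "rain", "rain"],
                 ["yes", "no", "yes", "yes"]))  # ~0.31

An attribute with a unique value per example would maximise raw information gain, but its large split information drives the ratio down, which is the shortcoming of ID3 mentioned above.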
Predicting students’ learning style using learning analytics: a case study of business management students from India
Published in Behaviour & Information Technology, 2018
C4.5 is an algorithm used to generate a decision tree. C4.5 is a software extension of, and thus an improvement over, the basic ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. C4.5 is an evolution and refinement of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.
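One of those refinements, handling continuous attribute value ranges, can be sketched as follows. This is a hedged illustration of the general idea (candidate thresholds between sorted values, scored by information gain), not the article's or Quinlan's exact implementation:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Sort examples by the continuous attribute, then try a threshold at
    # each boundary between distinct values and keep the best split.
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best  # (threshold, information gain)

print(best_threshold([2.0, 3.0, 10.0, 11.0], ["a", "a", "b", "b"]))  # (6.5, 1.0)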