Design of an Intelligent System for Diabetes Prediction by Integrating Rough Set Theory and Genetic Algorithm
Published in Teena Bagga, Kamal Upreti, Nishant Kumar, Amirul Hasan Ansari, Danish Nadeem, Designing Intelligent Healthcare Systems, Products, and Services Using Disruptive Technologies and Health Informatics, 2023
Shampa Sengupta, Kumud Ranjan Pal, Vivek Garg
There are two broad dimensionality reduction techniques [18–19]: feature selection [11] and feature extraction [20]. Feature selection [11] is the method of selecting an important subset of features from the feature pool without compromising the accuracy the system achieves with the original features. A decision system is most often represented by objects together with their conditional features and a decision feature. A conditional feature is important, or significant, if it contributes substantially to predicting the decision feature accurately. Feature selection plays an important role in enhancing model performance by reducing computational cost in both space and time. Based on the evaluation process, feature selection methods fall into three types: the filter approach [11], the wrapper approach [11], and the embedded method [11]. If feature selection is performed independently of any learning algorithm, it is a filter approach [11]; in the wrapper approach, feature selection is integrated with the learning algorithm, and the accuracy of the induction algorithm is taken as the measure of a subset's suitability.
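To make the filter/wrapper distinction concrete, the following is a minimal sketch, assuming scikit-learn and a synthetic data set (the estimator, feature counts, and data are illustrative, not from the chapter): the filter scores each feature independently of any learner, while the wrapper lets the induction algorithm's cross-validated accuracy judge candidate subsets.

```python
# Minimal sketch of filter vs. wrapper feature selection (illustrative only).
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                       SequentialFeatureSelector)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Filter approach: score each feature independently of any learning algorithm.
filt = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("Filter-selected features:", filt.get_support(indices=True))

# Wrapper approach: the induction algorithm's CV accuracy judges subsets.
clf = LogisticRegression(max_iter=1000)
wrap = SequentialFeatureSelector(clf, n_features_to_select=5,
                                 scoring="accuracy", cv=5).fit(X, y)
print("Wrapper-selected features:", wrap.get_support(indices=True))
```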
Standardisation of wind turbine SCADA data suited for machine learning condition monitoring
Published in C. Guedes Soares, T.A. Santos, Trends in Maritime Technology and Engineering Volume 2, 2022
Feature selection is the process of selecting a subset of the properties contained in the data. Individual properties may be significant or irrelevant to the evaluation of the targeted outputs, on the assumption that the included data has an impact on the model's accuracy. In general, there are two methods: filters and wrappers. Both evaluate the preset criteria independently, before the machine learning method is applied. Some of the most common techniques are statistical measures such as Pearson's correlation and tree-based methods such as XGBoost or LightGBM. To deal with the noisy data from an operational wind turbine, a random forest is implemented to perform the feature selection. The information gain obtained from the random forest is calculated by: G(T, X) = Entropy(T) − Entropy(T, X)
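As an illustration of the tree-based selection described here, the sketch below fits a random forest to synthetic data and ranks features by impurity-based importance; the SCADA-style feature names and the data are placeholder assumptions, and the impurity importance is used as a stand-in for the information gain G(T, X) above.

```python
# Hedged sketch of random-forest feature ranking on placeholder SCADA-style data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["wind_speed", "rotor_speed", "gear_oil_temp", "ambient_temp"]  # hypothetical
X = rng.normal(size=(1000, len(features)))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=1000)  # toy target

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances play the role of the information gain
# G(T, X) = Entropy(T) - Entropy(T, X) described in the text.
for name, imp in sorted(zip(features, forest.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```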
Healthcare Applications Using Biomedical AI System
Published in Saravanan Krishnan, Ramesh Kesavan, B. Surendiran, G. S. Mahalakshmi, Handbook of Artificial Intelligence in Biomedical Engineering, 2021
S. Shyni Carmel Mary, S. Sasikala
Many research works have focused on the development of data mining algorithms to learn the regularities in these rich, mixed medical data. The success of data mining on medical data sets is affected by many factors. Knowledge discovery during training is more difficult if information is irrelevant or redundant, or if the data is noisy and unreliable. Feature selection is an important process for detecting and removing as much of the irrelevant and redundant information as possible. It is often applied as a necessary preprocessing step for analyzing these data, since it reduces the dimensionality of the data sets and often leads to better analysis. Research shows that the reasons for feature selection include improved prediction performance, reduced computational requirements, reduced data storage requirements, reduced cost of future measurements, and improved data or model understanding.
Feature subset selection in structural health monitoring data using an advanced binary slime mould algorithm
Published in Journal of Structural Integrity and Maintenance, 2023
Ramin Ghiasi, Abdollah Malekjafarian
The automatic feature selection approach introduced in this research is then used to select the best subset of features. As specified in the objective function of ABSMA, the selected features should satisfy two conditions: maximum distance between classes and minimum distance within classes. Given the high impact of the transfer function on the performance of ABSMA, this function must be selected first, as discussed in the next subsection. To account for the stochastic behaviour of MOAs, the performance of the algorithms is compared using the best, worst, average, and standard deviation (SD) of the fitness values obtained over 20 independent runs in Table 4. Columns ABSMA-V1, ABSMA-V2, ABSMA-V3, ABSMA-V4, ABSMA-S1, ABSMA-S2, ABSMA-S3, and ABSMA-S4 give the results for the transfer functions V1, V2, V3, V4, S1, S2, S3, and S4, respectively. As stated above, MOAs are stochastic in nature, and each independent run may produce slightly different results. Therefore, to compare their performance, the approach used by other researchers (Varaee & Ghasemi, 2017), considering the best, worst, average, and standard deviation of the fitness values, is employed here. The results of this analysis are given in Table 3, where ABSMA-V2 shows the best performance on most indices (best, average, and worst) in comparison with the other transfer functions. Therefore, V2 is selected as the transfer function in this study. For simplicity, ABSMA-V2 is denoted as ABSMA in the rest of the article.
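The paper's exact ABSMA formulation is not reproduced in this excerpt, but the sketch below illustrates the two ingredients discussed: a V-shaped transfer function (V2 is commonly defined as |tanh(x)| in the binary-metaheuristic literature, which is assumed here) that maps a continuous search-agent position to a binary feature mask, and a Fisher-style fitness that rewards large between-class and small within-class distances as a stand-in for the paper's objective.

```python
# Hedged sketch: V2 transfer function and a Fisher-style fitness for binary
# feature selection; both definitions are common conventions, not the
# paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)

def v2(x):
    """V2 transfer function, commonly |tanh(x)|: per-bit flip probability."""
    return np.abs(np.tanh(x))

def binarize(position, mask):
    """Flip each feature bit with probability v2(position)."""
    flip = rng.random(position.shape) < v2(position)
    return np.where(flip, 1 - mask, mask)

def fitness(X, y, mask):
    """Between-class distance over within-class scatter on selected features."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    mu = Xs.mean(axis=0)
    classes = np.unique(y)
    between = sum(np.linalg.norm(Xs[y == c].mean(axis=0) - mu) for c in classes)
    within = sum(np.linalg.norm(Xs[y == c] - Xs[y == c].mean(axis=0), axis=1).mean()
                 for c in classes)
    return between / (within + 1e-12)

# Toy demonstration on random two-class data.
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)
mask = binarize(rng.normal(size=10), rng.integers(0, 2, size=10))
print(mask, fitness(X, y, mask))
```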
Coal free-swelling index modeling by an ordinal-based soft computing approach
Published in International Journal of Coal Preparation and Utilization, 2023
M. Pirizadeh, M. Manthouri, S. Chehreh Chelgani
One of the most critical data mining operations, especially in machine learning problems with high-dimensional data, is feature selection. The purpose of feature selection is to choose a subset of features that carries the maximum useful information at the minimum possible size, in a way that ultimately increases the predictive ability of models. In general, a feature selection operation is considered effective when it can efficiently eliminate both irrelevant features, which carry little information, and redundant features, which carry duplicate or similar information. Identifying such a subset improves the model training process in several ways, including reduced computational complexity, faster training, a simpler final model, higher generalization capacity, and ultimately better model performance (Pirizadeh et al. 2021). One of the most common families of feature selection methods is correlation-based approaches such as Pearson's, which have also been used in previous FSI modeling studies. Such methods, on the one hand, cannot detect nonlinear relationships between features. On the other hand, because FSI prediction is a classification problem, the output labels must be encoded, which can lead to misleading results in correlation-based methods.
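The limitation of Pearson's correlation noted here is easy to demonstrate: in the toy example below (not from the paper), the target depends perfectly, but quadratically, on the feature, yet the correlation coefficient is essentially zero.

```python
# Toy demonstration: Pearson's r misses a perfect nonlinear dependence.
import numpy as np

x = np.linspace(-1, 1, 201)
y = x ** 2                      # deterministic, but nonlinear, relationship
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")   # ~0: the dependence is invisible to Pearson
```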
Early detection of Parkinson disease using stacking ensemble method
Published in Computer Methods in Biomechanics and Biomedical Engineering, 2023
Saroj Kumar Biswas, Arpita Nath Boruah, Rajib Saha, Ravi Shankar Raj, Manomita Chakraborty, Monali Bordoloi
Feature selection is the process of selecting a subset of relevant features for use in model construction. Using feature importance with tree-based classifiers and a feature correlation matrix with a heatmap, the number of relevant features for PD classification is determined. Then, the Sequential Floating Forward Selection (SFFS) method is employed to identify the relevant features. The SFFS procedure is given below.
Step 1: Let k = 0.
Step 2: If k = desired size, terminate; otherwise, add the most significant feature to the current subset of size k. Let k = k + 1.
Step 3: Conditionally remove the least significant feature from the current subset.
Step 4: If the current subset is the best subset of size (k − 1) found so far, let k = k − 1 and go to Step 3. Otherwise, return the conditionally removed feature and go to Step 2.
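A minimal Python sketch of the SFFS steps above follows, using the cross-validated accuracy of a logistic regression as the significance criterion J; that choice of criterion and classifier is an assumption for illustration, not the paper's configuration.

```python
# SFFS sketch following Steps 1-4 above; the criterion J (CV accuracy of a
# logistic regression) is an illustrative assumption, not the paper's choice.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def J(X, y, subset):
    """Significance criterion: mean 5-fold CV accuracy on the subset."""
    if not subset:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, sorted(subset)], y, cv=5).mean()

def sffs(X, y, desired_size):
    n = X.shape[1]
    subset = set()
    best = {k: -np.inf for k in range(n + 1)}   # best J seen per subset size
    while len(subset) < desired_size:
        # Step 2: add the most significant feature.
        f = max(set(range(n)) - subset, key=lambda f: J(X, y, subset | {f}))
        subset |= {f}
        best[len(subset)] = max(best[len(subset)], J(X, y, subset))
        # Steps 3-4: conditionally remove the least significant feature while
        # doing so beats the best subset of the smaller size found so far.
        while len(subset) > 2:
            g = max(subset, key=lambda g: J(X, y, subset - {g}))
            if J(X, y, subset - {g}) > best[len(subset) - 1]:
                subset -= {g}
                best[len(subset)] = J(X, y, subset)
            else:
                break
    return sorted(subset)

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
print(sffs(X, y, desired_size=4))
```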