Machine learning solutions for development of performance deterioration models of flexible airfield pavements
Published in Inge Hoff, Helge Mork, Rabbira Garba Saba, Eleventh International Conference on the Bearing Capacity of Roads, Railways and Airfields, Volume 3, 2022
A.Z. Ashtiani, S. Murrell, R. Speir, D.R. Brill
The goal of an ML model is to maximize learning accuracy while maintaining low variance and avoiding overfitting to noise. Building ML models for data-driven applications is an iterative process with three components: feature engineering, algorithm selection, and parameter tuning (Kumar et al. 2016). Feature engineering is the process of converting raw data into sets of feature vectors (input variables) that best predict model performance. Reducing the variables to a subset of useful features is desirable for effective ML prediction. The problem with high-dimensional feature spaces is that more dimensions make it harder to gauge the influence of each feature on the prediction. In addition, models with many features relative to the number of data samples tend to be prone to overfitting. The best candidate ML algorithm depends on the size and type of the data (e.g. time-dependent) and its sparsity (infrequent, missing, or irregular data). Parameter tuning is the process of determining the values of the hyper-parameters of the ML algorithms. Each ML model has unique hyper-parameter configurations (e.g. loss functions, search strategies) that affect performance. Hyper-parameters are fine-tuned iteratively to balance the trade-off between the bias (performance accuracy) and variance of the ML model.
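The iterative tuning step described above is commonly automated with cross-validated grid search. Below is a minimal sketch using scikit-learn; the random forest, the synthetic dataset, and the parameter grid are illustrative assumptions, not the models used in the cited study.

```python
# Minimal sketch of iterative hyper-parameter tuning with cross-validation.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyper-parameter grid: deeper trees lower bias but raise variance,
# so cross-validation picks the configuration that balances the two.
param_grid = {"max_depth": [3, 5, 10, None], "n_estimators": [50, 200]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("held-out R^2:", search.best_estimator_.score(X_test, y_test))
```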
Feature extraction and qualification
Published in Ruijiang Li, Lei Xing, Sandy Napel, Daniel L. Rubin, Radiomics and Radiogenomics, 2019
Feature selection and feature extraction both belong to the feature engineering regime, aiming to reduce dimensionality, extract the most useful information, and ultimately achieve the best prediction result. Compared with feature selection, feature extraction has several advantages: (1) the unsupervised character of most feature extraction methods makes them less prone to overfitting; (2) it can be applied to data that are unlabeled or have few labeled samples; (3) it is more flexible in representing the data structure and thus has the potential to be more efficient in prediction tasks; and (4) redundancy and feature-interaction issues are handled automatically. However, it is less interpretable, since each new feature is a function of the original variables, and the transformation is usually not reversible.
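The contrast can be made concrete with a short sketch, assuming scikit-learn; the dataset and the choice of five components/features are illustrative only.

```python
# Feature extraction (unsupervised PCA) vs. feature selection (supervised).
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Extraction: PCA builds 5 new features as linear combinations of all 30
# originals. No labels are needed, but the components are harder to
# interpret and the mapping is not exactly reversible.
X_extracted = PCA(n_components=5).fit_transform(X)

# Selection: keeps 5 of the original, interpretable features,
# but requires labels to score them.
X_selected = SelectKBest(f_classif, k=5).fit_transform(X, y)

print(X_extracted.shape, X_selected.shape)  # (569, 5) (569, 5)
```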
Feature Engineering for Data Streams
Published in Guozhu Dong, Huan Liu, Feature Engineering for Machine Learning and Data Analytics, 2018
Yao Ma, Jiliang Tang, Charu Aggarwal
It is well known that the performance of machine learning algorithms strongly depends on the feature representation of the input data [4]. A good set of features provides tremendous flexibility, allowing us to choose fast and simple models. However, the raw representation of data is usually not amenable to learning [13]. Feature engineering is the process of generating new features from the existing raw features by discovering hidden patterns in the data [65]. It aims to enrich the current feature set and consequently increase the predictive power of the learning algorithms. Feature engineering therefore plays an important role in the success of machine learning in practice: “At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used [13].”
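As a small illustration of generating new features from raw ones, consider the pandas sketch below; the toy transaction table and derived columns are hypothetical, not taken from the chapter.

```python
# Deriving new features that expose patterns hidden in the raw columns.
import pandas as pd

raw = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-05 09:12", "2018-01-06 23:47"]),
    "amount": [120.0, 15.5],
    "n_items": [4, 1],
})

features = raw.assign(
    hour=raw["timestamp"].dt.hour,                   # temporal pattern
    is_weekend=raw["timestamp"].dt.dayofweek >= 5,   # categorical pattern
    price_per_item=raw["amount"] / raw["n_items"],   # interaction feature
)
print(features)
```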
Flood susceptibility mapping using meta-heuristic algorithms
Published in Geomatics, Natural Hazards and Risk, 2022
Alireza Arabameri, Amir Seyed Danesh, M. Santosh, Artemi Cerda, Subodh Chandra Pal, Omid Ghorbanzadeh, Paramita Roy, Indrajit Chowdhuri
A flood susceptibility map is a tool for risk reduction measures (Tehrany et al. 2014; Youssef et al. 2016; Dano et al. 2019; Mind'je et al. 2019; Roy, Chandra Pal, Chakrabortty et al. 2020). Given the nonlinear relationship between multiple variables and hazard level, flood susceptibility assessment is a quantitative evaluation of the severity of rainstorms and their adverse impacts, from the root cause of the flood to the final output as an environmental hazard. Methods for flood susceptibility mapping fall into three categories: hydrological and hydrodynamic models, multi-criteria decision analysis, and machine learning models. In machine learning methods, feature engineering is an essential step that applies domain knowledge to construct features from the data and thereby obtain more accurate results. This critical technique converts raw data into inputs specific to the predictive models. Thus, the information from the study area is first converted from raster data to a common spatial resolution. Susceptibility prediction is treated as a binary classification that differentiates grid cells responsible for flood disaster from those that are not, with each grid cell carrying a feature value for each conditioning factor.
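A minimal sketch of this grid-cell formulation follows, assuming scikit-learn; the synthetic conditioning-factor rasters and toy flood labels stand in for the real elevation, rainfall, and slope layers used in such studies.

```python
# Flood susceptibility as binary classification over raster grid cells.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
rows, cols = 50, 50

# Each conditioning factor is a raster resampled to the same resolution.
elevation = rng.random((rows, cols))
rainfall = rng.random((rows, cols))
slope = rng.random((rows, cols))

# Stack the rasters so each grid cell becomes one feature vector.
X = np.stack([elevation, rainfall, slope], axis=-1).reshape(-1, 3)
y = ((rainfall > 0.6) & (elevation < 0.4)).ravel().astype(int)  # toy labels

clf = RandomForestClassifier(random_state=0).fit(X, y)
susceptibility = clf.predict_proba(X)[:, 1].reshape(rows, cols)  # the map
print(susceptibility.shape)  # (50, 50)
```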
A machine learning-based analytical framework for employee turnover prediction
Published in Journal of Management Analytics, 2021
Feature engineering refers to the process of extracting features from raw data and transforming them into forms suitable for an ML algorithm (Zheng & Casari, 2018). Basic feature engineering methods include binning, scaling, and log/power transforms (Zheng & Casari, 2018). In addition, feature encoding, enhancement, and selection are at the heart of feature engineering (G. Dong et al., 2018). The goal of feature encoding is to convert non-numeric features (e.g., categorical and time) into numeric forms (Zheng & Casari, 2018). Feature enhancement, on the other hand, aims to create new features by strategically combining the original features (Yang et al., 2019). Feature selection aims to determine a subset of features used for training, leading to shorter training time and potentially better performance (Chandrashekar & Sahin, 2014; Das et al., 2021). Existing studies of ET prediction rarely consider more than two feature engineering operations. The most widely adopted is encoding, since most datasets contain non-numeric features. Several studies employ feature selection methods (Alaskar et al., 2019; Alduayj & Rajpoot, 2018; Ali, 2021; Cai et al., 2020; Gao et al., 2019; Zhao et al., 2018), while feature enhancement is the least adopted; the only example is Cai et al. (2020), in which a graph model is utilized to generate new feature embeddings. The proposed framework considers all three operations, a combination not previously seen in the literature. The technical details are provided in Section 3.3.
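The three operations can be sketched on a toy turnover table, assuming pandas and scikit-learn; the column names, derived ratio, and k=3 are hypothetical choices, not taken from the cited framework.

```python
# Encoding, enhancement, and selection on a toy employee-turnover table.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

df = pd.DataFrame({
    "department": ["sales", "it", "sales", "hr"],
    "salary": [40_000, 65_000, 42_000, 38_000],
    "overtime_hours": [10, 0, 25, 5],
    "left": [1, 0, 1, 0],
})

# Encoding: turn the categorical column into numeric indicator columns.
X = pd.get_dummies(df.drop(columns="left"), columns=["department"])

# Enhancement: derive a new feature by mixing the originals.
X["overtime_per_salary"] = df["overtime_hours"] / df["salary"]

# Selection: keep the k features most associated with turnover.
X_sel = SelectKBest(chi2, k=3).fit_transform(X, df["left"])
print(X_sel.shape)  # (4, 3)
```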
An ensemble machine learning approach for classification tasks using feature generation
Published in Connection Science, 2023
Wenjuan Feng, Jin Gou, Zongwen Fan, Xiang Chen
Feature selection is an important technique in feature engineering whose goal is to find the optimal subset of features: eliminating irrelevant or redundant features reduces the feature count, improves model accuracy, and shortens runtime. In addition, selecting the truly relevant features simplifies the model and aids understanding of the data-generating process. As an important data preprocessing step, feature selection offers two main advantages: (1) it reduces the number of features and hence the dimensionality, which makes the model more generalisable and less prone to overfitting; and (2) it enhances understanding of the relationship between the features and the feature values.
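The overfitting claim in (1) can be demonstrated directly, as in the sketch below; the synthetic data with 10 informative features out of 100, and the choice of k=10, are illustrative assumptions.

```python
# Selecting relevant features shrinks the model and curbs overfitting.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 100 features, only 10 of which are informative; the rest are noise.
X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           random_state=0)

full = LogisticRegression(max_iter=1000)
# Selection runs inside the pipeline, so each CV fold selects on its
# own training split and no information leaks from the test split.
reduced = make_pipeline(SelectKBest(mutual_info_classif, k=10),
                        LogisticRegression(max_iter=1000))

print("all 100 features:", cross_val_score(full, X, y, cv=5).mean())
print("10 selected features:", cross_val_score(reduced, X, y, cv=5).mean())
```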