Settlement prediction of an urban shield tunnel using artificial neural networks
Published in Daniele Peila, Giulia Viggiani, Tarcisio Celestino, Tunnels and Underground Cities: Engineering and Innovation meet Archaeology, Architecture and Art, 2019
Feature scaling is used to standardize the range of all data features and is generally performed during data preprocessing. In this case, the raw data comprises attributes with widely varying scales. For example, the air cabin pressure P ranges from 2 bar to 6 bar, while the total thrust of the shield F ranges from 20,000 kN to 140,000 kN. Because the total thrust of the shield has a much wider range of values, the settlement may be governed by this feature. Therefore, the ranges of all features should be normalized so that each feature contributes roughly proportionately to the final settlement. A further benefit is that gradient descent converges much faster with feature scaling than without it.
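The rescaling described above can be sketched with plain min-max scaling. The example values for P and F are illustrative placeholders within the ranges the excerpt quotes, not data from the paper:

```python
# Min-max scaling sketch: rescale each feature to [0, 1] so that a
# wide-range feature (thrust F, in kN) cannot dominate a narrow-range
# one (cabin pressure P, in bar). Values below are illustrative only.

def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

P = [2.0, 3.5, 4.0, 6.0]                       # air cabin pressure, bar
F = [20000.0, 60000.0, 95000.0, 140000.0]      # total thrust, kN

P_scaled = min_max_scale(P)
F_scaled = min_max_scale(F)

# After scaling, both features span [0, 1], so each contributes
# comparably to the learned mapping and to gradient-descent updates.
print(P_scaled)  # [0.0, 0.375, 0.5, 1.0]
print(F_scaled[0], F_scaled[-1])  # 0.0 1.0
```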
Predictive Analysis of Type 2 Diabetes Using Hybrid ML Model and IoT
Published in Sudhir Kumar Sharma, Bharat Bhushan, Narayan C. Debnath, IoT Security Paradigms and Applications, 2020
Abhishek Sharma, Nikhil Sharma, Ila Kaushik, Santosh Kumar, Naghma Khatoon
Feature Scaling: Feature scaling is the process of bringing the data in each feature down to a common scale so that the model can be trained on it more easily; it significantly boosts model performance. Our dataset has some attributes with values ranging from 0 to 100 and others with values from 100 to 1000, so we need to normalize the data to bring it onto one scale. Data normalization is a process in which features are rescaled to the range 0–1. We used simple feature scaling to normalize the dataset.
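Assuming "simple feature scaling" here means the common form that divides each value by the feature's maximum (mapping non-negative data into 0–1), it can be sketched as follows; the column names and values are illustrative, not from the study's dataset:

```python
# Simple feature scaling sketch: divide every value in a column by that
# column's maximum, so non-negative data lands in [0, 1].
# Feature names and values are hypothetical examples.

def simple_scale(column):
    m = max(column)
    return [v / m for v in column]

glucose = [85, 140, 100, 180]     # attribute roughly in the 0-200 range
insulin = [100, 400, 250, 1000]   # attribute roughly in the 100-1000 range

glucose_s = simple_scale(glucose)
insulin_s = simple_scale(insulin)

# Both columns now top out at 1.0, so they share one scale.
print(insulin_s)  # [0.1, 0.4, 0.25, 1.0]
```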
Symbiotic Organisms Search Optimization based Faster RCNN for Secure Data Storage in Cloud
Published in IETE Journal of Research, 2023
J. Thresa Jeniffer, A. Chandrasekar, S. Jothi
The IoTID20 dataset features are selected to retain those most highly correlated with attacks. Because the data are highly imbalanced, the dataset is oversampled to obtain an efficient result. The CSE-CIC-IDS2018 dataset contains 80 features, of which 79 are utilized after elimination of the timestamp feature. In the dataset, most data classes are constant. The protocol feature is mapped to 3 instances: UDP, hop-by-hop IPv6, and TCP. Feature encoding generated 15 features for the multiclass labels and 80 features for the input data. The feature scaling mechanism converts the value range of every feature to a predefined range; scaling is needed for characteristics that take larger values. Common approaches to feature scaling in ML include scaling, normalization, and standardization. In the initial testing, z-score normalization is performed together with min–max scaling, which maps values to the ranges [−1, 1] and [0, 1]. In min–max scaling, each variable value is rescaled between the lowest and the largest value of its feature column, for every feature across the testing or training data.
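The two scalings named above can be sketched directly; the feature name and values are illustrative placeholders, not taken from IoTID20 or CSE-CIC-IDS2018:

```python
# Sketch of z-score standardization and min-max scaling to either
# [0, 1] or [-1, 1]. The packet-length column is a hypothetical example.
from statistics import mean, pstdev

def z_score(column):
    """Standardize: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

def min_max(column, lo=0.0, hi=1.0):
    """Rescale linearly into the target interval [lo, hi]."""
    cmin, cmax = min(column), max(column)
    return [lo + (v - cmin) * (hi - lo) / (cmax - cmin) for v in column]

pkt_len = [60, 1500, 576, 120]           # illustrative feature column

print(min_max(pkt_len))                  # values in [0, 1]
print(min_max(pkt_len, -1.0, 1.0))       # values in [-1, 1]
print(z_score(pkt_len))                  # zero mean, unit variance
```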
Genetic Folding (GF) Algorithm with Minimal Kernel Operators to Predict Stroke Patients
Published in Applied Artificial Intelligence, 2022
In the proposed methodology, the stroke prediction problem is solved using a minimal Genetic Folding algorithm. For classification, the dataset is obtained from the Kaggle data repository and contains 5110 observations with 12 attributes. Several preprocessing steps are carried out to clean the data. After data cleaning, encoding techniques including the OneHot Encoder and Label Encoder are applied. Afterward, the feature scaling technique StandardScaler is applied, which normalizes a feature by removing its mean and scaling it to unit variance. Because the dataset is imbalanced, the data resampling technique SMOTE is applied to balance it. The resampled dataset is then split into training and testing sets for model training and evaluation. The flow of the proposed methodology is shown in Figure 3.
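The encoding and scaling steps in that pipeline can be sketched with minimal stand-ins; the `smoking` and `age` columns are hypothetical examples, and `standardize` mirrors what StandardScaler does (this sketch is not the study's actual code):

```python
# Hedged sketch of two preprocessing steps described above:
# 1) label-encode a categorical column, 2) standardize a numeric column
# to zero mean and unit variance (StandardScaler's behavior).
from statistics import mean, pstdev

def label_encode(column):
    """Map each category to an integer (alphabetical order here)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column]

def standardize(column):
    """Remove the mean and scale to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

smoking = ["never", "smokes", "never", "formerly"]   # illustrative column
age = [23, 45, 67, 34]                               # illustrative column

smoking_enc = label_encode(smoking)   # formerly=0, never=1, smokes=2
age_std = standardize(age)            # mean 0, unit variance
```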
Machine learning algorithms for the prediction of the strength of steel rods: an example of data-driven manufacturing in steelmaking
Published in International Journal of Computer Integrated Manufacturing, 2020
Estela Ruiz, Diego Ferreño, Miguel Cuartas, Ana López, Valentín Arroyo, Federico Gutiérrez-Solana
Standardization/feature scaling of a dataset is a mandatory requirement for some ML estimators and a good recommendation for others. Some regressors or classifiers, such as KNN, calculate the distance between instances in the feature hyperspace; the distance between two instances will therefore be governed by the features with the broadest range of values. For this reason, the range of all features must be normalized so that each one contributes approximately proportionately to the final distance. For other algorithms, such as MLP, scaling is not compulsory but is recommended because gradient descent converges much faster with feature scaling. In this study, features were scaled through the StandardScaler provided by Scikit-Learn, which standardizes the features by removing the mean and scaling to unit variance. To ensure that no information leaked from the test split into the training dataset, the fit_transform method of the imputer was applied to the training dataset and then the transform method to the test dataset.
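The leak-free pattern described in the last sentence can be sketched without scikit-learn: compute the scaling statistics on the training split only, then reuse them unchanged on the test split (mirroring fit_transform on train followed by transform on test). The data values are illustrative:

```python
# Leak-free scaling sketch: "fit" learns the mean and standard deviation
# from the training split alone; "transform" applies those fixed
# statistics to any split, so the test data never influences them.
from statistics import mean, pstdev

def fit(column):
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    return [(v - mu) / sigma for v in column]

train = [10.0, 20.0, 30.0, 40.0]   # illustrative training values
test = [25.0, 35.0]                # illustrative test values

mu, sigma = fit(train)                  # statistics from train only
train_std = transform(train, mu, sigma)
test_std = transform(test, mu, sigma)   # test reuses train's mu, sigma
```

Fitting a second scaler on the test split instead would leak its distribution into the evaluation, which is exactly what the authors avoid.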