A Machine Learning Approach for Translating Weather Information into Actionable Advisory for Farmers
Published in Om Prakash Jena, Sudhansu Shekhar Patra, Mrutyunjaya Panda, Zdzislaw Polkowski, S. Balamurugan, Industrial Transformation, 2022
Santosh Kumar Behera, Darshan Vishwasrao Medhane
Numerical features require some mathematical processing before we can use them. Inputs may be expressed in different units, meaning the variables have different scales, which can make the problem being modeled harder. It is therefore better to normalize them to a 0–1 scale. Normalization is a rescaling of the data from the original range so that all values fall within 0–1. Numerical data can be normalized using the following formula:

y = (x − min(col)) / (max(col) − min(col))
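As a concrete illustration of the formula above, here is a minimal sketch in Python/NumPy; the rainfall feature and its values are invented for the example.

```python
import numpy as np

def min_max_normalize(col: np.ndarray) -> np.ndarray:
    """Rescale a column to [0, 1]: y = (x - min(col)) / (max(col) - min(col))."""
    lo, hi = col.min(), col.max()
    if hi == lo:                     # constant column: avoid division by zero
        return np.zeros_like(col)
    return (col - lo) / (hi - lo)

# Hypothetical weather feature (rainfall in mm); values invented for the example.
rainfall = np.array([12.0, 48.5, 0.0, 96.2, 30.1])
print(min_max_normalize(rainfall))   # every value now lies in [0, 1]
```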
HVAC systems performance prediction
Published in Jan L.M. Hensen, Roberto Lamberts, Building Performance Simulation for Design and Operation, 2019
The scatter in the data about the curves is a result of the digitizing processes used to extract numerical values for flow rate and pressure rise from the fan performance curves. The scatter contributes to the uncertainty in the performance prediction. Uncertainty (scatter) can also occur due to a lack of geometric similarity among the different component diameters (Wright 1991) or because the working fluid is compressible (although this effect is limited for the range of fluid pressure rises occurring in most HVAC systems). Uncertainty in the model prediction can also arise from poor or unreliable performance measurement by the component manufacturer. Normalization of the data has the benefit of making such errors and uncertainty apparent, and it tends to average out their effect.
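The chapter excerpt does not spell out the normalization itself, but fan performance data are commonly made dimensionless via the similarity-law flow and pressure coefficients, which is consistent with the reference to geometric similarity above. The sketch below assumes that convention; all numeric values are illustrative.

```python
import numpy as np

# Fan similarity laws (assumed convention, illustrative values throughout):
#   flow coefficient      phi = Q  / (N * D**3)
#   pressure coefficient  psi = dp / (rho * N**2 * D**2)
# Digitized points from geometrically similar fans should collapse onto a
# single phi-psi curve; residual scatter exposes digitizing error and
# departures from similarity.

rho = 1.2                                # air density, kg/m^3
N, D = 24.0, 0.5                         # rotational speed (rev/s), impeller diameter (m)
Q = np.array([1.0, 1.5, 2.0])            # digitized volume flow rates, m^3/s
dp = np.array([450.0, 400.0, 300.0])     # digitized pressure rises, Pa

phi = Q / (N * D**3)
psi = dp / (rho * N**2 * D**2)
print(np.column_stack([phi, psi]))       # one dimensionless point per digitized sample
```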
The Knowledge Discovery Process
Published in Richard J. Roiger, Data Mining, 2017
A common data transformation involves changing numeric values so they fall within a specified range. Classifiers such as neural networks do better with numerical data scaled to a range between 0 and 1. Normalization is particularly appealing for distance-based classifiers because, by normalizing attribute values, attributes with a wide range of values are less likely to outweigh attributes with smaller initial ranges. Four common normalization methods include the following:

Decimal scaling. Decimal scaling divides each numerical value by the same power of 10. For example, if we know the values for an attribute range between −1000 and 1000, we can change the range to −1 and 1 by dividing each value by 1000.

Min–max normalization. Min–max is an appropriate technique when the minimum and maximum values for an attribute are known. The formula is

newValue = (originalValue − oldMin) / (oldMax − oldMin) × (newMax − newMin) + newMin
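A short sketch of the two methods described above, assuming NumPy; the generalized min–max form maps the data onto an arbitrary target range, and the sample values echo the −1000 to 1000 example.

```python
import numpy as np

def decimal_scale(values: np.ndarray) -> np.ndarray:
    """Divide by the smallest power of 10 that brings every value into [-1, 1]."""
    j = int(np.ceil(np.log10(np.abs(values).max())))   # assumes a nonzero maximum
    return values / 10 ** j

def min_max(values: np.ndarray, new_min: float = 0.0, new_max: float = 1.0) -> np.ndarray:
    """Map the observed [min, max] of the data onto [new_min, new_max]."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo) * (new_max - new_min) + new_min

x = np.array([-1000.0, -250.0, 0.0, 400.0, 1000.0])
print(decimal_scale(x))   # [-1.   -0.25  0.    0.4   1.  ]
print(min_max(x))         # [0.    0.375 0.5   0.7   1.  ]
```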
Novel thermal conductivity-mixing ratio-temperature mathematic model for forecasting the thermal conductivity of biodiesel-diesel-ethanol blended fuel
Published in Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 2022
Zhongjin Zhao, Li Fashe, Wang Shuang, Huicong Zhang, Yuzeng Zheng, Xuyao He, Wenchao Wang
Zöldy and Lin (Lin, Gaustad, and Trabold 2013; Zöldy 2001) investigated the optimal mixing ratio of diesel, biodiesel, and ethanol, determining optimal ranges of 60–90% for diesel, 5–30% for biodiesel, and 5–10% for ethanol. Accordingly, 12 mixing ratios at 5% intervals were obtained for the ternary blended fuel, as shown in Table 2 and Figure 2. The values of the mixing ratios were encoded by a normalization method: one form of normalization is to convert the number to a decimal between 0 and 1, which facilitates data processing. The mixing ratio of D90B05E05, for example, was encoded as 0.900505. All percentages are defined on a volume basis. The soybean biodiesel, Jatropha biodiesel, and catering waste oil biodiesel blends with diesel and ethanol were named SBED, JBED, and CBED, respectively.
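A small sketch of how such a label could be encoded as a decimal. The helper function and the D##B##E## label pattern are assumptions inferred from the single example D90B05E05 in the text.

```python
import re

def blend_ratio_code(label: str) -> float:
    """Encode a blend label such as 'D90B05E05' as the decimal 0.900505.

    Hypothetical helper: the D##B##E## pattern is inferred from the example
    in the text (diesel / biodiesel / ethanol volume percentages).
    """
    m = re.fullmatch(r"D(\d{2})B(\d{2})E(\d{2})", label)
    if m is None:
        raise ValueError(f"unrecognized blend label: {label}")
    d, b, e = m.groups()
    assert int(d) + int(b) + int(e) == 100, "volume fractions must sum to 100%"
    return float(f"0.{d}{b}{e}")

print(blend_ratio_code("D90B05E05"))  # 0.900505
```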
Machine learning approach to the safety assessment of a prestressed concrete railway bridge
Published in Structure and Infrastructure Engineering, 2022
Giulia Marasco, Federico Oldani, Bernardino Chiaia, Giulio Ventura, Fabrizio Dominici, Claudio Rossi, Franco Iacobini, Andrea Vecchi
It is worth pointing out the method used in the pre-processing phase, consisting of a z-score normalization with respect to a moving time window. Among the most common data transformation normalization approaches are min–max, z-score, and decimal scaling. Of these, the z-score method overcomes the limitation of having to know the minimum and maximum values of the series, but it can only be applied to stationary series. As in other fields, e.g. finance and economics, most time series associated with structural quantities are non-stationary (Tsay, 2005). This behavior is accentuated for materials such as concrete, which develop phenomena deferred in time (e.g. creep). To handle non-stationary problems effectively, a sliding-window technique has been adopted. In this specific case, the values v of the input and output time series have been normalized with the following formula, where µ and σ are, respectively, the mean and standard deviation computed on the data belonging to the previous temporal window:

v′ = (v − µ) / σ (1)

After the normalized value v′ has been passed through the machine learning algorithm, the data are rescaled to their original range by a post-processing operation consisting of an inverse z-score normalization:

v = v′ · σ + µ (2)
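A minimal sketch of Equations (1) and (2) with a moving window, assuming the window statistics come from the immediately preceding samples as described; the function names are illustrative.

```python
import numpy as np

def windowed_zscore(v: np.ndarray, window: int):
    """Normalize each v[i] with the mean/std of the preceding `window` samples (Eq. 1).

    The first `window` points have no preceding window and are left as NaN.
    """
    v_norm = np.full(len(v), np.nan)
    stats = []                                  # (mu, sigma) per point, kept for Eq. (2)
    for i in range(window, len(v)):
        past = v[i - window:i]
        mu, sigma = past.mean(), past.std()     # sigma assumed nonzero within the window
        v_norm[i] = (v[i] - mu) / sigma
        stats.append((mu, sigma))
    return v_norm, stats

def inverse_zscore(v_norm: float, mu: float, sigma: float) -> float:
    """Post-processing rescale to the original range (Eq. 2): v = v' * sigma + mu."""
    return v_norm * sigma + mu
```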
Long short-term memory network-based emission models for conventional and new energy buses
Published in International Journal of Sustainable Transportation, 2021
Zhuoqun Sun, Chao Wang, Zhirui Ye, Hui Bi
Data normalization is used to adjust measurements on different scales to a notionally common scale, which removes differences in magnitude and improves the comparability of the data. The main purpose of data normalization here is to accelerate the convergence of the LSTM network; the estimation accuracy may also be enhanced after normalization. The normalization method used in this work was z-score normalization, based on the mean and standard deviation of the raw data. The normalized value represents the distance between the original datum and the mean, measured in units of the standard deviation. Normalized values fluctuate around 0: original data below the average become negative after z-score normalization, and data above the average become positive. The normalized value eᵢ′ of the ith datum eᵢ in the raw data set E can be calculated as

eᵢ′ = (eᵢ − ē) / s

where ē and s are the mean and standard deviation of the original data, respectively.
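A minimal sketch of this z-score transform and its inverse; the emission values are invented for the example.

```python
import numpy as np

E = np.array([12.0, 30.1, 48.5, 96.2, 0.0])  # illustrative raw emission series
e_bar, s = E.mean(), E.std()

E_norm = (E - e_bar) / s   # distance from the mean in standard-deviation units
print(E_norm)              # values below the mean are negative, above it positive
print(E_norm * s + e_bar)  # the inverse transform recovers the raw series
```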