Normalization
The term “normalization” is used informally in statistics, so the phrase “normalized data” can have several meanings. In most cases, normalizing data eliminates the units of measurement, which makes it easier to compare data from different sources. Some of the more common methods include:
- Using z-scores or t-scores to transform data. This is often referred to as standardization, and it is usually what a statistics textbook means when it talks about “standardizing” data.
- Rescaling data to values between 0 and 1. This is commonly referred to as feature scaling. One formula to achieve this is x_new = (x - x_min) / (x_max - x_min).
- Computing standardized residuals in regression analysis: each residual is divided by an estimate of its standard deviation, putting the residuals on a common scale so they can be compared and checked for outliers.
- Normalizing moments, using the formula μ_k/σ^k (the k-th central moment divided by the k-th power of the standard deviation, giving the k-th standardized moment).
- Normalizing vectors (in linear algebra) to a norm of one. In this sense, normalization refers to transforming a vector so that its length is 1, typically by dividing each component by the vector’s Euclidean length.
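A minimal NumPy sketch of the feature-scaling formula above (the function name `min_max_scale` is my own):

```python
import numpy as np

def min_max_scale(x):
    """Rescale values to the [0, 1] range: (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
scaled = min_max_scale(data)
# The smallest value maps to 0, the largest to 1.
```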
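Vector normalization can be sketched the same way, dividing a vector by its Euclidean norm (again, `normalize` is an illustrative name, not a library function):

```python
import numpy as np

def normalize(v):
    """Scale a vector so its Euclidean length (2-norm) is 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

u = normalize(np.array([3.0, 4.0]))
# The direction is unchanged; only the length becomes 1.
```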
In most cases, then, normalization means transforming all values to numbers between 0 and 1, with all variables on the same positive scale. A drawback is that min-max normalization is sensitive to outliers: a single extreme value compresses the remaining data into a narrow band.
Standardization
Normalization usually refers to scaling variables to the range 0 to 1, whereas standardization transforms data to have a mean of zero and a standard deviation of one. The standardized value is known as the z-score, computed by subtracting the mean and dividing by the standard deviation, (x − μ)/σ:
x_new = (x - x_mean) / x_std
Calculating the z-score gives the standardized data.
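A short NumPy sketch of this z-score computation (the helper name `z_score` is my own):

```python
import numpy as np

def z_score(x):
    """Standardize data: subtract the mean, divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

data = np.array([2.0, 4.0, 6.0, 8.0])
z = z_score(data)
# The standardized data has mean 0 and standard deviation 1.
```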
Summary
In most data analysis or statistical analysis scenarios, standardization is preferred, because min-max normalization is sensitive to outliers: extreme values set the bounds of the 0–1 range and compress the rest of the data.