Normalization
The major tasks of preprocessing are:
- Data cleaning
  - filling in missing values
  - smoothing noisy data
  - identifying and removing outliers
  - resolving inconsistencies
- Data integration
  - integrating data from multiple databases, data files, and data cubes
- Data transformation
  - normalization
  - aggregation
- Data reduction
  - obtaining a reduced representation of the data that still yields the same (or nearly the same) results
- Data discretization
  - part of data reduction, but of particular importance, especially for numeric data
Normalization:
The goal of normalization is to make an entire set of values have a particular property, such as a fixed range or a zero mean. Three common ways to perform normalization are:
- min-max normalization
  X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
  X_scaled = X_std * (max - min) + min
  Here min and max are the bounds of the new range.
- z-score normalization
  z = (x - u) / s
  where u is the mean of the training samples and s is their standard deviation.
- normalization by decimal scaling
  v_new = v / pow(10, j)
  Here j is the number of digits in the largest absolute value of the whole attribute.
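To make the three formulas concrete, here is a minimal NumPy sketch that applies each of them by hand to a single attribute. The function names and the sample column are illustrative assumptions, not part of any library, and the decimal-scaling helper assumes the largest absolute value is at least 1.

```python
import numpy as np


def min_max_scale(X, new_min=0.0, new_max=1.0):
    """Rescale each column of X linearly into [new_min, new_max]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    # Assumption: no column is constant, otherwise this range would be 0.
    col_range = X.max(axis=0) - col_min
    X_std = (X - col_min) / col_range
    return X_std * (new_max - new_min) + new_min


def z_score(X):
    """Subtract the column mean and divide by the column standard deviation."""
    X = np.asarray(X, dtype=float)
    u = X.mean(axis=0)
    s = X.std(axis=0)  # population standard deviation (ddof=0)
    return (X - u) / s


def decimal_scale(X):
    """Divide each column by 10**j, j = digit count of its largest absolute value."""
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max(axis=0)
    # Count the digits of the integer part, e.g. 50 -> 2 and 100 -> 3.
    j = np.array([len(str(int(m))) for m in max_abs])
    return X / np.power(10.0, j)


# Hypothetical single-attribute sample, chosen only for illustration.
X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

print(min_max_scale(X).ravel())
print(z_score(X).ravel())
print(decimal_scale(X).ravel())
```

After min-max scaling the smallest value maps to 0 and the largest to 1. After z-scoring the column has mean 0 and standard deviation 1. After decimal scaling every value has magnitude below 1.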
I will demonstrate all three in a single program, as shown below:
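A minimal sketch of such a program, assuming scikit-learn's MinMaxScaler and StandardScaler for the first two methods and plain NumPy for decimal scaling. The variable names are illustrative, and the inverse_transform call is an assumption inferred from the numbers in the output rather than a confirmed part of the original listing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = [[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]
print(data)

# Min-max normalization into the default range (0, 1).
minmax = MinMaxScaler()
minmax.fit(data)
print(minmax.data_max_, minmax.data_min_)
minmax_scaled = minmax.transform(data)
print("transformed :", minmax_scaled)

# Z-score normalization: subtract the column mean, divide by the column std.
X = np.array(data, dtype=float)
print(X)
standard = StandardScaler()
standard.fit(X)
print("mean :", standard.mean_)
print("var_ :", standard.var_)
z_scaled = standard.transform(X)
print(z_scaled)

# Assumption: the third array in the output matches inverse_transform applied
# to the min-max-scaled data, i.e. x * sqrt(var_) + mean_.
# Applying it to z_scaled instead would recover X.
rescaled = standard.inverse_transform(minmax_scaled)
print(rescaled)

# Decimal scaling: j is the digit count of the largest absolute value.
j = len(str(int(np.abs(X).max())))
decimal_scaled = X / 10 ** j
print("decimal scaled :", decimal_scaled)
```

The exact printed form of the scaler objects themselves depends on the scikit-learn version, so the sketch does not print them.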
output:
[[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]
MinMaxScaler(copy=True, feature_range=(0, 1))
[20. 10.] [0. 2.]
tranformed : [[1. 0. ]
[0.4 0.125]
[0. 1. ]
[0.05 0.625]
[0.25 0.625]]
[[20. 2.]
[ 8. 3.]
[ 0. 10.]
[ 1. 7.]
[ 5. 7.]]
StandardScaler(copy=True, with_mean=True, with_std=True)
mean : [6.8 5.8]
var_ : [51.76 8.56]
[[ 1.83474958 -1.29881326]
[ 0.16679542 -0.9570203 ]
[-0.94517403 1.43553045]
[-0.80617785 0.41015156]
[-0.25019312 0.41015156]]
[[13.9944423 5.8 ]
[ 9.67777692 6.16571847]
[ 6.8 8.72574777]
[ 7.15972211 7.62859235]
[ 8.59861057 7.62859235]]
Decimal Scaling
decimal scaled : [[0.2 0.02]
[0.08 0.03]
[0. 0.1 ]
[0.01 0.07]
[0.05 0.07]]
Interested readers can explore scikit-learn's comparison of different scalers:
https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py
Feel free to comment with any corrections or questions.