Normalization

- April 10, 2019

Major tasks of preprocessing are:

Data cleaning

filling missing values
smoothing of noisy data
identifying and removing outliers
resolving inconsistencies

Data Integration

integrating data from multiple databases, data files, or data cubes

Data transformation

normalization
aggregation

Data reduction

obtaining a reduced representation of the data that still yields the same (or nearly the same) analytical results

Data discretization

part of data reduction, but of particular importance, especially for numeric data

Normalization:
The goal of normalization is to give an entire set of values a particular property, such as lying within a fixed range or having zero mean and unit variance. There are three common ways to perform normalization:

min-max normalization

 X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  
 X_scaled = X_std * (max - min) + min

Here min and max are the lower and upper bounds of the new (target) range, for example 0 and 1.
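
As a rough illustration of the two formulas above, here is a minimal NumPy sketch. The function name `min_max_scale` and the handling of constant columns are assumptions made for this example, not part of scikit-learn.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Linearly rescale each column of X into feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    # Assumption: a constant column has zero span; divide by 1 instead
    # so that it maps to the lower bound rather than producing NaN.
    span[span == 0] = 1.0
    X_std = (X - col_min) / span        # values now lie in [0, 1]
    return X_std * (hi - lo) + lo       # stretch and shift to [lo, hi]

data = [[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]
scaled = min_max_scale(data)
print(scaled)
```

Passing a different `feature_range`, for example `(-1, 1)`, only changes the final stretch-and-shift step.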

z-score normalization
```
 z = (x - u) / s  
```
where u is the mean of the training samples and s is their standard deviation.

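
The z-score formula can be sketched in a few lines of NumPy. The helper name `z_score` is hypothetical. The sketch uses the population standard deviation (`ddof=0`), which is the convention scikit-learn's `StandardScaler` follows, and it assumes no column is constant.

```python
import numpy as np

def z_score(X):
    """Standardize each column of X to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    u = X.mean(axis=0)   # per-column mean
    s = X.std(axis=0)    # per-column population standard deviation (ddof=0)
    return (X - u) / s   # assumes s is non-zero for every column

data = [[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]
z = z_score(data)
print(z)
print(z.mean(axis=0), z.std(axis=0))
```

After the transformation each column should have a mean close to 0 and a standard deviation close to 1, up to floating-point error.
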
normalization by decimal scaling
```
 v_new = v/pow(10,j)  
```
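Here j is the smallest integer such that the largest absolute value of the attribute becomes less than 1 after scaling. For positive integers this is simply the number of digits in the largest value, e.g. j = 2 when the largest value is 20.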
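
A small sketch of decimal scaling follows. The helper names are hypothetical, and the choice to compute j separately for each column is an assumption; a single j for the whole table would give the same result on this data.

```python
import numpy as np

def num_digits(value):
    """Smallest j >= 0 such that value / 10**j < 1."""
    j = 0
    while value >= 10 ** j:
        j += 1
    return j

def decimal_scale(X):
    """Divide each column by the power of 10 that brings it below 1."""
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max(axis=0)               # largest magnitude per column
    j = np.array([num_digits(m) for m in max_abs])
    return X / 10.0 ** j

data = [[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]
dec = decimal_scale(data)
print(dec)
```

Using the absolute value means negative entries are handled too: every scaled value ends up strictly between -1 and 1.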

All three methods are demonstrated together in a single program, which prints the following output:

 [[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]  
 MinMaxScaler(copy=True, feature_range=(0, 1))  
 [20. 10.] [0. 2.]  
 tranformed : [[1.  0.  ]  
  [0.4  0.125]  
  [0.  1.  ]  
  [0.05 0.625]  
  [0.25 0.625]]  
 [[20. 2.]  
  [ 8. 3.]  
  [ 0. 10.]  
  [ 1. 7.]  
  [ 5. 7.]]  
 StandardScaler(copy=True, with_mean=True, with_std=True)  
 mean : [6.8 5.8]  
 var_ : [51.76 8.56]  
 [[ 1.83474958 -1.29881326]  
  [ 0.16679542 -0.9570203 ]  
  [-0.94517403 1.43553045]  
  [-0.80617785 0.41015156]  
  [-0.25019312 0.41015156]]  
 [[13.9944423  5.8    ]  
  [ 9.67777692 6.16571847]  
  [ 6.8     8.72574777]  
  [ 7.15972211 7.62859235]  
  [ 8.59861057 7.62859235]]  
 Decimal Scaling  
 decimal scaled : [[0.2 0.02]  
  [0.08 0.03]  
  [0.  0.1 ]  
  [0.01 0.07]  
  [0.05 0.07]]
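
A program along the following lines would produce output of this shape. It is a sketch, not the original listing: the variable names, print labels, and the use of one global j for decimal scaling are assumptions. The numbers suggest that the array printed just before "Decimal Scaling" is `StandardScaler.inverse_transform` applied to the min-max output, so the sketch does the same. Recent scikit-learn releases print the scalers as `MinMaxScaler()` and `StandardScaler()` rather than listing every default parameter.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = [[20, 2], [8, 3], [0, 10], [1, 7], [5, 7]]
print(data)

# Min-max normalization into the range [0, 1]
min_max = MinMaxScaler(feature_range=(0, 1))
print(min_max.fit(data))
print(min_max.data_max_, min_max.data_min_)
min_max_scaled = min_max.transform(data)
print("transformed :", min_max_scaled)
print(min_max.inverse_transform(min_max_scaled))  # recovers the original data

# z-score normalization
standard = StandardScaler()
print(standard.fit(data))
print("mean :", standard.mean_)
print("var_ :", standard.var_)
z_scaled = standard.transform(data)
print(z_scaled)
# Assumption: this call was given the min-max output, which is what
# the printed numbers are consistent with.
mixed = standard.inverse_transform(min_max_scaled)
print(mixed)

# Decimal scaling with a single j for the whole table (assumption)
print("Decimal Scaling")
arr = np.asarray(data, dtype=float)
j = len(str(int(np.abs(arr).max())))  # digits in the largest magnitude
decimal_scaled = arr / pow(10, j)
print("decimal scaled :", decimal_scaled)
```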

Interested readers can explore scikit-learn's comparison of its scalers:
https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py

Feel free to comment with corrections or questions.
