How to Handle Noisy Data in Data Preprocessing?

- April 07, 2019

Binning method (one of several methods for smoothing noisy data):

First sort the data and partition it into (equi-depth) bins.
Then smooth the values in each bin by the bin mean, the bin median, the bin boundaries, etc.
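
The two steps above can be sketched in Python as follows. This is only a minimal illustration, not the code from the original post: the function names, the `depth` parameter and the sample price list are assumptions made for the example, and for smoothing by boundaries a value that is equally close to both boundaries is (arbitrarily) sent to the lower one.

```python
from statistics import mean, median

def partition_by_depth(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` items.
    The last bin may be shorter if len(values) is not a multiple of depth."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]

def smooth_by_means(bins):
    # Every value in a bin is replaced by the mean of that bin.
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    # Every value in a bin is replaced by the median of that bin.
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    # Every value is replaced by the closer of the bin's min and max.
    # Assumption: ties go to the lower boundary.
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
            for b in bins]

# Hypothetical sample data (unsorted on purpose).
prices = [21, 4, 28, 8, 34, 9, 25, 15, 26, 21, 29, 24]
bins = partition_by_depth(prices, 4)
by_means = smooth_by_means(bins)
by_medians = smooth_by_medians(bins)
by_boundaries = smooth_by_boundaries(bins)
```

Here `bins` holds three bins of four sorted values each, and the three smoothed results keep the same shape, so they can be compared bin by bin.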

Equal-width (distance) partitioning:

It divides the range into N intervals of equal size (a uniform grid).
If A and B are the lowest and highest values of the attribute, the width of each interval is W = (B - A)/N.
It is the most straightforward approach.
But outliers may dominate the presentation.
Skewed data is not handled well.
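
As a rough sketch of equal-width partitioning, the snippet below computes W = (B - A)/N and assigns each value to its interval. The function name and the sample values are assumptions made for illustration. The second call adds a single outlier (200) to show how it stretches the range so that almost every other value lands in the first bin.

```python
def equal_width_bins(values, n_bins):
    """Split the range [A, B] of `values` into n_bins intervals of width W."""
    a, b = min(values), max(values)
    width = (b - a) / n_bins                      # W = (B - A) / N
    edges = [a + i * width for i in range(n_bins + 1)]
    bins = [[] for _ in range(n_bins)]
    for v in values:
        if width == 0:
            idx = 0                               # all values are identical
        else:
            # The maximum value B is placed in the last interval.
            idx = min(int((v - a) / width), n_bins - 1)
        bins[idx].append(v)
    return width, edges, bins

# Hypothetical sample data.
values = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
width, edges, bins = equal_width_bins(values, 3)

# Same data plus one outlier: the bins become very unbalanced.
_, _, outlier_bins = equal_width_bins(values + [200], 3)
```

Without the outlier the width is 10 and the three bins hold 3, 3 and 6 values. With the outlier, the first bin holds all 12 original values, the middle bin is empty, and the last bin holds only 200.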

Equal-depth (frequency) partitioning:

It divides the range into N intervals, each containing approximately the same number of samples.
It gives good data scaling.
Managing categorical attributes can be tricky.
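
Equal-depth partitioning can be sketched in the same way. Again, the function name and data are assumptions made for the example. When the number of samples is not a multiple of N, this sketch gives the first few bins one extra value each, which is one possible reading of "approximately the same number".

```python
def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size.
    Assumes 1 <= n_bins <= len(values)."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins

# Hypothetical sample data, including one outlier (200).
values = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34, 200]
bins = equal_depth_bins(values, 3)
counts = [len(b) for b in bins]
```

Even with the outlier present, the bins stay balanced (5, 4 and 4 values). Note that equal values, such as the two 21s here, can end up in different bins because the cut is made by position, not by value.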

Code for binning (if needed, it can be edited to take user input instead of random data). Feel free to comment about mistakes and doubts.