How to Handle Noisy Data in Data Preprocessing
Binning method (one of several smoothing methods):
- First, sort the data and partition it into (equi-depth) bins.
- Then smooth the values in each bin by the bin mean, the bin median, the bin boundaries, etc.
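The two steps above can be sketched in plain Python. This is only an illustration: the function names and the sample `prices` list are made up for this example, and the code assumes there are at least as many values as bins. For smoothing by bin boundaries, ties are sent to the lower boundary, which is a choice made here and not a fixed rule.

```python
from statistics import mean, median

def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins

def smooth_by_means(bins):
    """Replace every value in a bin with the mean of that bin."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin with the median of that bin."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

# Hypothetical sample data, chosen only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(prices, 3)

print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

With this sample, the three bins are 4, 8, 15; 21, 21, 24; and 25, 28, 34. The bin means are 9, 22 and 29, and boundary smoothing turns the first bin into 4, 4, 15.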
Equal-width (distance) partitioning:
- It divides the range into N intervals of equal size (a uniform grid).
- If A and B are the lowest and highest values of the attribute, the width of each interval is W = (B-A)/N.
- It is the most straightforward approach.
- But outliers may dominate the result: a few extreme values stretch the range, so most of the data ends up in only a few bins.
- Skewed data is not handled well.
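A small sketch of equal-width partitioning, using W = (B-A)/N from above. The function names and the sample data are assumptions for this illustration; the highest value is placed in the last bin so that it does not fall outside the grid.

```python
def equal_width_edges(values, n_bins):
    """Return the N+1 bin edges A, A+W, ..., B with W = (B - A) / N."""
    a, b = min(values), max(values)
    width = (b - a) / n_bins
    return [a + i * width for i in range(n_bins + 1)]

def assign_equal_width(values, n_bins):
    """Return, for each value, the index (0 .. N-1) of its equal-width bin."""
    a, b = min(values), max(values)
    width = (b - a) / n_bins
    if width == 0:
        # All values are identical, so everything goes into the first bin.
        return [0] * len(values)
    # The maximum value B would land at index N, so clamp it to N-1.
    return [min(int((v - a) / width), n_bins - 1) for v in values]

# Hypothetical sample with a single outlier (100).
data = [1, 2, 3, 4, 5, 100]
edges = equal_width_edges(data, 3)
indices = assign_equal_width(data, 3)
counts = [indices.count(i) for i in range(3)]

print(edges)
print(counts)
```

Here W = (100-1)/3 = 33, so the edges are 1, 34, 67 and 100. Five of the six values fall into the first bin, the middle bin is empty, and the outlier sits alone in the last bin, which shows how one extreme value can dominate the partition.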
Equal-depth (frequency) partitioning:
- It divides the range into N intervals, each containing approximately the same number of samples.
- It gives good data scaling.
- Managing categorical attributes can be tricky.
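Equal-depth partitioning can be sketched as follows. The function name and the sample data are made up for this example, and when the number of samples is not a multiple of N, the leftover values are given to the first bins, which is one possible convention among several.

```python
def equal_depth_partition(values, n_bins):
    """Split the sorted values into n_bins groups of nearly equal count.

    Returns a list of (low, high, count) tuples, one per non-empty bin.
    """
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    summary, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional sample each.
        end = start + size + (1 if i < extra else 0)
        chunk = data[start:end]
        if chunk:
            summary.append((chunk[0], chunk[-1], len(chunk)))
        start = end
    return summary

# Hypothetical skewed sample with one extreme value (100).
data = [1, 2, 3, 4, 5, 100]
summary = equal_depth_partition(data, 3)

for low, high, count in summary:
    print(f"[{low}, {high}] holds {count} samples")
```

For this sample, every bin holds two values: 1 to 2, 3 to 4, and 5 to 100. The counts stay balanced even with the extreme value, although the last interval becomes very wide.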