Data cleaning: Difference between revisions

484 bytes removed ,  27 August 2020


== Outlier / Anomaly detection ==
Anomaly detection of numeric data
[[Anomaly detection]]
* Median
* Range Checks
* All values are even
* The values are identical even though the columns are entirely different
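
The checks above could be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not a definitive method: the function names, the modified z-score rule (constant 0.6745, cutoff 3.5) and the sample numbers are assumptions chosen for the example and do not come from this page.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values far from the median, using the median absolute deviation (MAD).

    The 0.6745 constant and the 3.5 cutoff are a common convention for the
    modified z-score; they are assumptions here, not values from this page.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def out_of_range(values, low, high):
    """Range check: return the values outside the closed interval [low, high]."""
    return [v for v in values if not low <= v <= high]


def identical_columns(table):
    """Return pairs of column names whose value lists are exactly equal.

    `table` is a hypothetical dict mapping column name -> list of values.
    """
    names = list(table)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if table[a] == table[b]
    ]


ages = [10, 12, 11, 13, 12, 95]
print(mad_outliers(ages))                    # values far from the median
print(out_of_range([5, -1, 120, 40], 0, 100))
print(identical_columns({"age": [1, 2, 3], "height": [1, 2, 3], "weight": [4, 5, 6]}))
```

The median-based score is used here instead of a mean-based one because a single extreme value shifts the mean and standard deviation but barely moves the median.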
 
Anomaly detection of categorical data (qualitative variable)
* Distribution of categories, e.g. the interests of an audience should be varied, NOT all the same
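
One possible way to test a categorical column for a suspiciously uniform distribution is to check whether a single category accounts for almost all rows. The sketch below is only an assumption about how such a check might look: the function name <code>dominant_category</code>, the 0.9 cutoff and the sample data are hypothetical.

```python
from collections import Counter


def dominant_category(values, max_share=0.9):
    """Return (category, share) if one category exceeds max_share of the rows.

    Returns None when the column is empty or no category dominates.
    The 0.9 default is an arbitrary illustrative cutoff.
    """
    if not values:
        return None
    category, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    return (category, share) if share > max_share else None


interests = ["sports"] * 19 + ["music"]
print(dominant_category(interests))            # one interest covers nearly every row
print(dominant_category(["a", "b", "c", "d"]))  # varied values, nothing flagged
```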
 
Anomaly detection for time series data
* Trend
* Dramatic increase or decrease in the row count for each time period
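
A row-count check of this kind might be sketched as below: compare each period's row count with the previous period and flag large relative changes. The function name <code>row_count_jumps</code>, the 50% threshold and the sample counts are assumptions made for illustration only.

```python
def row_count_jumps(counts, max_change=0.5):
    """Flag periods whose row count changed sharply versus the previous period.

    `counts` is a chronologically ordered list of (period, row_count) pairs.
    Returns a list of (period, relative_change) for changes above max_change.
    """
    flagged = []
    for (_, prev), (period, cur) in zip(counts, counts[1:]):
        if prev == 0:
            continue  # relative change is undefined when the previous count is zero
        change = (cur - prev) / prev
        if abs(change) > max_change:
            flagged.append((period, change))
    return flagged


daily_rows = [
    ("2020-08-24", 1000),
    ("2020-08-25", 1040),
    ("2020-08-26", 310),   # sharp drop
    ("2020-08-27", 990),   # sharp recovery, also flagged
]
for period, change in row_count_jumps(daily_rows):
    print(period, f"{change:+.0%}")
```

Comparing against only the previous period is the simplest choice; data with a weekly or seasonal trend would probably need a comparison against the same period in an earlier cycle instead.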
 
More on: [https://en.wikipedia.org/wiki/Outlier#Identifying_outliers Outlier - Wikipedia]


== unique number of data values ==