Anomaly detection: Difference between revisions
Revision as of 16:53, 15 January 2025
Outlier / Anomaly detection
Anomaly detection of numeric data
- Median
- Range Checks
- All values are even, or all are odd
- Values are identical across columns that should be completely different
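The numeric checks above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, and the median check uses the modified z-score (0.6745 constant, 3.5 cutoff), which is one common convention that the list above does not specify.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Median check: return values far from the median, scaled by the MAD.

    Assumption: the modified z-score with a 3.5 cutoff is used here;
    the cutoff should be tuned for the data at hand.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def out_of_range(values, low, high):
    """Range check: return values outside the closed interval [low, high]."""
    return [v for v in values if not low <= v <= high]


def all_same_parity(values):
    """True when every integer value is even, or every one is odd."""
    return len({v % 2 for v in values}) == 1


def identical_columns(col_a, col_b):
    """True when two columns that should differ hold exactly the same values."""
    return list(col_a) == list(col_b)
```

For example, `mad_outliers([10, 12, 11, 13, 12, 95])` flags only `95`, because the median and the MAD are barely affected by the single extreme value.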
Anomaly detection of categorical data (qualitative variable)
- Expected distribution, e.g. the interests of an audience should be widely varied, NOT all alike
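One simple way to test whether a categorical column is suspiciously uniform is to measure the share of its most frequent value. The sketch below assumes a hypothetical helper name and an arbitrary 90% cutoff; neither comes from the original notes.

```python
from collections import Counter


def dominant_category(values, max_share=0.9):
    """Return (category, share) if one category exceeds max_share, else None.

    Assumption: a single value covering more than 90% of rows is suspicious;
    the right cutoff depends on the column.
    """
    counts = Counter(values)
    if not counts:
        return None
    category, count = counts.most_common(1)[0]
    share = count / len(values)
    return (category, share) if share > max_share else None
```

With 19 rows of `"sports"` and one of `"music"`, the function reports `"sports"` with a share of 0.95; a column with evenly spread interests returns `None`.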
Anomaly detection for time series data
- Trend
- Dramatic increase or decrease in row count from one time period to the next
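The row-count check can be sketched by comparing each period with the one before it. The function name and the factor-of-two threshold below are illustrative assumptions, not part of the original notes.

```python
def row_count_jumps(counts, max_ratio=2.0):
    """Flag periods whose row count changed by more than max_ratio.

    counts: list of (period, row_count) tuples in chronological order.
    Assumption: a change by a factor of 2 in either direction is "dramatic".
    """
    flagged = []
    for (_, prev), (period, cur) in zip(counts, counts[1:]):
        if prev == 0 or cur == 0:
            # A ratio is undefined with a zero count; flag any change to or from zero.
            if prev != cur:
                flagged.append(period)
            continue
        ratio = cur / prev
        if ratio > max_ratio or ratio < 1 / max_ratio:
            flagged.append(period)
    return flagged
```

Note that a sudden drop followed by a recovery flags both periods, since each is compared only with its immediate predecessor. Comparing against a rolling median instead would avoid the second flag.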
Anomaly detection for consumer data
- Seasonality: consumption of coats should increase in cold weather
- Holidays: consumption of certain gifts, e.g. moon cakes, should increase around the matching holiday, e.g. the Mid-Autumn Festival
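A seasonal or holiday expectation can be turned into a check by comparing average consumption inside the expected peak months with the average outside them. The sketch below uses a hypothetical function name and made-up sample figures purely for illustration.

```python
def has_expected_peak(monthly_sales, peak_months, min_lift=1.0):
    """True when mean sales in peak_months exceed min_lift times the mean elsewhere.

    monthly_sales: dict mapping month number (1-12) to units sold.
    peak_months: set of month numbers where a rise is expected.
    Assumption: min_lift is an arbitrary threshold to be tuned per product.
    """
    peak = [v for m, v in monthly_sales.items() if m in peak_months]
    rest = [v for m, v in monthly_sales.items() if m not in peak_months]
    if not peak or not rest:
        raise ValueError("need data both inside and outside the peak months")
    peak_mean = sum(peak) / len(peak)
    rest_mean = sum(rest) / len(rest)
    return peak_mean > min_lift * rest_mean
```

For coat sales, `has_expected_peak(sales, {12, 1, 2})` returning `False` would suggest the data is mislabelled, shifted in time, or otherwise anomalous.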
Anomaly detection for string data
- Created time of the text message
- Frequency of text messages over time
- Length of the text message
- NULL or empty value
- Minor differences in text content
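Two of the string checks above, empty values and minor differences in content, can be sketched with the Python standard library. The function names and the 0.9 similarity cutoff are assumptions for illustration; `difflib.SequenceMatcher` is just one of several ways to measure text similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations


def is_missing(text):
    """True for None, empty strings, and whitespace-only strings."""
    return text is None or text.strip() == ""


def near_duplicates(messages, min_ratio=0.9):
    """Return pairs of non-identical messages whose similarity is >= min_ratio.

    Assumption: a SequenceMatcher ratio of 0.9 or more counts as a "minor
    difference". The pairwise comparison is quadratic, so this suits small samples.
    """
    pairs = []
    for a, b in combinations(messages, 2):
        if a != b and SequenceMatcher(None, a, b).ratio() >= min_ratio:
            pairs.append((a, b))
    return pairs
```

The message length check can reuse any numeric outlier test by applying it to `len(text)` for each message.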
More on: Outlier - Wikipedia (https://en.wikipedia.org/wiki/Outlier#Identifying_outliers)