Anomaly detection
Latest revision as of 11:36, 31 October 2025
Outlier / Anomaly detection
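Anomaly detection of numeric data
- Median: values that lie far from the median of the column
- Range checks: values outside the expected minimum and maximum
- All values are even, or all values are odd
- Values are identical across columns that should be unrelated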
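The numeric checks above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the function names, the 3.5 cutoff on the modified z-score (median absolute deviation), and the sample column names are all assumptions chosen for the example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values far from the median, scored by the modified z-score.

    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values equal the median, so any
        # value that differs from it is reported.
        return [v for v in values if v != med]
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def out_of_range(values, low, high):
    """Range check: return values outside the closed interval [low, high]."""
    return [v for v in values if not low <= v <= high]


def same_parity(values):
    """True if every integer value is even, or every integer value is odd."""
    return len({v % 2 for v in values}) == 1


def identical_columns(columns):
    """Return pairs of column names whose value lists are exactly equal.

    `columns` is a hypothetical mapping of column name -> list of values.
    """
    names = list(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if columns[a] == columns[b]
    ]


ages = [31, 29, 35, 33, 30, 32, 240]
print(mad_outliers(ages))
print(out_of_range(ages, 0, 120))
print(same_parity([2, 4, 6, 8]))
print(identical_columns({"height": [1, 2, 3], "weight": [1, 2, 3], "age": [4, 5, 6]}))
```

A flagged value is only a candidate for review; whether it is an error depends on what the column is supposed to contain.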
Anomaly detection of categorical data (qualitative variable)
- Expected distribution of categories, e.g. the interests of an audience should vary widely rather than being nearly identical for everyone
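One way to check the distribution of a categorical column is to compute the share of each category and flag any that dominates. The sketch below assumes a hypothetical cutoff of 80%; the right value depends on the data, and the function names are made up for this example.

```python
from collections import Counter


def category_shares(values):
    """Return the fraction of rows that fall into each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def dominant_categories(values, max_share=0.8):
    """Return categories whose share exceeds max_share (an assumed cutoff)."""
    return {
        category: share
        for category, share in category_shares(values).items()
        if share > max_share
    }


# Nine of ten users share the same interest, which looks suspicious.
interests = ["sports"] * 9 + ["music"]
print(dominant_categories(interests))
```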
Anomaly detection for time series data
- Trend
- Dramatic increase or decrease in row count for each time period
  - Example: a regularly scheduled web scraping job that collects 9k records per week suddenly drops to 3k records
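The row count check can be sketched by comparing each period with the median of the periods just before it. The window of 4 periods and the 50% change limit are assumptions for illustration, as are the function name and the sample weekly counts.

```python
from statistics import median


def row_count_anomalies(counts, window=4, max_change=0.5):
    """Flag periods whose row count deviates sharply from the recent baseline.

    counts: list of (period_label, row_count) tuples in time order.
    A period is flagged when its count differs from the median of the
    previous `window` periods by more than `max_change` (a fraction).
    """
    flagged = []
    for i in range(window, len(counts)):
        baseline = median(c for _, c in counts[i - window:i])
        period, count = counts[i]
        if baseline > 0 and abs(count - baseline) / baseline > max_change:
            flagged.append((period, count, baseline))
    return flagged


weekly = [("W1", 9000), ("W2", 9100), ("W3", 8900), ("W4", 9050), ("W5", 3000)]
for period, count, baseline in row_count_anomalies(weekly):
    print(f"{period}: {count} rows, recent median {baseline}")
```

Using the median for the baseline keeps a single earlier bad period from distorting the comparison.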
Anomaly detection for consumer data
- Seasonality: sales of coats should increase in cold weather
- Holidays: sales of certain gifts, e.g. mooncakes, should increase around the relevant holiday, e.g. the Mid-Autumn Festival
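A seasonal or holiday expectation can be turned into a simple check: the average in the expected peak months should clearly exceed the average in the remaining months. The sketch below is one possible formulation; the function name, the minimum ratio of 1.5, the choice of December to February as cold months, and the sample figures are all assumptions.

```python
from statistics import mean


def has_expected_peak(monthly_units, peak_months, min_ratio=1.5):
    """Check that peak-month sales exceed off-peak sales by min_ratio.

    monthly_units: mapping of month number (1-12) -> units sold.
    Returns None when there is not enough data to decide.
    """
    peak = [u for m, u in monthly_units.items() if m in peak_months]
    rest = [u for m, u in monthly_units.items() if m not in peak_months]
    if not peak or not rest or mean(rest) == 0:
        return None
    return mean(peak) / mean(rest) >= min_ratio


# Made-up coat sales; cold months assumed to be December to February.
coat_sales = {1: 900, 2: 800, 3: 400, 4: 200, 5: 100, 6: 50,
              7: 40, 8: 60, 9: 150, 10: 300, 11: 600, 12: 950}
print(has_expected_peak(coat_sales, peak_months={12, 1, 2}))
```

The same function could be applied to a holiday product by passing the month in which the holiday falls that year. A result of `False` suggests the data is missing the expected pattern and deserves a closer look.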
Anomaly detection for string data
- Creation time of the text message
- Frequency of text messages over time
- Length of the text message
- NULL or empty value
- Minor differences in text content[1]
- Character encoding, e.g. Fix garbled message text
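Several of the string checks above (missing values, length, near-duplicate content, and garbled encoding) can be sketched with the Python standard library. The function names, length limits, and the 0.9 similarity cutoff are assumptions. `difflib.SequenceMatcher` is only one of many ways to measure text similarity, and the encoding check is a rough heuristic that can misfire on legitimate text containing "Ã".

```python
from difflib import SequenceMatcher


def is_missing(text):
    """True for None, empty strings, and whitespace-only strings."""
    return text is None or text.strip() == ""


def length_outliers(texts, min_len=1, max_len=500):
    """Return non-missing texts whose length is outside the assumed limits."""
    return [
        t for t in texts
        if not is_missing(t) and not min_len <= len(t) <= max_len
    ]


def near_duplicates(texts, threshold=0.9):
    """Return pairs of texts that differ only slightly (but are not equal)."""
    pairs = []
    for i, a in enumerate(texts):
        for b in texts[i + 1:]:
            if a != b and SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs


def looks_garbled(text):
    """Heuristic: replacement characters or typical UTF-8 mojibake markers."""
    return "\ufffd" in text or any(marker in text for marker in ("Ã", "â€"))


messages = ["Your order has shipped.", "Your order has shipped!", "See you tomorrow"]
print(near_duplicates(messages))
print(is_missing("   "))
print(looks_garbled("CafÃ©"))
```

The pairwise comparison grows quadratically with the number of messages, so on large datasets it is usually applied within smaller groups, for example messages from the same sender.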
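More on: Outlier - Wikipedia (https://en.wikipedia.org/wiki/Outlier#Identifying_outliers)

References
1. Ahmet Münir Kocaman, "How to Measure Text Similarity: A Comprehensive Guide", Medium. https://medium.com/@ahmetmnirkocaman/how-to-measure-text-similarity-a-comprehensive-guide-6c6f24fc01fe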