Anomaly detection: Difference between revisions

Latest revision as of 11:36, 31 October 2025

Outlier / Anomaly detection

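Anomaly detection of numeric data

  • Median: values that lie far from the median
  • Range checks: values outside the expected minimum and maximum
  • All values are even, or all values are odd
  • Two columns contain identical values even though the columns represent entirely different things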
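
The numeric checks above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names, the sample ages, the 0 to 120 range and the 3.5 cutoff (a commonly used value for the MAD-based modified z-score) are not from this page.

```python
from statistics import median

def median_outliers(values, threshold=3.5):
    """Return values far from the median, using the MAD-based modified z-score."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # Assumption: if at least half the values equal the median,
        # treat every value that differs from it as suspicious.
        return [v for v in values if v != med]
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

def out_of_range(values, low, high):
    """Range check: return values outside the closed interval [low, high]."""
    return [v for v in values if not (low <= v <= high)]

def all_same_parity(values):
    """True if every integer is even, or every integer is odd."""
    return len({v % 2 for v in values}) == 1

def identical_columns(col_a, col_b):
    """True if two columns hold exactly the same values in the same order."""
    return list(col_a) == list(col_b)

# Hypothetical sample data: one age was clearly mistyped.
ages = [31, 29, 35, 33, 30, 32, 250]
print(median_outliers(ages))       # values far from the median
print(out_of_range(ages, 0, 120))  # values outside a plausible age range
```

A True result from all_same_parity or identical_columns is not proof of an error; it is a hint that the column deserves a manual look.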

Anomaly detection of categorical data (qualitative variable)

  • Expected distribution, e.g. the interests of an audience should be diverse, NOT all the same
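
One rough way to test for the diversity described above is to measure how much of the column a single category takes up. The sketch below assumes a hypothetical 90% cutoff and hypothetical interest labels; a sensible cutoff depends on the data.

```python
from collections import Counter

def dominant_share(labels):
    """Fraction of rows taken by the most frequent category."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)

def is_suspiciously_concentrated(labels, max_share=0.9):
    """True if one category covers more than max_share of all rows."""
    return dominant_share(labels) > max_share

# Hypothetical audience interests: 19 of 20 rows share the same label.
interests = ["sports"] * 19 + ["music"]
print(is_suspiciously_concentrated(interests))
```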

Anomaly detection for time series data

  • Trend
  • Dramatic increase or decrease in the row count for each time period
    • Example: A regularly scheduled web scraping job that collects 9k records per week suddenly drops to 3k records
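
The row count check can be sketched by comparing the latest period with the median of the earlier periods. The function name, the 50% threshold and the weekly counts below are assumptions chosen to mirror the 9k to 3k example.

```python
from statistics import median

def row_count_anomaly(counts, max_change=0.5):
    """Compare the latest period's row count with the median of earlier periods.

    Returns (is_anomaly, relative_change), where relative_change is
    (latest - baseline) / baseline.
    """
    if len(counts) < 2:
        raise ValueError("need at least one earlier period plus the latest one")
    *history, latest = counts
    baseline = median(history)
    if baseline == 0:
        raise ValueError("baseline row count is zero; relative change is undefined")
    change = (latest - baseline) / baseline
    return abs(change) > max_change, change

# Hypothetical weekly row counts from a scheduled scraper.
weekly_rows = [9000, 9100, 8900, 9050, 3000]
flagged, change = row_count_anomaly(weekly_rows)
print(flagged, f"{change:.0%}")
```

Using the median of earlier periods as the baseline keeps one earlier bad week from hiding a new one.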

Anomaly detection for consumer data

For consumer data:

  • Seasonal effect: sales of coats should increase in cold weather
  • Holiday effect: sales of certain gifts, e.g. moon cakes, should increase around special holidays, e.g. the Mid-Autumn Festival
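
A missing seasonal peak can be checked by comparing average sales in the expected peak months with the rest of the year. Everything specific in this sketch is an assumption: the peak months (December to February, i.e. a Northern Hemisphere winter), the 1.2 minimum uplift, and the monthly figures, which are made up for illustration.

```python
from statistics import mean

COLD_MONTHS = {12, 1, 2}  # assumption: Northern Hemisphere winter

def seasonal_uplift(monthly_sales, peak_months=COLD_MONTHS):
    """Ratio of mean sales in the peak months to mean sales in the other months."""
    peak = [s for m, s in monthly_sales.items() if m in peak_months]
    rest = [s for m, s in monthly_sales.items() if m not in peak_months]
    return mean(peak) / mean(rest)

def missing_seasonal_peak(monthly_sales, peak_months=COLD_MONTHS, min_uplift=1.2):
    """True if the expected seasonal increase does not show up in the data."""
    return seasonal_uplift(monthly_sales, peak_months) < min_uplift

# Hypothetical coat sales per month (month number -> units sold).
coat_sales = {1: 300, 2: 280, 3: 150, 4: 90, 5: 60, 6: 40,
              7: 40, 8: 50, 9: 80, 10: 140, 11: 220, 12: 320}
print(missing_seasonal_peak(coat_sales))
```

The same function could be reused for the holiday case by passing the month of the holiday as peak_months, for example the month in which the Mid-Autumn Festival falls in a given year.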

Anomaly detection for string data

  • Creation time of the text message
  • Frequency of text messages over time
  • Length of the text message
  • NULL or empty value
  • Minor differences in text content[1]
  • Character encoding problems, e.g. see Fix garbled message text
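
Two of the string checks above, NULL or empty values and minor differences in text content, can be sketched with the standard library. The similarity measure here is difflib's SequenceMatcher ratio, which is only one of many possible text similarity measures; the 0.9 threshold, the function names and the sample messages are assumptions.

```python
from difflib import SequenceMatcher

def empty_messages(messages):
    """Indexes of messages that are None, empty, or whitespace only."""
    return [i for i, m in enumerate(messages) if m is None or not m.strip()]

def near_duplicates(messages, threshold=0.9):
    """Index pairs of non-empty messages that differ only slightly.

    Exact duplicates are skipped; they are easier to find with a plain count.
    The pairwise comparison is quadratic, so this suits small samples only.
    """
    texts = [(i, m) for i, m in enumerate(messages) if m and m.strip()]
    pairs = []
    for a in range(len(texts)):
        for b in range(a + 1, len(texts)):
            i, x = texts[a]
            j, y = texts[b]
            if x != y and SequenceMatcher(None, x, y).ratio() >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical messages: two differ only in the final punctuation mark.
messages = ["Your order has shipped.", "Your order has shipped!",
            None, "   ", "See you tomorrow"]
print(empty_messages(messages))
print(near_duplicates(messages))
```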

More on: Outlier - Wikipedia (https://en.wikipedia.org/wiki/Outlier#Identifying_outliers)

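References

  1. How to Measure Text Similarity: A Comprehensive Guide | by Ahmet Münir Kocaman | Medium (https://medium.com/@ahmetmnirkocaman/how-to-measure-text-similarity-a-comprehensive-guide-6c6f24fc01fe)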