Anomaly detection: Difference between revisions

From LemonWiki共筆

Revision as of 16:55, 15 January 2025

Outlier / Anomaly detection

Anomaly detection of numeric data

  • Median
  • Range Checks
  • All values are even, or all are odd
  • The values are identical even though the columns are completely different
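
The numeric checks above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the function names, the 3.5 cutoff for the median-based score, and the example bounds are all assumptions chosen for the example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose median-based (modified z) score exceeds threshold.

    The 0.6745 constant and the 3.5 cutoff are common conventions,
    used here as assumptions rather than fixed rules.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def out_of_range(values, low, high):
    """Range check: return values outside the closed interval [low, high]."""
    return [v for v in values if not low <= v <= high]


def all_same_parity(values):
    """True when every integer value is even, or every one is odd."""
    return len({v % 2 for v in values}) == 1


def identical_columns(table):
    """Return pairs of column names whose value lists are exactly equal.

    `table` is a hypothetical dict mapping column name -> list of values.
    """
    names = list(table)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if table[a] == table[b]
    ]
```

For example, with the hypothetical list `ages = [31, 29, 35, 30, 33, 240]`, both `mad_outliers(ages)` and `out_of_range(ages, 0, 120)` single out the value 240.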

Anomaly detection of categorical data (qualitative variable)

  • Expected distribution, e.g. the interests of audiences should vary widely, NOT be nearly all the same
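
One way to test for a suspiciously uniform categorical column is to measure the share of the most common value. The sketch below assumes a plain list of labels; the function names and the 0.9 cutoff are hypothetical and would need tuning for real data.

```python
from collections import Counter


def dominant_share(labels):
    """Return (most common label, its share of all labels)."""
    if not labels:
        return None, 0.0
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / len(labels)


def looks_too_uniform(labels, max_share=0.9):
    """Flag a column where one category covers more than max_share of rows."""
    _, share = dominant_share(labels)
    return share > max_share
```

A column of audience interests where 19 of 20 rows read "sports" would be flagged, while an even mix of four interests would not.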

Anomaly detection for time series data

  • Trend
  • Dramatic increase or decrease in row count for each time period
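
The row-count check can be sketched by comparing each period with the one before it. This assumes the counts are already aggregated into a dict keyed by a sortable period label (ISO dates here); the function name and the 3x ratio are hypothetical.

```python
def count_jumps(counts_by_period, max_ratio=3.0):
    """Return periods whose row count changed by more than max_ratio
    (up or down) compared with the previous period.

    Periods with no rows must be present with a count of 0;
    a key that is missing entirely is not detected by this sketch.
    """
    periods = sorted(counts_by_period)
    flagged = []
    for prev, cur in zip(periods, periods[1:]):
        a, b = counts_by_period[prev], counts_by_period[cur]
        small, large = min(a, b), max(a, b)
        if large == 0:
            continue  # both periods are empty: no change
        if small == 0 or large / small > max_ratio:
            flagged.append(cur)
    return flagged
```

With daily counts of 100, 110, 20 and 105 rows, the third day is flagged for the drop and the fourth for the rebound.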

Anomaly detection for consumer data

  • Seasonal effects: purchases of coats should increase in cold weather
  • Holiday effects: purchases of certain gifts, e.g. moon cakes, should increase around the related holiday, e.g. the Mid-Autumn Festival
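
A seasonal or holiday expectation can be turned into a simple check: average sales in the months where a peak is expected should exceed the average in the remaining months. The sketch below assumes monthly totals keyed by month number, at least one month in each group, and a positive off-peak average; the names and sample months are hypothetical.

```python
from statistics import mean


def seasonal_lift(monthly_sales, peak_months):
    """Ratio of average sales in peak months to average sales elsewhere.

    Assumes both groups are non-empty and the off-peak average is positive.
    """
    peak = [v for m, v in monthly_sales.items() if m in peak_months]
    rest = [v for m, v in monthly_sales.items() if m not in peak_months]
    return mean(peak) / mean(rest)


def violates_seasonality(monthly_sales, peak_months, min_lift=1.0):
    """Flag data where the expected peak does not stand out."""
    return seasonal_lift(monthly_sales, peak_months) <= min_lift
```

For coats one might pass winter months as `peak_months`; for moon cakes, the month of the Mid-Autumn Festival in the year being checked.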

Anomaly detection for string data

  • Creation time of the text message
  • Frequency of text messages over time
  • Length of the text message
  • NULL or empty value
  • Minor differences in text content
  • Character encoding problems, e.g. see Fix garbled message text
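
Several of the string checks above can be sketched with the Python standard library. The function names, length bounds, and similarity cutoff are assumptions, and the encoding test is only a heuristic for one common failure (UTF-8 bytes decoded as Latin-1), not a general detector.

```python
from difflib import SequenceMatcher


def is_blank(text):
    """NULL or empty value: None, empty, or whitespace only."""
    return text is None or text.strip() == ""


def unusual_length(text, min_len=1, max_len=500):
    """Length check against hypothetical bounds."""
    return not min_len <= len(text) <= max_len


def near_duplicates(messages, min_ratio=0.9):
    """Return pairs of messages that differ only slightly.

    Compares every pair, so it is only practical for small batches.
    """
    pairs = []
    for i, a in enumerate(messages):
        for b in messages[i + 1:]:
            if a != b and SequenceMatcher(None, a, b).ratio() >= min_ratio:
                pairs.append((a, b))
    return pairs


def looks_garbled(text):
    """Heuristic for encoding damage.

    Flags the Unicode replacement character, or text that turns into
    different valid text when its Latin-1 bytes are re-read as UTF-8.
    """
    if "\ufffd" in text:
        return True
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return False  # not representable as mis-decoded UTF-8
    return repaired != text
```

For instance, "Your order has shipped." and "Your order has shipped!" are reported as a near-duplicate pair, and "cafÃ©" is flagged as garbled while "café" is not.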

More on: Outlier - Wikipedia, https://en.wikipedia.org/wiki/Outlier#Identifying_outliers