Data cleaning

From LemonWiki共筆
Jump to navigation Jump to search

is null

Finds whether a variable is NULL. online demo

  • PHP is_null
  • Google spreadsheet / Excel:
    • ISERR(value) " value - The value to be verified as an error type other than #N/A." ex: #NULL!
    • If the cell value is exactly NULL not #NULL!, You may use COUNTIF(value, "NULL") or EXACT(value, "NULL")
  • MySQL SQL syntax: SELECT * FROM table WHERE column IS NULL;[1]

Finds whether a variable is NOT NULL

  • MySQL SQL syntax: SELECT * FROM table WHERE column IS NOT NULL;

check if field value was not fulfilled: NULL, empty value

Icon_exclaim.gif NOT include those data which its field value fulfilled with default value automatically

(demo on sqlfiddle)

  1. find records with NULL value: (note: not #NULL!)
    • MySQL solution: SELECT * FROM table_name WHERE column_name IS NULL;
    • EXCEL: =EXACT(A2, "NULL")
  2. find records with empty value: (not contains NULL value)
    • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) = 0; Icon_exclaim.gif SQL query SELECT * FROM table_name WHERE column_name IS NOT NULL includes empty value
  3. find NOT empty records means records without NULL or empty value:
    • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0;
    • MySQL: SELECT * FROM table_name WHERE column_name != '' AND column_name IS NOT NULL;
  4. Excel starting date: 1900/1/0 (converted time formatted value from 0), 1900/1/1 (converted time formatted value from 1), 1900/1/2 ...
    • solution: step1: Replace the year > 100 from this year with empty value at EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)<1914, "", A2)) (this formula also handle empty value and non well-formatted column value ex: 0000-12-31 ) ; step2: change the format of cell to time format
    • trivial approach : EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)-YEAR(NOW())>100, "", A2)) Icon_exclaim.gif this formula could not handle empty value because it return 0. If I change the format of cell to time format, 0 will become 1900/1/0.
  5. Using PHP empty() function to find 0, null, false, empty string, empty array values.

check if field value was fulfilled

length of string > 0

  • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0; demo[1]

column value is not null or 0

  • Excel: COUNTIFS(criteria_range1, "<>NULL", criteria_range1, "<>0")[2]

find if number or cell value is positive integer

  • EXCEL: =IFERROR(IF(AND(INT( value )= value, value>0), TRUE, FALSE), FALSE)[3] online demo

verify the format of field value

email:

  • EXCEL: =IF(ISERR(FIND("@", A2, 1)), FALSE, TRUE) only check the field if contains @ symbol or not
    • result: (1) normal condition: return TRUE; (2) exceptional condition: return FALSE if @ symbol was not found
  • EXCEL: =FIND("@", A2, 2) only check the field if contains @ symbol or not
    • syntax: FIND(find_text, with_text, [start_num]) the start_num is 2 because the position of @ symbol should be larger than 1 (position of first char is 1)
    • result: (1) normal condition: return the number larger than 1; (2) exceptional condition: return #VALUE! if @ symbol was not found
  • PHP: PHP FILTER_VALIDATE_EMAIL Filter
    • "Returns the filtered data, or FALSE if the filter fails." quoted from PHP.net

duplicate data

outlier

(left blank intentionally)

data handling

remove first, last or certain characters from text

  • Excel: using RIGHT[4] + LEN[5] functions [6]
  • Excel: if the text length will be removed was fixed, you may try to use REPLACE[7] + LEN functions (demo)

related pages

Data modeling: Data type

references