Data cleaning
Jump to navigation
Jump to search
is null
Finds whether a variable is NULL
- PHP is_null
- Google spreadsheet / Excel:
- ISERR(value) " value - The value to be verified as an error type other than #N/A." ex: #NULL!
- If the cell value is exactly NULL not #NULL!, You may use COUNTIF(value, "NULL") or EXACT(value, "NULL")
- MySQL SQL syntax: SELECT * FROM table WHERE column IS NULL;[1] demo
Finds whether a variable is NOT NULL
- MySQL SQL syntax: SELECT * FROM table WHERE column IS NOT NULL; demo
check if field value was not fulfilled: NULL, empty value
NOT include those data which its field value fulfilled with default value automatically
- NULL value: (note: not #NULL!)
- MySQL solution: SELECT * FROM table_name WHERE column_name IS NULL;
- EXCEL: =EXACT(A2, "NULL")
- empty value:
- solution1: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) = 0;
- solution2: SQL query with the condition SELECT * FROM table_name WHERE column_name IS NOT NULL includes NULL & empty value
- Excel starting date: 1900/1/0 (converted time formatted value from 0), 1900/1/1 (converted time formatted value from 1), 1900/1/2 ...
- solution: step1: Replace the year > 100 from this year with empty value at EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)<1914, "", A2)) (this formula also handle empty value and non well-formatted column value ex: 0000-12-31 ) ; step2: change the format of cell to time format
- trivial approach : EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)-YEAR(NOW())>100, "", A2)) this formula could not handle empty value because it return 0. If I change the format of cell to time format, 0 will become 1900/1/0.
- Using PHP empty() function to find 0, null, false, empty string, empty array values.
check if field contains value
length of string > 0
- MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0; demo
column value is not null or 0
- Excel: COUNTIFS(criteria_range1, "<>NULL", criteria_range1, "<>0")[2]
verify the format of field value
email:
- EXCEL: =IF(ISERR(FIND("@", A2, 1)), FALSE, TRUE) only check the field if contains @ symbol or not
- result: (1) normal condition: return TRUE; (2) exceptional condition: return FALSE if @ symbol was not found
- EXCEL: =FIND("@", A2, 2) only check the field if contains @ symbol or not
- syntax: FIND(find_text, with_text, [start_num]) the start_num is 2 because the position of @ symbol should be larger than 1 (position of first char is 1)
- result: (1) normal condition: return the number larger than 1; (2) exceptional condition: return #VALUE! if @ symbol was not found
- PHP: PHP FILTER_VALIDATE_EMAIL Filter
- "Returns the filtered data, or FALSE if the filter fails." quoted from PHP.net
duplicate data
outlier
(left blank intentionally)
data handling
remove first, last or certain characters from text
Data modeling: Data type
references
- ↑ MySQL :: MySQL 5.0 Reference Manual :: 3.3.4.6 Working with NULL Values
- ↑ Excel COUNTIFS and COUNTIF with multiple criteria – examples of usage
- ↑ RIGHT、RIGHTB 函數 - Excel - Office.com
- ↑ LEN、LENB 函數 - Excel - Office.com
- ↑ How to remove first, last or certain characters from text in Excel?
- ↑ REPLACE、REPLACEB 函數 - Excel - Office.com