Data cleaning: Difference between revisions

2 bytes removed, 12 March 2015
m
no edit summary
Line 75: Line 75:
# find records with an empty value (not including {{kbd | key = NULL}} values):
#* MySQL: {{kbd | key = SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) = 0;}} {{Exclaim}} The query {{kbd | key =SELECT * FROM table_name WHERE column_name IS NOT NULL}} also returns records with an empty value.
# find non-empty records, i.e. records whose value is neither NULL nor empty:
#* MySQL: {{kbd | key =<nowiki>SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0;</nowiki>}}
#* MySQL: {{kbd | key =<nowiki>SELECT * FROM table_name WHERE column_name != '' AND column_name IS NOT NULL;</nowiki>}}
# Excel date serial numbers start at 1900/1/0 (the value 0 formatted as a date), followed by 1900/1/1 (the value 1), 1900/1/2 ...
#* solution: step 1: in Excel, replace dates more than 100 years before the current year with an empty value: {{kbd | key =<nowiki>=IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)<1914, "", A2))</nowiki>}} (this formula also handles empty values and malformed values, e.g. 0000-12-31); step 2: change the cell format to a date/time format
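The same date cleanup can be sketched outside Excel. The following is a minimal Python sketch, not part of the original page: the function name <code>clean_date</code> is hypothetical, the input is assumed to be text in <code>YYYY-MM-DD</code> form, and the cutoff year 1914 is taken from the formula above.

```python
from datetime import datetime
from typing import Optional


def clean_date(raw: Optional[str], min_year: int = 1914) -> str:
    """Return the date text unchanged if it is valid and not older than
    min_year; otherwise return an empty string.

    Hypothetical helper: assumes dates are written as YYYY-MM-DD.
    """
    # Empty or missing values stay empty.
    if raw is None or not raw.strip():
        return ""
    text = raw.strip()
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # Malformed values such as 0000-12-31 cannot be parsed.
        return ""
    # Mirror IF(YEAR(A2)<1914, "", A2): drop dates before the cutoff year.
    return text if parsed.year >= min_year else ""
```

For example, <code>clean_date("1900-01-01")</code> and <code>clean_date("0000-12-31")</code> both return an empty string, while <code>clean_date("2015-03-12")</code> is returned unchanged.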
Line 95: Line 92:
check numeric range
* MySQL: {{kbd | key = SELECT * FROM table_name WHERE column_name BETWEEN ''min_number'' AND ''max_number'';}} returns records where ''min_number'' ≤ value ≤ ''max_number'' (both bounds are inclusive)
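The inclusive bounds of <code>BETWEEN</code> can be checked with a small sketch. It uses Python's built-in <code>sqlite3</code> module as a stand-in for MySQL (an assumption made only so the example is self-contained). The sample values and the bounds 1 and 10 are made up, and the <code>NOT BETWEEN</code> query is an addition to show how out-of-range records might be listed.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [(v,) for v in (0, 1, 5, 10, 11, None)],
)

# BETWEEN keeps both bounds: 1 and 10 are part of the result.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT column_name FROM table_name "
        "WHERE column_name BETWEEN ? AND ? ORDER BY column_name",
        (1, 10),
    )
]

# NOT BETWEEN lists the out-of-range values; the NULL row matches neither query.
out_of_range = [
    row[0]
    for row in conn.execute(
        "SELECT column_name FROM table_name "
        "WHERE column_name NOT BETWEEN ? AND ? ORDER BY column_name",
        (1, 10),
    )
]

print(in_range)      # [1, 5, 10]
print(out_of_range)  # [0, 11]
```

Records with a NULL value appear in neither list, so they need a separate <code>IS NULL</code> check.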
find non-empty records, i.e. records whose value is neither NULL nor empty:
* MySQL: {{kbd | key =<nowiki>SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0;</nowiki>}}
* MySQL: {{kbd | key =<nowiki>SELECT * FROM table_name WHERE column_name != '' AND column_name IS NOT NULL;</nowiki>}}
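The difference between NULL, empty, and whitespace-only values can be illustrated with a small sketch. It runs the queries against Python's built-in <code>sqlite3</code> module rather than MySQL (an assumption made so the example is self-contained). The four sample rows and the helper <code>ids</code> are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "abc"), (2, ""), (3, "   "), (4, None)],
)


def ids(where: str) -> list:
    """Hypothetical helper: return the ids of rows matching a WHERE clause."""
    sql = f"SELECT id FROM table_name WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Empty or whitespace-only values; the NULL row is not matched.
empty = ids("LENGTH(TRIM(column_name)) = 0")
# IS NOT NULL alone still returns the empty and whitespace-only rows.
not_null = ids("column_name IS NOT NULL")
# Non-empty after trimming; NULL is excluded because the comparison yields NULL.
non_empty_trimmed = ids("LENGTH(TRIM(column_name)) != 0")
# Without TRIM, SQLite keeps the whitespace-only row 3.
non_empty_plain = ids("column_name != '' AND column_name IS NOT NULL")

print(empty)              # [2, 3]
print(not_null)           # [1, 2, 3]
print(non_empty_trimmed)  # [1]
print(non_empty_plain)    # [1, 3]
```

The last result is specific to SQLite. In MySQL, whether a whitespace-only value compares equal to <code><nowiki>''</nowiki></code> can depend on the column's collation, so the <code>TRIM</code> form is the more predictable of the two.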


== verify the format of field value ==
