Data cleaning: Difference between revisions

Jump to navigation Jump to search
156 bytes added ,  30 April 2015
Line 157: Line 157:


=== abnormal values of time data ===
=== abnormal values of time data ===
abnormal values if the data was generated in recent years
abnormal values if the data was generated in recent years & not newer than today
* {{code | code = 0001-01 00:00:00}} occurred in MySQL {{code | code = datetime}} type
* {{code | code = 0001-01 00:00:00}} occurred in MySQL {{code | code = datetime}} type
* {{code | code = 1900/1/0}} (converted time formatted value from 0), {{code | code = 1900/1/1}} (converted time formatted value from 1), {{code | code = 1900/1/2}} ... occurred in MS Excel
* {{code | code = 1900/1/0}} (converted time formatted value from 0), {{code | code = 1900/1/1}} (converted time formatted value from 1), {{code | code = 1900/1/2}} ... occurred in MS Excel
* future time


find normal values: Assume the data was generated in recent 10 years
find normal values: Assume the data was generated in recent 10 years & not newer than today
* MySQL:  
* MySQL:  
** {{code | code = SELECT * FROM `my_table` WHERE YEAR( CURDATE() ) - YEAR( `my_time_column`) <= 10;}}
** {{code | code = SELECT * FROM `my_table` WHERE ( `my_time_column` >=  CURDATE() - INTERVAL 10 YEAR )  AND  ( `my_time_column` > CURDATE() + 1);}}
** {{code | code = SELECT * FROM `my_table` WHERE `my_time_column` >CURDATE() - INTERVAL 10 YEAR;}}
** {{code | code = SELECT * FROM `my_table` WHERE ( YEAR( CURDATE() ) - YEAR( `my_time_column`) <= 10 )  AND  ( `my_time_column` > CURDATE() + 1); }}


== duplicate data ==
== duplicate data ==

Navigation menu