Data cleaning: Difference between revisions

Jump to navigation Jump to search
146 bytes added ,  30 April 2015
Line 157: Line 157:


=== abnormal values of time data ===
=== abnormal values of time data ===
abnormal values if the data was generated in recent years & not newer than today
Definition of abnormal values if the time data ([http://en.wikipedia.org/wiki/Time_series time series]) was generated
* 10 years before &  
* newer than today
 
List of the possible and abnormal values:
* {{code | code = 0001-01 00:00:00}} occurred in MySQL {{code | code = datetime}} type
* {{code | code = 0001-01 00:00:00}} occurred in MySQL {{code | code = datetime}} type
* {{code | code = 1900/1/0}} (converted time formatted value from 0), {{code | code = 1900/1/1}} (converted time formatted value from 1), {{code | code = 1900/1/2}} ... occurred in MS Excel
* {{code | code = 1900/1/0}} (converted time formatted value from 0), {{code | code = 1900/1/1}} (converted time formatted value from 1), {{code | code = 1900/1/2}} ... occurred in MS Excel
* future time
* future data: the date after today


find normal values:  
Find the normal values:  
* MySQL: Assume the data was generated in recent 10 years & not newer than today
* MySQL: Assume the data was generated in recent 10 years & not newer than today
** {{code | code = SELECT * FROM `my_table` WHERE ( `my_time_column` >=  CURDATE() - INTERVAL 10 YEAR )  AND  ( `my_time_column` < CURDATE() + 1);}}  
** {{code | code = SELECT * FROM `my_table` WHERE ( `my_time_column` >=  CURDATE() - INTERVAL 10 YEAR )  AND  ( `my_time_column` < CURDATE() + 1);}}  

Navigation menu