Data cleaning: Difference between revisions

299 bytes added ,  9 March 2021
m
(5 intermediate revisions by the same user not shown)
Line 295: Line 295:
=== Time data: Data was generated in N years ===
Define the abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series])
* Verify that the data were generated within the last 10 years
* Verify that the data were generated within the last N years. Possible abnormal value: {{code | code = 0001-01 00:00:00}}, which can occur in a MySQL {{code | code = datetime}} column.
* Verify that the data are not newer than today
* Verify that the year of the data is not {{kbd | key=1900}} if the data were imported from a Microsoft Excel file. Excel date values (see DATEVALUE<ref>[https://support.microsoft.com/zh-tw/office/datevalue-%E5%87%BD%E6%95%B8-df8b07d4-7761-4a93-bc33-b7471bbff252 DATEVALUE function - Office Support]</ref>) start from the year {{kbd | key=1900}}.
* Verify that the year of the data is not {{kbd | key=1900}} if the data were imported from a Microsoft Excel file. Excel date values (see DATEVALUE<ref>[https://support.microsoft.com/zh-tw/office/datevalue-%E5%87%BD%E6%95%B8-df8b07d4-7761-4a93-bc33-b7471bbff252 DATEVALUE function - Office Support]</ref>) start from the year {{kbd | key=1900}}, e.g.
** {{code | code = 1900/1/0}} (the value 0 displayed in a date format),
** {{code | code = 1900/1/1}} (the value 1 displayed in a date format)
* Verify that the value of the data is not {{kbd | key=0000-00-00 00:00:00}}
 
* Verify the diversity of the data values, e.g. by their [https://en.wikipedia.org/wiki/Variance variance]
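
The checks above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names <code>classify_timestamp</code> and <code>spread_in_days</code>, the label strings, the input format <code>YYYY-MM-DD HH:MM:SS</code>, and the approximation of a year as 365.25 days are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import pvariance

# MySQL "zero date"; it is not a valid Python datetime, so compare it as text.
ZERO_DATE = "0000-00-00 00:00:00"


def classify_timestamp(raw, today, n_years):
    """Return a label saying why `raw` looks abnormal, or None if it passes.

    `raw` is assumed to be a string such as "2021-03-09 12:00:00".
    """
    if raw == ZERO_DATE:
        return "zero date"
    try:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Covers truncated or malformed values.
        return "unparseable"
    if ts > today:
        return "future"
    if ts.year == 1900:
        # Typical of small numbers displayed as dates by Excel.
        return "excel epoch"
    # Approximate N years as N * 365.25 days (an assumption of this sketch).
    cutoff = today - timedelta(days=round(365.25 * n_years))
    if ts < cutoff:
        return "older than N years"
    return None


def spread_in_days(timestamps):
    """Population variance of the timestamps, measured in days squared.

    A value of 0 means every timestamp is identical, which may indicate
    a default or placeholder value. Expects a non-empty list of datetimes.
    """
    origin = min(timestamps)
    offsets = [(t - origin).total_seconds() / 86400 for t in timestamps]
    return pvariance(offsets)
```

For example, with <code>today = datetime(2021, 3, 9)</code> and <code>n_years = 10</code>, the value <code>0000-00-00 00:00:00</code> is labelled as a zero date and <code>1900-01-01 00:00:00</code> as an Excel epoch value, while <code>2020-06-01 00:00:00</code> returns <code>None</code>.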
List of the possible abnormal values:
* {{code | code = 0001-01 00:00:00}}, which can occur in a MySQL {{code | code = datetime}} column
* {{code | code = 1900/1/0}} (the value 0 displayed in a date format), {{code | code = 1900/1/1}} (the value 1 displayed in a date format), {{code | code = 1900/1/2}} ..., which can occur in MS Excel
* future data: dates later than today
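
To see why small numbers turn into dates in the year 1900, the sketch below maps an Excel serial number (1900 date system) to a calendar date. It is an illustration under stated assumptions: the function name <code>excel_serial_to_date</code> is hypothetical, only whole-number serials are handled, and serial 60 is treated as a non-date because Excel's 1900 date system counts a nonexistent 29 February 1900.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial):
    """Map a whole-number Excel serial (1900 date system) to a date.

    Returns None for values that have no real calendar date:
    0 (shown by Excel as 1900/1/0), negative numbers, and
    60 (Excel's nonexistent 1900-02-29).
    """
    if serial < 1 or serial == 60:
        return None
    if serial < 60:
        # Before the phantom leap day, serial 1 is 1900-01-01.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom leap day, real dates are shifted by one serial.
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    for serial in (0, 1, 2, 61):
        print(serial, excel_serial_to_date(serial))
```

A column that should hold recent dates but converts to values such as 1900-01-01 or 1900-01-02 most likely contained the plain numbers 1 or 2 rather than real dates.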


Find the normal values:  
Line 529: Line 527:
=== Fix garbled message text ===
[[Fix garbled message text]]
== Tools ==
* [https://github.com/IvanMathy/Boop IvanMathy/Boop: A scriptable scratchpad for developers. In slow yet steady progress.] ([https://apps.apple.com/us/app/boop/id1518425043?mt=12 Boop on the Mac App Store]): "... to paste some plain text and run some basic text operations on it."


== Further reading ==
Line 546: Line 547:
[[Category:Data Science]]
[[Category:MySQL]]
[[Category:Text file processing]]
[[Category:String manipulation]]