Data cleaning: Difference between revisions
m (Text replacement - "Category:Text file processing" to "Category:String manipulation")
(5 intermediate revisions by the same user not shown) | |||
Line 295: | Line 295: | ||
=== Time data: Data was generated in N years === | === Time data: Data was generated in N years === | ||
Define the abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series]) | Define the abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series]) | ||
* Verify the data were generated in | * Verify that the data were generated within N years. A possible abnormal value is {{code | code = 0001-01 00:00:00}}, which can occur in the MySQL {{code | code = datetime}} type. | ||
* Verify that the data are not newer than today | * Verify that the data are not newer than today | ||
* Verify that the year of the data is not {{kbd | key=1900}} if the data were imported from a Microsoft Excel file. DATEVALUE<ref>[https://support.microsoft.com/zh-tw/office/datevalue-%E5%87%BD%E6%95%B8-df8b07d4-7761-4a93-bc33-b7471bbff252 DATEVALUE function - Office Support]</ref> starts counting from the year {{kbd | key=1900}}. | * Verify that the year of the data is not {{kbd | key=1900}} if the data were imported from a Microsoft Excel file. DATEVALUE<ref>[https://support.microsoft.com/zh-tw/office/datevalue-%E5%87%BD%E6%95%B8-df8b07d4-7761-4a93-bc33-b7471bbff252 DATEVALUE function - Office Support]</ref> starts counting from the year {{kbd | key=1900}}, e.g. | ||
** {{code | code = 1900/1/0}} (converted time formatted value from 0), | |||
** {{code | code = 1900/1/1}} (converted time formatted value from 1) | |||
* Verify that the value of the data is not {{kbd | key=0000-00-00 00:00:00}} | * Verify that the value of the data is not {{kbd | key=0000-00-00 00:00:00}} | ||
* Verify the diversity of the data values, e.g. their [https://en.wikipedia.org/wiki/Variance variance] | |||
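The checks listed above can be sketched in Python. This is a minimal illustration, not the wiki author's method: the function names <code>check_timestamp</code> and <code>spread_in_seconds</code>, the reason strings, and the assumption that values arrive as text in <code>YYYY-MM-DD HH:MM:SS</code> form are all hypothetical.

```python
from datetime import datetime
from statistics import pvariance

ZERO_DATE = "0000-00-00 00:00:00"   # MySQL "zero" datetime value
FORMAT = "%Y-%m-%d %H:%M:%S"        # assumed input format (hypothetical)


def check_timestamp(text, n_years, today):
    """Return a list of reasons why `text` looks abnormal (empty if none)."""
    if text == ZERO_DATE:
        return ["zero date"]
    try:
        ts = datetime.strptime(text, FORMAT)
    except ValueError:
        # Covers values such as 1900/1/0 that do not match the assumed format.
        return ["unparseable"]

    reasons = []
    if ts > today:
        reasons.append("newer than today")
    if ts.year == 1900:
        reasons.append("year 1900 (possible Excel serial 0 or 1)")
    try:
        cutoff = today.replace(year=today.year - n_years)
    except ValueError:
        # today is Feb 29 and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - n_years, day=28)
    if ts < cutoff:
        reasons.append("older than N years")
    return reasons


def spread_in_seconds(timestamps):
    """Population variance of datetimes, in seconds squared (0 means no diversity)."""
    first = min(timestamps)
    return pvariance([(ts - first).total_seconds() for ts in timestamps])
```

A value that passes every check yields an empty list, so the abnormal rows are those with a non-empty result. A variance of zero from <code>spread_in_seconds</code> suggests that every row carries the same timestamp, which is often a sign of a default value rather than real data.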
Find the normal values: | Find the normal values: | ||
Line 529: | Line 527: | ||
=== Fix garbled message text === | === Fix garbled message text === | ||
[[Fix garbled message text]] | [[Fix garbled message text]] | ||
== Tools == | |||
* [https://github.com/IvanMathy/Boop IvanMathy/Boop: A scriptable scratchpad for developers. In slow yet steady progress.] ([https://apps.apple.com/us/app/boop/id1518425043?mt=12 Boop on the Mac App Store]) " ... to paste some plain text and run some basic text operations on it. " | |||
== Further reading == | == Further reading == | ||
Line 546: | Line 547: | ||
[[Category:Data Science]] | [[Category:Data Science]] | ||
[[Category:MySQL]] | [[Category:MySQL]] | ||
[[Category: | [[Category:String manipulation]] |