14,982
edits
| (6 intermediate revisions by the same user not shown) | |||
| Line 2: | Line 2: | ||
== Check list == | == Check list == | ||
* | * Row count: The number of data entries is a fundamental item for data verification and is easy to observe and check. For instance, one can compare the number of entries displayed on a webpage to the number of entries after exporting to a CSV file. | ||
* | * Duplicate data | ||
== Check if field value was not fulfilled == | == Check if field value was not fulfilled == | ||
| Line 315: | Line 315: | ||
* Check if a value is integer e.g. 1234567 | * Check if a value is integer e.g. 1234567 | ||
** Find the records which the value of `my_column` is numeric values entirely {{code | code = SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9]+$'}}<ref>[http://stackoverflow.com/questions/14343767/mysql-regexp-with-and-numbers-only regex - Mysql REGEXP with . and numbers only - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/75704/how-do-i-check-to-see-if-a-value-is-an-integer-in-mysql How do I check to see if a value is an integer in MySQL? - Stack Overflow]</ref> | ** Find the records which the value of `my_column` is numeric values entirely {{code | code = SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9]+$'}}<ref>[http://stackoverflow.com/questions/14343767/mysql-regexp-with-and-numbers-only regex - Mysql REGEXP with . and numbers only - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/75704/how-do-i-check-to-see-if-a-value-is-an-integer-in-mysql How do I check to see if a value is an integer in MySQL? - Stack Overflow]</ref> | ||
* Find the records which the value of `my_column` is not exactly 8 digits {{code | code = SELECT * FROM my_table WHERE LENGTH(my_column) != 8 OR my_column NOT REGEXP '^[0-9]{8}$'}} | |||
** The `LENGTH()` function checks if the string length is not 8 characters | |||
** The `REGEXP '^[0-9]{8}$'` pattern validates that the value contains exactly 8 digits from start (^) to end ($) | |||
** Using both conditions ensures catching values with correct length but non-numeric characters, as well as incorrect lengths | |||
* Check if a value is integer which may contains comma and dot symbols e.g. 1,234.567 or 3.414 | * Check if a value is integer which may contains comma and dot symbols e.g. 1,234.567 or 3.414 | ||
| Line 346: | Line 351: | ||
=== Time data: Data was generated in N years === | === Time data: Data was generated in N years === | ||
Define the abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series]) | Define the abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series]) | ||
* | * Verify the data were generated in N years. Possible abnormal values: {{code | code = 0001-01 00:00:00}} occurred in MySQL {{code | code = datetime}} type. e.g. | ||
* | * Verify the data were not newer than today | ||
* | * Verify the year of data were not {{kbd | key=1900}} if the data were imported from Microsoft Excel file. Datevalue<ref>[https://support.microsoft.com/zh-tw/office/datevalue-%E5%87%BD%E6%95%B8-df8b07d4-7761-4a93-bc33-b7471bbff252 DATEVALUE 函數 - Office 支援]</ref> was started from the year {{kbd | key=1900}} e.g. | ||
** {{code | code = 1900/1/0}} (converted time formatted value from 0), | ** {{code | code = 1900/1/0}} (converted time formatted value from 0), | ||
** {{code | code = 1900/1/1}} (converted time formatted value from 1) | ** {{code | code = 1900/1/1}} (converted time formatted value from 1) | ||
* | * Verify the diversity of data values e.g. [https://en.wikipedia.org/wiki/Variance Variance] | ||
Find the normal values: | Find the normal values: | ||
| Line 364: | Line 369: | ||
** {{code | code = SELECT * FROM `my_table` WHERE ( `my_time_column` >= CURDATE() - INTERVAL 10 YEAR ) AND ( `my_time_column` <= CURRENT_TIMESTAMP);}} | ** {{code | code = SELECT * FROM `my_table` WHERE ( `my_time_column` >= CURDATE() - INTERVAL 10 YEAR ) AND ( `my_time_column` <= CURRENT_TIMESTAMP);}} | ||
*** You need to check the {{code | code = SELECT CURRENT_TIMESTAMP);}} if correct or not before you delete the abnormal data (timezone issue) | *** You need to check the {{code | code = SELECT CURRENT_TIMESTAMP);}} if correct or not before you delete the abnormal data (timezone issue) | ||
Abnormal values | |||
* {{code | code = 1970-01-01 08:00:00}} (converted time formatted value from {{code | code =August 3, 2017}}) caused by the string contains special characters e.g. [https://en.wikipedia.org/wiki/Left-to-right_mark left-to-right mark (LRM) ] | |||
Check if the date valid | Check if the date valid | ||
| Line 386: | Line 394: | ||
* [[Return symbol]] | * [[Return symbol]] | ||
* [http://www.fileformat.info/info/unicode/char/a0/index.htm Unicode Character 'NO-BREAK SPACE' (U+00A0)] | * [http://www.fileformat.info/info/unicode/char/a0/index.htm Unicode Character 'NO-BREAK SPACE' (U+00A0)] | ||
* [https://www.fileformat.info/info/unicode/char/200f/index.htm Unicode Character 'RIGHT-TO-LEFT MARK' (U+200F)] | |||
* [https://www.fileformat.info/info/unicode/char/200f/index.htm Unicode Character 'RIGHT-TO-LEFT MARK' (U+200F)]<ref>[https://stackoverflow.com/questions/1930009/how-to-strip-unicode-chars-left-to-right-mark-from-a-string-in-php regex - How to strip unicode chars (LEFT_TO_RIGHT_MARK) from a string in php - Stack Overflow]</ref> | |||
== File Validation == | == File Validation == | ||
| Line 449: | Line 459: | ||
== Tools == | == Tools == | ||
* {{Mac}} [https://github.com/IvanMathy/Boop IvanMathy/Boop: A scriptable scratchpad for developers. In slow yet steady progress.] ([https://apps.apple.com/us/app/boop/id1518425043?mt=12 Boop on the Mac App Store]) " ... to paste some plain text and run some basic text operations on it. " | * {{Mac}} [https://github.com/IvanMathy/Boop IvanMathy/Boop: A scriptable scratchpad for developers. In slow yet steady progress.] ([https://apps.apple.com/us/app/boop/id1518425043?mt=12 Boop on the Mac App Store]) " ... to paste some plain text and run some basic text operations on it. " | ||
* [https://gchq.github.io/CyberChef/ CyberChef] | * [https://gchq.github.io/CyberChef/ CyberChef] (source code available on [https://github.com/gchq/CyberChef github]) The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis | ||
== Further reading == | == Further reading == | ||
| Line 462: | Line 472: | ||
== References == | == References == | ||
<references/> | <references/> | ||
{{Template:Data factory flow}} | |||
[[Category:Spreadsheet]] [[Category:Excel]] | [[Category:Spreadsheet]] [[Category:Excel]] | ||