Data cleaning: Difference between revisions
→Data Validation
| Line 340: | Line 340: | ||
** Returns 1 if the cell value is (1) a number or (2) a number stored as text, e.g. {{code | code = <nowiki>="5"</nowiki>}}
** Returns 0 if the cell value is (1) text, (2) a number in scientific (exponential) notation, e.g. {{code | code = <nowiki>1.23E+16</nowiki>}}, (3) a decimal number, e.g. {{code | code = <nowiki>3.141592654</nowiki>}}, or (4) a negative number
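The return values listed above match a "digits only" test. The following is a minimal Python sketch of that behaviour, not the spreadsheet formula itself. The function name <code>is_digits_only</code> is hypothetical, and the sketch assumes cell values arrive as the text shown in the cell.

```python
import re

# Assumption: a "number" here means one or more ASCII digits and nothing else,
# which is what the 1/0 rules above imply (no sign, decimal point, or exponent).
_DIGITS_ONLY = re.compile(r"[0-9]+")


def is_digits_only(value) -> int:
    """Return 1 if the value consists solely of digits, else 0 (hypothetical helper)."""
    text = str(value)
    return 1 if _DIGITS_ONLY.fullmatch(text) else 0


if __name__ == "__main__":
    for cell in [5, "5", "abc", "1.23E+16", "3.141592654", "-7"]:
        print(repr(cell), is_digits_only(cell))
```

A sign, decimal point, or exponent marker makes the match fail, so those values return 0, the same as ordinary text.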
=== Time data: Validate the data format ===
| Line 389: | Line 386: | ||
* [[Return symbol]]
* [http://www.fileformat.info/info/unicode/char/a0/index.htm Unicode Character 'NO-BREAK SPACE' (U+00A0)]
== File Validation ==
=== Verify the file format of a downloaded file ===
* PDF file format: [https://stackoverflow.com/questions/16152583/tell-if-a-file-is-pdf-in-bash Tell if a file is PDF in bash - Stack Overflow]
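One common way to run this check is to inspect the first bytes of the file, since a PDF file starts with the signature <code>%PDF-</code>. The following is a minimal Python sketch. The function name <code>looks_like_pdf</code> is hypothetical, and the sketch assumes the signature sits at the very start of the file, so it rejects files that have junk bytes before the header.

```python
from pathlib import Path

# Signature found at the start of a PDF file, e.g. b"%PDF-1.7".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path) -> bool:
    """Return True if the file begins with the PDF signature (hypothetical helper).

    This only checks the header bytes; it does not prove the file is a
    complete or uncorrupted PDF.
    """
    with Path(path).open("rb") as fh:
        return fh.read(len(PDF_MAGIC)) == PDF_MAGIC
```

A download that returned an HTML error page instead of the document fails this check, because its first bytes are not the PDF signature.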
== Find and remove duplicates ==