Data cleaning: Difference between revisions

5,765 bytes removed ,  25 November 2025
(→‎Numeric: Check if the column value is integer)
 
(28 intermediate revisions by the same user not shown)
Line 1: Line 1:
== Check list ==
* Row count: The number of data entries is a fundamental item for data verification and is easy to observe and check. For instance, one can compare the number of entries displayed on a webpage to the number of entries after exporting to a CSV file.
* Duplicate data
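
The row count check can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper name `count_csv_rows`, the sample export, and the displayed count are assumptions made for the example. The `csv` module is used instead of counting line breaks because a quoted field may itself contain a line break.

```python
import csv
import io

def count_csv_rows(csv_text: str, has_header: bool = True) -> int:
    """Count data rows in CSV text, ignoring an optional header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header and rows:
        return len(rows) - 1
    return len(rows)

# Hypothetical export: the first record contains a line break inside a quoted field.
exported = 'id,name\n1,"first\nline"\n2,plain\n'
displayed_on_webpage = 2  # assumed count shown by the source system

row_count = count_csv_rows(exported)
counts_match = row_count == displayed_on_webpage
print(row_count, counts_match)
```

A mismatch between the two counts is a signal to inspect the export before any further cleaning.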


== Check if field value was not fulfilled  ==
Line 68: Line 73:
</table>
</table>


=== By datatype ===
==== VARCHAR and does NOT allow NULL value ====
Use the NULLIF() function<ref>[https://www.w3schools.com/sql/func_mysql_nullif.asp MySQL NULLIF() Function]</ref> together with TRIM() to turn empty or space-only strings into NULL.
 
SQL query:
<pre>
SELECT NULLIF(TRIM(`my_column`), "") FROM `my_table`;
</pre>
 
Example results:
 
<pre>
SELECT NULLIF(null, "");
-- returns NULL
 
SELECT NULLIF("", "");
-- returns NULL
 
SELECT NULLIF(TRIM("  "), "");
-- returns NULL
 
SELECT NULLIF(TRIM("not empty string  "), "");
-- returns "not empty string"
 
</pre>
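
For data that has already left the database, the same idea can be sketched in Python. The helper name `nullif_trim` is hypothetical; it strips only the space character because MySQL `TRIM()` removes spaces by default, and it uses `None` where SQL uses NULL.

```python
from typing import Optional

def nullif_trim(value: Optional[str]) -> Optional[str]:
    """Mimic NULLIF(TRIM(value), ''): return None for NULL or blank input."""
    if value is None:
        return None
    trimmed = value.strip(" ")  # strip spaces only, like MySQL TRIM() by default
    return trimmed if trimmed != "" else None

samples = [None, "", "  ", "not empty string  "]
results = [nullif_trim(sample) for sample in samples]
print(results)
```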
 
 
==== VARCHAR and allows NULL value ====
<table border="1" style="width: 100%">
<table border="1" style="width: 100%">
Line 237: Line 268:
Validate the format of field value. Related page: [[Regular expression]]


=== Verify the strings are in valid email format ===
Rule: the email address contains an @ symbol
 
* EXCEL: {{kbd | key =<nowiki>=IF(ISERR(FIND("@", A2, 1)), FALSE, TRUE)</nowiki>}} only checks whether the field contains an @ symbol
** result: (1) normal condition: returns TRUE; (2) exceptional condition: returns '''FALSE''' if the @ symbol was not found
Line 245: Line 278:
* PHP: [http://www.w3schools.com/php/filter_validate_email.asp PHP FILTER_VALIDATE_EMAIL Filter]
** "Returns the filtered data, or '''FALSE''' if the filter fails." quoted from [http://php.net/manual/en/function.filter-var.php PHP.net]
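
The "contains an @ symbol" rule can be sketched in Python as below. `has_at_symbol` applies the rule exactly as stated; `looks_like_email` is a slightly stricter, hypothetical variant (exactly one @ with text on both sides) added only for illustration. Neither is a full email validator.

```python
def has_at_symbol(value: str) -> bool:
    """Rule from the text: the string contains an @ symbol."""
    return "@" in value

def looks_like_email(value: str) -> bool:
    """Hypothetical stricter rule: exactly one @ with text on both sides."""
    local, separator, domain = value.partition("@")
    return bool(local) and separator == "@" and bool(domain) and "@" not in domain

candidates = ["user@example.com", "no-at-symbol", "@example.com", "a@b@c"]
simple_flags = [has_at_symbol(c) for c in candidates]
strict_flags = [looks_like_email(c) for c in candidates]
print(simple_flags, strict_flags)
```

The two lists differ for `@example.com` and `a@b@c`, which shows how weak the single-character rule is on its own.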
=== Verify the strings are in valid URL format ===
Rule: begins with http or https
* Google spreadsheet {{kbd | key =<nowiki>=REGEXMATCH(A1, "^http(s?)")</nowiki>}}
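
A Python sketch of the same rule follows. The helper names are hypothetical. `starts_with_http` also requires `://` after the scheme, which is stricter than a bare `^http(s?)` prefix (that prefix alone would accept a string such as `httpfoo`); `has_http_scheme_and_host` additionally checks that a host part is present.

```python
import re
from urllib.parse import urlparse

HTTP_PREFIX = re.compile(r"^https?://", re.IGNORECASE)

def starts_with_http(value: str) -> bool:
    """True if the string begins with http:// or https://."""
    return HTTP_PREFIX.match(value) is not None

def has_http_scheme_and_host(value: str) -> bool:
    """True if the URL has an http/https scheme and a non-empty host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and parts.netloc != ""

urls = ["https://example.com/a", "http://example.com", "httpfoo", "http://", "ftp://example.com"]
prefix_flags = [starts_with_http(u) for u in urls]
host_flags = [has_http_scheme_and_host(u) for u in urls]
print(prefix_flags, host_flags)
```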


=== Number precision in Excel ===
Line 258: Line 296:
* If the data were imported from Excel, be aware of the 15-digit precision limit: Excel keeps only 15 significant digits, so any digits of a longer number beyond the 15th are changed to zero.
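
Before sending identifiers or long numbers through Excel, values at risk can be flagged with a sketch like the following. The helper name `loses_precision_in_excel` is hypothetical, and the function assumes plain decimal strings (no exponent notation); it reports whether any non-zero digit sits beyond the 15th significant digit.

```python
def loses_precision_in_excel(number_text: str, max_digits: int = 15) -> bool:
    """True if a plain decimal string has non-zero digits past the 15th significant digit."""
    digits = "".join(ch for ch in number_text if ch.isdigit()).lstrip("0")
    return any(ch != "0" for ch in digits[max_digits:])

values = ["346183773390240", "1234567890123456", "1234567890123450", "3.141592654"]
at_risk = [v for v in values if loses_precision_in_excel(v)]
print(at_risk)
```

Values that are flagged are usually safer stored as text in the spreadsheet.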


=== Verify the column values are numeric ===
 
Possible values
 
<pre>
test data:
3.141592654
1.36184E+14
123,456.789
20740199601
346183773390240
="5"
</pre>
 
==== Verify if value is number in MySQL ====
MySQL: 
* Check if a value is an integer e.g. 1234567
** Find the records whose `my_column` value consists entirely of digits {{code | code = SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9]+$'}}<ref>[http://stackoverflow.com/questions/14343767/mysql-regexp-with-and-numbers-only regex - Mysql REGEXP with . and numbers only - Stack Overflow]</ref><ref>[https://stackoverflow.com/questions/75704/how-do-i-check-to-see-if-a-value-is-an-integer-in-mysql How do I check to see if a value is an integer in MySQL? - Stack Overflow]</ref>
 
* Find the records whose `my_column` value is not exactly 8 digits {{code | code = SELECT * FROM my_table WHERE LENGTH(my_column) != 8 OR my_column NOT REGEXP '^[0-9]{8}$'}}
** The `LENGTH()` condition matches values whose length is not 8 characters
** The `NOT REGEXP '^[0-9]{8}$'` condition matches values that are not exactly 8 digits from start (^) to end ($)
** The `NOT REGEXP` condition alone already catches wrong lengths as well as non-digit characters; the `LENGTH()` condition is redundant and only makes the intent explicit
 
* Check if a value is a number that may contain comma and dot symbols e.g. 1,234.567 or 3.414
** {{code | code = SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9,\.]+$'}}<ref>[https://community.denodo.com/answers/question/details?questionId=9060g000000XelhAAC&title=How+to+identify+if+values+in+a+column+is+numeric+%28+Function+similar+to+Isnumeric+is+SQL%29 How to identify if values in a column is numeric ( Function similar to Isnumeric is SQL)]</ref>
** This pattern only restricts the character set, so it also accepts malformed strings such as 1..2
 
* Check if a value is NOT an integer
** Find the records whose `my_column` value does '''NOT''' consist entirely of digits {{code | code = SELECT * FROM `my_table` WHERE `my_column` NOT REGEXP '^[0-9]+$'}}
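
The patterns above can be tried against the test data of this section with a short Python sketch. The constant and helper names are hypothetical; `fullmatch` is used so that the whole string must match, which plays the role of the `^` and `$` anchors.

```python
import re

DIGITS_ONLY = re.compile(r"[0-9]+")        # integer, e.g. 1234567
EIGHT_DIGITS = re.compile(r"[0-9]{8}")     # exactly 8 digits
DIGITS_COMMA_DOT = re.compile(r"[0-9,.]+") # digits with comma and dot symbols

test_data = ["3.141592654", "1.36184E+14", "123,456.789",
             "20740199601", "346183773390240", '="5"']

def matching(pattern, values):
    """Return the values that match the pattern from start to end."""
    return [v for v in values if pattern.fullmatch(v)]

digits_only = matching(DIGITS_ONLY, test_data)
comma_dot = matching(DIGITS_COMMA_DOT, test_data)
eight_digits = matching(EIGHT_DIGITS, ["12345678", "1234567", "1234567a"])
print(digits_only, comma_dot, eight_digits)
```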




Line 267: Line 331:
* The {{kbd | key=tax_id}} column is 8 digits only. Find the well-formatted {{kbd | key=tax_id}} records by using {{code | code = SELECT * FROM `tax_id` WHERE `tax_id` REGEXP '^[0-9]{8}$'}}


List of the possible abnormal values:
* All numeric values are odd, or all are even, even though the data were generated naturally by users.

MySQL:
* Find the records whose `my_column` value is an integer or a decimal number {{code | code = SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9]+([.][0-9]+)?$'}}<ref>[https://community.denodo.com/answers/question/details?questionId=9060g000000XelhAAC&title=How+to+identify+if+values+in+a+column+is+numeric+%28+Function+similar+to+Isnumeric+is+SQL%29 How to identify if values in a column is numeric ( Function similar to Isnumeric is SQL)]</ref>

==== Verify if value is number in PHP ====
* [http://php.net/manual/en/function.is-numeric.php is_numeric] function
* [https://www.php.net/manual/en/function.is-int.php is_int] function
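
A loose Python analogue of these checks is sketched below. It is not equivalent to the PHP functions: the helper names are hypothetical, `float()` also accepts inputs such as `nan`, `inf`, and surrounding whitespace, and `is_integer_text` inspects the characters of a string rather than the type of a variable.

```python
def is_numeric_text(value: str) -> bool:
    """True if Python can parse the string as a float (includes exponent notation)."""
    try:
        float(value)
    except ValueError:
        return False
    return True

def is_integer_text(value: str) -> bool:
    """True if the string consists only of ASCII digits."""
    return value.isascii() and value.isdigit()

test_data = ["3.141592654", "1.36184E+14", "123,456.789",
             "20740199601", "346183773390240", '="5"']
numeric_flags = [is_numeric_text(v) for v in test_data]
integer_flags = [is_integer_text(v) for v in test_data]
print(numeric_flags, integer_flags)
```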


==== Verify if value is number in Excel or Google sheet ====
Excel & [https://www.google.com/sheets/about/ Google Sheets]: 
* Using [http://www.techonthenet.com/excel/formulas/isnumber.php ISNUMBER Function]: {{code | code = <nowiki>=INT(ISNUMBER(A1))</nowiki>}}
Line 286: Line 345:
** Return 1 if the cell value is (1) Numbers (2) Numbers that are stored as text e.g. {{code | code = <nowiki>="5"</nowiki>}}
** Return 0 if the cell value is (1) Text (2) Numbers in scientific (exponential) notation e.g. {{code | code = <nowiki>1.23E+16</nowiki>}} (3) Decimal numbers e.g. {{code | code = <nowiki>3.141592654</nowiki>}} (4) Negative numbers


=== Time data: Validate the data format ===
Line 301: Line 351:
=== Time data: Data was generated in N years ===
Define the abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series])
* Verify the data were generated within the last N years. Possible abnormal value: {{code | code = 0001-01 00:00:00}} in a MySQL {{code | code = datetime}} column.
* Verify the data are not newer than today.
* Verify the year is not {{kbd | key=1900}} if the data were imported from a Microsoft Excel file. DATEVALUE<ref>[https://support.microsoft.com/zh-tw/office/datevalue-%E5%87%BD%E6%95%B8-df8b07d4-7761-4a93-bc33-b7471bbff252 DATEVALUE function - Office Support]</ref> serial numbers start from the year {{kbd | key=1900}} e.g. 
** {{code | code = 1900/1/0}} (the date-formatted value of 0), 
** {{code | code = 1900/1/1}} (the date-formatted value of 1)
* Verify the value is not {{kbd | key=0000-00-00 00:00:00}}
* Verify the diversity of data values e.g. [https://en.wikipedia.org/wiki/Variance Variance]
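
The checks in this list can be combined in a small Python sketch. Everything here is an assumption for illustration: the helper name `time_value_problem`, the `YYYY-MM-DD HH:MM:SS` input format, the default of 10 years, and the approximation of a year as 365 days. A value such as `0000-00-00 00:00:00` cannot be parsed at all and is reported as unparseable.

```python
from datetime import datetime, timedelta
from typing import Optional

def time_value_problem(text: str, now: datetime, max_years: int = 10) -> Optional[str]:
    """Return a short reason if the value looks abnormal, otherwise None."""
    try:
        value = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return "unparseable"  # e.g. 0000-00-00 00:00:00
    if value > now:
        return "in the future"
    if value.year == 1900:
        return "year 1900"  # possible Excel serial number 0 or 1
    if value < now - timedelta(days=365 * max_years):
        return "older than N years"
    return None

now = datetime(2025, 11, 25)  # fixed reference time for a reproducible example
samples = ["2020-05-01 12:00:00", "2030-01-01 00:00:00", "1900-01-01 00:00:00",
           "2001-01-01 00:00:00", "0000-00-00 00:00:00"]
problems = [time_value_problem(s, now) for s in samples]
print(problems)
```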


Find the normal values:  
Line 317: Line 369:
** {{code | code = SELECT * FROM `my_table` WHERE ( `my_time_column` >=  CURDATE() - INTERVAL 10 YEAR )  AND  ( `my_time_column` <= CURRENT_TIMESTAMP);}} 
*** Before you delete the abnormal data, check that {{code | code = SELECT CURRENT_TIMESTAMP;}} returns the correct time (timezone issue)
Abnormal values
* {{code | code = 1970-01-01 08:00:00}} (the result of converting the string {{code | code =August 3, 2017}}, i.e. the Unix epoch shown in a UTC+8 timezone) appears when the string contains special characters e.g. [https://en.wikipedia.org/wiki/Left-to-right_mark left-to-right mark (LRM)]
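
One way to avoid this failure is to strip invisible format characters before parsing. The Python sketch below removes every character in Unicode category Cf, which includes the left-to-right mark (U+200E) and the right-to-left mark (U+200F). The helper name is hypothetical, and the `%B` month name assumes an English (or default C) locale.

```python
import unicodedata
from datetime import datetime

def strip_format_chars(text: str) -> str:
    """Drop invisible Unicode format characters (category Cf), e.g. U+200E and U+200F."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

raw = "\u200eAugust 3, 2017"           # looks identical to 'August 3, 2017' on screen
cleaned = strip_format_chars(raw)
parsed = datetime.strptime(cleaned, "%B %d, %Y")
print(len(raw), len(cleaned), parsed.date())
```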


Check if the date is valid
Line 339: Line 394:
* [[Return symbol]]
* [http://www.fileformat.info/info/unicode/char/a0/index.htm Unicode Character 'NO-BREAK SPACE' (U+00A0)]
* [https://www.fileformat.info/info/unicode/char/200f/index.htm Unicode Character 'RIGHT-TO-LEFT MARK' (U+200F)]<ref>[https://stackoverflow.com/questions/1930009/how-to-strip-unicode-chars-left-to-right-mark-from-a-string-in-php regex - How to strip unicode chars (LEFT_TO_RIGHT_MARK) from a string in php - Stack Overflow]</ref>


== File Validation ==
=== Verify the file format of downloaded file ===
* PDF file format: [https://stackoverflow.com/questions/16152583/tell-if-a-file-is-pdf-in-bash Tell if a file is PDF in bash - Stack Overflow]

== Find and remove duplicates ==
[[Find and remove duplicates]] in Excel/BASH/MySQL/PHP

== Duplicate data ==
=== Find duplicate data ===
==== EXCEL ====
===== Finding duplicate rows that differ in one column =====
* One column: [http://www.extendoffice.com/documents/excel/1499-count-duplicate-values-in-column.html How to count duplicate values in a column in Excel?] {{access | date = 2015-08-25}} Use {{kbd | key = COUNTIF(range, criteria)}} or '''Pivot Tables''' to find the values that occur two or more times
 
===== Finding duplicate rows that differ in multiple columns =====
* Two or more columns:
** (approach 1) [https://support.microsoft.com/en-us/kb/213367 How to compare data in two columns to find duplicates in Excel] {{access | date = 2015-06-16}} {{exclaim}} It may take more than one hour if the number of records exceeds 1,000,000
** (approach 2) Use the [https://support.office.com/en-us/article/concat-function-9b1a9a3f-94ff-41af-9736-694cbd6b4ca2 CONCAT function] to concatenate the columns, and then use {{kbd | key = COUNTIF(range, criteria)}}.
 
==== Cygwin ====
* [https://www.computerhope.com/unix/uuniq.htm uniq command] on Cygwin of {{Win}} or {{Linux}}: {{kbd | key=<nowiki>uniq -d <file.txt> > <duplicated_items.txt></nowiki>}}<ref>[https://unix.stackexchange.com/questions/52534/how-to-print-only-the-duplicate-values-from-a-text-file shell - How to print only the duplicate values from a text file? - Unix & Linux Stack Exchange]</ref> Sort the file first, because uniq only compares adjacent lines.
 
==== MySQL ====
===== Finding duplicate rows that differ in one column =====
Find the duplicated data for one column<ref>[http://stackoverflow.com/questions/688549/finding-duplicate-values-in-mysql?rq=1 Finding duplicate values in MySQL - Stack Overflow]</ref>
<pre>
-- Generate test data.
CREATE TABLE `table_name` (
  `id` int(11) NOT NULL,
  `content` varchar(5) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
INSERT INTO `table_name` (`id`, `content`) VALUES
(1, 'apple'),
(2, 'lemon'),
(3, 'apple');
 
ALTER TABLE `table_name`
  ADD PRIMARY KEY (`id`);

-- Find duplicated data
SELECT `content`, COUNT(*) count
FROM `table_name`
GROUP BY `content`
HAVING count > 1;

-- Alternative: filter the grouped counts in a derived table
SELECT tmp.* FROM
(
  SELECT `content`, count(*) count FROM `table_name` GROUP BY `content`
) tmp
WHERE tmp.count >1;
</pre>
 
===== Finding duplicate rows that differ in multiple columns =====
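Use {{kbd | key =CONCAT}} to build one key from multiple columns e.g. column_1, column_2. Without a separator, different value pairs such as ('ab', 'c') and ('a', 'bc') produce the same key; {{kbd | key =CONCAT_WS}} with a separator that does not occur in the data avoids this.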
<pre>
SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
FROM `table_name`
GROUP BY CONCAT(  `column_1`, `column_2`  )
HAVING count > 1;
</pre>
 
or
<pre>
SELECT tmp.key FROM
(
SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
FROM `table_name`
GROUP BY CONCAT(  `column_1`, `column_2`  )
) tmp
WHERE tmp.count >=2
</pre>
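
Outside the database, the same grouping can be sketched in Python with `collections.Counter`. The helper name `duplicated_keys` is hypothetical, and the rows repeat the MySQL test data above. A tuple of column values is used as the key, so no concatenation is needed and value pairs such as ('ab', 'c') and ('a', 'bc') stay distinct.

```python
from collections import Counter

rows = [
    {"id": 1, "content": "apple"},
    {"id": 2, "content": "lemon"},
    {"id": 3, "content": "apple"},
]

def duplicated_keys(records, columns):
    """Return {key tuple: count} for column combinations that occur more than once."""
    counts = Counter(tuple(record[c] for c in columns) for record in records)
    return {key: n for key, n in counts.items() if n > 1}

one_column = duplicated_keys(rows, ["content"])

pairs = [
    {"column_1": "ab", "column_2": "c"},
    {"column_1": "a", "column_2": "bc"},
]
multi_column = duplicated_keys(pairs, ["column_1", "column_2"])
print(one_column, multi_column)
```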
 
===== other cases =====
For counting purpose: find the count of repeated id (type: int) between table_a and table_b
<pre>
SELECT count(DISTINCT(id)) FROM table_a WHERE id IN
(
  SELECT DISTINCT(id) FROM table_b
)
</pre>
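
The equivalent count in Python is a set intersection. The id lists below are made-up sample data; converting each list to a set plays the role of `DISTINCT`.

```python
table_a_ids = [1, 2, 2, 3, 5]  # hypothetical ids from table_a
table_b_ids = [2, 3, 3, 4]     # hypothetical ids from table_b

shared_ids = set(table_a_ids) & set(table_b_ids)
shared_count = len(shared_ids)
print(sorted(shared_ids), shared_count)
```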
 
==== Google Spreadsheet ====
 
* [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/index.php Remove duplicates in Google Sheets] 30 days free {{access | date = 2019-02-26}}
* [https://chrome.google.com/webstore/detail/power-tools/dofhceeoedodcaheeoacmadcpegkjobi Power Tools] for Google Spreadsheet {{access | date = 2019-02-26}}
** Menu: Data -> Remove duplicates
 
=== Deduplicate ===
* EXCEL: Data Tools -> Remove Duplicates: [https://support.office.com/en-us/article/Filter-for-unique-values-or-remove-duplicate-values-d6549cf0-357a-4acf-9df5-ca507915b704 Filter for unique values or remove duplicate values] {{access | date = 2015-10-20}}
 
* PHP: [http://php.net/manual/en/function.array-unique.php PHP: array_unique], [http://php.net/manual/en/function.array-intersect.php PHP: array_intersect]
 
* MySQL: select deduplicated records
** [http://www.mysqltutorial.org/mysql-distinct.aspx MySQL DISTINCT - Eliminate Duplicate Rows in a Result Set]. Using {{kbd | key =GROUP_CONCAT}} to handle the multiple columns<ref>[http://stackoverflow.com/questions/12188027/mysql-select-distinct-multiple-columns sql - MySQL SELECT DISTINCT multiple columns - Stack Overflow]</ref>
** [http://www.w3schools.com/sql/sql_unique.asp SQL UNIQUE Constraint] "Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table." Quoted from w3schools webpage.
** "{{kbd | key = UNION}} removes duplicates, whereas {{kbd | key = UNION ALL}} does not." source: [http://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all sql - What is the difference between UNION and UNION ALL? - Stack Overflow]
* MySQL: delete duplicated records
** [http://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql sql - Delete all Duplicate Rows except for One in MySQL? - Stack Overflow]
 
* [http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html GNU Coreutils: sort invocation] OS: {{Linux}}, cygwin of {{Win}}. More details on [[Alternative_Linux_commands#Merge_multiple_plain_text_files | Merge multiple plain text files]].
** To remove duplicate lines:
*** {{kbd | key=<nowiki>sort -us -o <output_unique.file> <input.file></nowiki>}} in a large text file (GB)<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref>
*** {{kbd | key=<nowiki>cat <input.file> | grep <pattern> | sort | uniq</nowiki>}} Processes text line by line and prints the '''unique''' lines which match a specified pattern. Equal to these steps: (1) {{kbd | key=<nowiki>cat <input.file> | grep <pattern> > <tmp.file></nowiki>}} (2) {{kbd | key=<nowiki>sort <tmp.file> | uniq</nowiki>}}
** Ignore first n line(s) & remove duplicate lines<ref>[https://stackoverflow.com/questions/14562423/is-there-a-way-to-ignore-header-lines-in-a-unix-sort sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow]</ref><ref>[http://linux.vbird.org/linux_basic/0320bash.php#redirect_com Conditional command execution: ; , &&, ||]</ref><ref>[https://www.computerhope.com/unix/utail.htm Linux tail command help and examples]</ref>
*** (1) ignore first one line: {{kbd | key=<nowiki>(head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile</nowiki>}} 
*** (2) ignore first two lines: {{kbd | key=<nowiki>(head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile</nowiki>}}
 
* Google spreadsheet add-on: [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/howto.php Remove Duplicates for Google Sheets help]
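
The "keep the header, then sort and deduplicate the rest" approach shown above with head, tail and sort can be sketched in Python for files that fit in memory. The helper name `dedupe_keep_header` is hypothetical, and Python's default string ordering (by code point) may differ from a locale-aware `sort`.

```python
def dedupe_keep_header(lines, header_lines=1):
    """Keep the first header_lines untouched; sort and deduplicate the remaining lines."""
    header = lines[:header_lines]
    body = sorted(set(lines[header_lines:]))
    return header + body

sample = ["name", "lemon", "apple", "lemon"]
deduplicated = dedupe_keep_header(sample)
print(deduplicated)
```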
 
=== Counting the number of duplicate occurrences ===
MySQL: find the number of ids that occur in both list_a and list_b, which use the same primary key column {{kbd | key = id}}
* {{kbd | key = SELECT count(DISTINCT(`id`)) FROM `list_a` WHERE `id` IN (SELECT DISTINCT(`id`) FROM `list_b`) ; }}
 
Excel:
* [http://superuser.com/questions/307837/how-to-count-number-of-repeat-occurrences microsoft excel - How to count number of repeat occurrences - Super User] {{exclaim}} long number issue: [https://superuser.com/questions/783840/countif-incorrectly-matches-long-number microsoft excel - Countif incorrectly matches long number - Super User]
 
=== Other ===
* Symbol variants of the same term e.g. data-mining or data_mining


== Counting ==
Line 486: Line 441:
# ASCII Horizontal Tab (TAB) {{kbd | key=<nowiki>\t</nowiki>}}
# ASCII Backspace {{kbd | key=<nowiki>\b</nowiki>}}
# [https://en.wikipedia.org/wiki/Non-breaking_space Non-breaking space] ({{kbd | key=<nowiki>nbsp;</nowiki>}}) Replace Non-breaking space with one whitespace using PHP: {{kbd | key=<nowiki>$result = str_replace("\xc2\xa0", ' ', $original_string);</nowiki>}}<ref>[https://stackoverflow.com/questions/40724543/how-to-replace-decoded-non-breakable-space-nbsp php - How to replace decoded Non-breakable space (nbsp) - Stack Overflow]</ref>
# [[Remove non breaking space]]
 
Sentence spacing
# [https://en.wikipedia.org/wiki/Sentence_spacing_in_digital_media Sentence spacing in digital media - Wikipedia] e.g. {{kbd | key=<nowiki>&nbsp; &ensp; &emsp;</nowiki>}}
 
How to display the Non-breaking space In PHP?
<pre>
$input = '12345678' . hex2bin('c2a0');
echo $input . PHP_EOL;
## Result of above script: '12345678 ' (one whitespace at the end)
 
echo bin2hex($input) . PHP_EOL;
## Result of above script: 3132333435363738c2a0
 
echo bin2hex('12345678') . PHP_EOL;
## Result of above script: 3132333435363738 (You may notice that the difference between the two results is C2A0)
 
</pre>
 
How to display the Non-breaking space In MySQL?
<pre>
SELECT CONCAT('12345678', UNHEX('C2A0'))
-- Result of above query: '12345678 ' (one whitespace at the end)
 
SELECT HEX(CONCAT('12345678', UNHEX('C2A0')))
-- Result of above query: 3132333435363738C2A0
 
SELECT HEX('12345678')
-- Result of above query: 3132333435363738 (You may notice that the difference between the two results is C2A0)
 
SELECT LENGTH('12345678')
-- Result of above query: 8
 
SELECT LENGTH(CONCAT('12345678', UNHEX('C2A0')))
-- Result of above query: 10
</pre>
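
The same inspection can be sketched in Python. The variable names are illustrative only. Encoding the string as UTF-8 exposes the two bytes C2 A0 of the non-breaking space, and `str.replace` swaps it for an ordinary space.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, encoded as C2 A0 in UTF-8

value = "12345678" + NBSP
hex_bytes = value.encode("utf-8").hex()
byte_length = len(value.encode("utf-8"))
char_length = len(value)
cleaned = value.replace(NBSP, " ")
print(hex_bytes, byte_length, char_length, repr(cleaned))
```

The byte length is 10 while the character length is 9, which matches the difference between the two `LENGTH()` results above.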


== Remove control character ==
Line 530: Line 450:
$result = preg_replace('/[\x00-\x1F]/', $replacement, $input);
</pre>
</pre>
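
A Python counterpart of the `preg_replace` call is sketched below, using the same character range. The helper name is hypothetical. Note that the range `\x00-\x1F` includes tab and line feed, so they are removed as well.

```python
import re

CONTROL_CHARS = re.compile(r"[\x00-\x1F]")

def remove_control_chars(text: str, replacement: str = "") -> str:
    """Replace ASCII control characters (0x00 to 0x1F) with the replacement string."""
    return CONTROL_CHARS.sub(replacement, text)

dirty = "a\tb\x08c\n"
clean = remove_control_chars(dirty)
spaced = remove_control_chars(dirty, " ")
print(repr(clean), repr(spaced))
```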
== Remove tracking parameter from link ==
[[Remove tracking parameter from link]]


=== Fix garbled message text ===
Line 535: Line 458:


== Tools ==
* {{Mac}} [https://github.com/IvanMathy/Boop IvanMathy/Boop: A scriptable scratchpad for developers. In slow yet steady progress.] ([https://apps.apple.com/us/app/boop/id1518425043?mt=12 ‎Boop on the Mac App Store]) " ... to paste some plain text and run some basic text operations on it. "
* [https://gchq.github.io/CyberChef/ CyberChef] (source code available on [https://github.com/gchq/CyberChef github])  The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis


== Further reading ==
Line 546: Line 470:
* [https://medium.com/bryanyang0528/%E8%B3%87%E6%96%99%E5%93%81%E8%B3%AA%E5%88%9D%E6%8E%A2-data-quality-b765eb56a7c2?fbclid=IwAR3NBb2BtFm9O3FeY7JgQ5HLE5VG5nFe3m5Zx8zNW9XkvOUPlqV9hXmaoXI 資料品質初探(Data Quality) – 亂點技能的跨界人生 – Medium] {{access | date = 2018-01-13}}


== References ==
<references/>
{{Template:Data factory flow}}


[[Category:Spreadsheet]] [[Category:Excel]]
