Find and remove duplicates: Difference between revisions

Find and remove duplicates (edit)

Revision as of 14:28, 26 April 2022

356 bytes added , 26 April 2022

no edit summary

Anonymous user

Unknown user

@@ Line 1: / Line 1: @@
-=== Find duplicate data ===
+== Find and remove duplicates in Excel ==
-==== EXCEL ====
+=== Find duplicates in Excel ===
-===== Finding duplicate rows that differ in one column =====
+Finding duplicate rows that differ in one column
 * one column data: [http://www.extendoffice.com/documents/excel/1499-count-duplicate-values-in-column.html How to count duplicate values in a column in Excel?] Using {{kbd | key = COUNTIF(range, criteria)}} {{access | date = 2015-08-25}} or using '''Pivot Tables'''(樞紐分析表)  to find the occurrence of value >= 2
-===== Finding duplicate rows that differ in multiple columns =====
+Finding duplicate rows that differ in multiple columns
 * two or multiple columns data: (approach 1) [https://support.microsoft.com/en-us/kb/213367 How to compare data in two columns to find duplicates in Excel] {{access | date = 2015-06-16}} {{exclaim}} It may costs too much time (larger than one hour) if the number of records exceeds 1,000,000 (approach 2) Using [https://support.office.com/en-us/article/concat-function-9b1a9a3f-94ff-41af-9736-694cbd6b4ca2 CONCAT function] to concatenate two or multiple columns data. And then use {{kbd | key = COUNTIF(range, criteria)}}.
-==== Cygwin ====
+Counging
+* [http://superuser.com/questions/307837/how-to-count-number-of-repeat-occurrences microsoft excel - How to count number of repeat occurrences - Super User] {{exclaim}} long number issue: [https://superuser.com/questions/783840/countif-incorrectly-matches-long-number microsoft excel - Countif incorrectly matches long number - Super User]
+=== Remove duplicates in Excel ===
+* EXCEL: Data Tools -> Remove Duplicates: [https://support.office.com/en-us/article/Filter-for-unique-values-or-remove-duplicate-values-d6549cf0-357a-4acf-9df5-ca507915b704 Filter for unique values or remove duplicate values] {{access | date = 2015-10-20}}
+== Find and remove duplicates in Cygwin/BASH ==
+=== Finding duplicate values in Cygwin/BASH ===
 * [https://www.computerhope.com/unix/uuniq.htm uniq command] on Cygwin of {{Win}} or {{Linux}}: {{kbd | key=<nowiki>uniq -d <file.txt> > <duplicated_items.txt></nowiki>}}<ref>[https://unix.stackexchange.com/questions/52534/how-to-print-only-the-duplicate-values-from-a-text-file shell - How to print only the duplicate values from a text file? - Unix & Linux Stack Exchange]</ref>
-==== MySQL ====
-===== Finding duplicate rows that differ in one column =====
+=== Remove duplicate values in Cygwin/BASH ===
-Find the duplicated data for one column<ref>[http://stackoverflow.com/questions/688549/finding-duplicate-values-in-mysql?rq=1 Finding duplicate values in MySQL - Stack Overflow]</ref>
+* [http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html GNU Coreutils: sort invocation] OS: {{Linux}}, cygwin of {{Win}}. More details on [[Alternative_Linux_commands#Merge_multiple_plain_text_files | Merge multiple plain text files]].
+** To remove duplicate lines:
+*** {{kbd | key=<nowiki>sort -us -o <output_unique.file> <input.file></nowiki>}} in a large text file (GB)<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref>
+*** {{kbd | key=<nowiki>cat <input.file> | grep <pattern> | sort | uniq</nowiki>}} Processes text line by line and prints the '''unique''' lines which match a specified pattern. Equal to these steps: (1) {{kbd | key=<nowiki>cat <input.file> | grep <pattern> > <tmp.file></nowiki>}} (2) {{kbd | key=<nowiki>sort <tmp.file> | uniq</nowiki>}}
+** Ignore first n line(s) & remove duplicate lines<ref>[https://stackoverflow.com/questions/14562423/is-there-a-way-to-ignore-header-lines-in-a-unix-sort sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow]</ref><ref>[http://linux.vbird.org/linux_basic/0320bash.php#redirect_com 命令執行的判斷依據： ; , &&, ||]</ref><ref>[https://www.computerhope.com/unix/utail.htm Linux tail command help and examples]</ref>
+*** (1) ignore first one line: {{kbd | key=<nowiki>(head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile</nowiki>}}
+*** (2) ignore first two lines: {{kbd | key=<nowiki>(head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile</nowiki>}}
+== Find and remove duplicates in MySQL ==
+=== Find duplicates in MySQL ===
+Finding duplicate value that differ in one column<ref>[http://stackoverflow.com/questions/688549/finding-duplicate-values-in-mysql?rq=1 Finding duplicate values in MySQL - Stack Overflow]</ref>
 <pre>
 -- Generate test data.
@@ Line 41: / Line 60: @@
 </pre>
-===== Finding duplicate rows that differ in multiple columns =====
+Finding duplicate rows that differ in multiple columns: Using {{kbd | key =CONCAT}} for multiple columns ex: column_1, column_2
-Using {{kbd | key =CONCAT}} for multiple columns ex: column_1, column_2
 <pre>
 SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
@@ Line 61: / Line 79: @@
 </pre>
-===== other cases =====
+Select deduplicated records
+* [http://www.mysqltutorial.org/mysql-distinct.aspx MySQL DISTINCT - Eliminate Duplicate Rows in a Result Set]. Using {{kbd | key =GROUP_CONCAT}} to handle the multiple columns<ref>[http://stackoverflow.com/questions/12188027/mysql-select-distinct-multiple-columns sql - MySQL SELECT DISTINCT multiple columns - Stack Overflow]</ref>
+* [http://www.w3schools.com/sql/sql_unique.asp SQL UNIQUE Constraint] "Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table." Quoted from w3schools webpage.
+* "{{kbd | key = UNION}} removes duplicates, whereas {{kbd | key = UNION ALL}} does not." source: [http://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all sql - What is the difference between UNION and UNION ALL? - Stack Overflow]
 For counting purpose: find the count of repeated id (type: int) between table_a and table_b
 <pre>
@@ Line 70: / Line 93: @@
 </pre>
-==== Google Spreadsheet ====
+Counting number of duplicate occurrence
+MySQL: find the number of duplicate occurrence between list_a & list_b which using the same primary key: column name {{kbd | key = id}}
+* {{kbd | key = SELECT count(DISTINCT(`id`)) FROM `list_a` WHERE `id` IN (SELECT DISTINCT(`id`) FROM `list_b`) ; }}
-* [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/index.php Remove duplicates in Google Sheets] 30 days free {{access | date = 2019-02-26}}
-* [https://chrome.google.com/webstore/detail/power-tools/dofhceeoedodcaheeoacmadcpegkjobi Power Tools] for Google Spreadsheet {{access | date = 2019-02-26}}
-** Menu: Data -> Remove duplicates
-=== Deduplicate ===
+=== Remove duplicates in MySQL ===
-* EXCEL: Data Tools -> Remove Duplicates: [https://support.office.com/en-us/article/Filter-for-unique-values-or-remove-duplicate-values-d6549cf0-357a-4acf-9df5-ca507915b704 Filter for unique values or remove duplicate values] {{access | date = 2015-10-20}}
-* PHP: [http://php.net/manual/en/function.array-unique.php PHP: array_unique], [http://php.net/manual/en/function.array-intersect.php PHP: array_intersect]
+* [http://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql sql - Delete all Duplicate Rows except for One in MySQL? - Stack Overflow]
-* MySQL: select deduplicated records
-** [http://www.mysqltutorial.org/mysql-distinct.aspx MySQL DISTINCT - Eliminate Duplicate Rows in a Result Set]. Using {{kbd | key =GROUP_CONCAT}} to handle the multiple columns<ref>[http://stackoverflow.com/questions/12188027/mysql-select-distinct-multiple-columns sql - MySQL SELECT DISTINCT multiple columns - Stack Overflow]</ref>
-** [http://www.w3schools.com/sql/sql_unique.asp SQL UNIQUE Constraint] "Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table." Quoted from w3schools webpage.
-** "{{kbd | key = UNION}} removes duplicates, whereas {{kbd | key = UNION ALL}} does not." source: [http://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all sql - What is the difference between UNION and UNION ALL? - Stack Overflow]
-* MySQL: delete duplicated records
-** [http://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql sql - Delete all Duplicate Rows except for One in MySQL? - Stack Overflow]
-* [http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html GNU Coreutils: sort invocation] OS: {{Linux}}, cygwin of {{Win}}. More details on [[Alternative_Linux_commands#Merge_multiple_plain_text_files | Merge multiple plain text files]].
+== Find and remove duplicates in Google Spreadsheet ==
-** To remove duplicate lines:
-*** {{kbd | key=<nowiki>sort -us -o <output_unique.file> <input.file></nowiki>}} in a large text file (GB)<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref>
-*** {{kbd | key=<nowiki>cat <input.file> | grep <pattern> | sort | uniq</nowiki>}} Processes text line by line and prints the '''unique''' lines which match a specified pattern. Equal to these steps: (1) {{kbd | key=<nowiki>cat <input.file> | grep <pattern> > <tmp.file></nowiki>}} (2) {{kbd | key=<nowiki>sort <tmp.file> | uniq</nowiki>}}
-** Ignore first n line(s) & remove duplicate lines<ref>[https://stackoverflow.com/questions/14562423/is-there-a-way-to-ignore-header-lines-in-a-unix-sort sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow]</ref><ref>[http://linux.vbird.org/linux_basic/0320bash.php#redirect_com 命令執行的判斷依據： ; , &&, ||]</ref><ref>[https://www.computerhope.com/unix/utail.htm Linux tail command help and examples]</ref>
-*** (1) ignore first one line: {{kbd | key=<nowiki>(head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile</nowiki>}}
-*** (2) ignore first two lines: {{kbd | key=<nowiki>(head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile</nowiki>}}
+* [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/index.php Remove duplicates in Google Sheets] 30 days free {{access | date = 2019-02-26}}
+* [https://chrome.google.com/webstore/detail/power-tools/dofhceeoedodcaheeoacmadcpegkjobi Power Tools] for Google Spreadsheet {{access | date = 2019-02-26}}
+** Menu: Data -> Remove duplicates
 * Google spreadsheet add-on: [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/howto.php Remove Duplicates for Google Sheets help]
+== Find and remove duplicates in PHP ==
+=== Find duplicates in PHP ===
+* PHP [http://php.net/manual/en/function.array-intersect.php PHP: array_intersect] function
+=== Remove duplicates in PHP ===
+* PHP: [http://php.net/manual/en/function.array-unique.php PHP: array_unique] function
+== Find and remove duplicates in JavaScript ==
+=== Remove duplicates in JavaScript ===
 * JavaScript: [https://www.delftstack.com/howto/javascript/javascript-remove-duplicates-from-an-array/ Remove Duplicates From an Array in JavaScript | Delft Stack]
-=== Counting number of duplicate occurrence ===
-MySQL: find the number of duplicate occurrence between list_a & list_b which using the same primary key: column name {{kbd | key = id}}
-* {{kbd | key = SELECT count(DISTINCT(`id`)) FROM `list_a` WHERE `id` IN (SELECT DISTINCT(`id`) FROM `list_b`) ; }}
-Excel:
-* [http://superuser.com/questions/307837/how-to-count-number-of-repeat-occurrences microsoft excel - How to count number of repeat occurrences - Super User] {{exclaim}} long number issue: [https://superuser.com/questions/783840/countif-incorrectly-matches-long-number microsoft excel - Countif incorrectly matches long number - Super User]
-=== Other ===
+== Other issues ==
 * symbol e.g. data-mining or data_mining

Find and remove duplicates: Difference between revisions

Find and remove duplicates (edit)

Revision as of 14:28, 26 April 2022

Navigation menu

Search