Data cleaning: Difference between revisions

147 bytes added, 3 October 2016
Line 325:
** "{{kbd | key = UNION}} removes duplicates, whereas {{kbd | key = UNION ALL}} does not." source: [http://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all sql - What is the difference between UNION and UNION ALL? - Stack Overflow]
* [http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html GNU Coreutils: sort invocation] e.g. {{kbd | key=sort -us -o output_unique.file input.file}} to remove duplicate lines from a large (multi-GB) text file<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref> OS: {{Linux}}, Cygwin on {{Win}}
* Google spreadsheet add-on: [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/howto.php Remove Duplicates for Google Sheets help]
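
The {{kbd | key = UNION}} / {{kbd | key = UNION ALL}} distinction quoted above can be checked with a small experiment. The following is only a minimal sketch using Python's built-in <code>sqlite3</code> module; the tables <code>a</code> and <code>b</code> and their values are hypothetical and chosen purely for illustration. Other database engines may return rows in a different order unless an {{kbd | key = ORDER BY}} clause is given.

```python
import sqlite3

# Hypothetical in-memory tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES ('alice'), ('bob');
    INSERT INTO b VALUES ('bob'), ('carol');
""")

# UNION drops the duplicate 'bob' that appears in both tables.
union_rows = [row[0] for row in conn.execute(
    "SELECT name FROM a UNION SELECT name FROM b ORDER BY name")]

# UNION ALL keeps every row, so 'bob' appears twice.
union_all_rows = [row[0] for row in conn.execute(
    "SELECT name FROM a UNION ALL SELECT name FROM b ORDER BY name")]

conn.close()

print(union_rows)
print(union_all_rows)
```

Because {{kbd | key = UNION ALL}} skips the de-duplication step, it is usually the cheaper choice when the inputs are already known to be disjoint or when duplicates are wanted.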


=== counting the number of duplicate occurrences ===
