Find and remove duplicates



=== Remove duplicate values in Cygwin/BASH ===
* [http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html GNU Coreutils: sort invocation]. OS: {{Linux}}, Cygwin on {{Win}}. See also [[Alternative_Linux_commands#Merge_multiple_plain_text_files | Merge multiple plain text files]].
** To remove duplicate lines:
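*** {{kbd | key=<nowiki>sort -us -o <output_unique.file> <input.file></nowiki>}} works even on a large (multi-GB) text file. {{kbd | key=-u}} keeps one copy of each line, {{kbd | key=-s}} makes the sort stable, and {{kbd | key=-o}} names the output file. The output is sorted, so the original line order is not preserved.<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref>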
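*** {{kbd | key=<nowiki>cat <input.file> | grep <pattern> | sort | uniq</nowiki>}} prints the '''unique''' lines that match a specified pattern. It is equivalent to these two steps: (1) {{kbd | key=<nowiki>cat <input.file> | grep <pattern> > <tmp.file></nowiki>}} (2) {{kbd | key=<nowiki>sort <tmp.file> | uniq</nowiki>}}. {{kbd | key=uniq}} only removes adjacent duplicate lines, which is why {{kbd | key=sort}} comes first. The {{kbd | key=cat}} is optional, since grep can read the file directly: {{kbd | key=<nowiki>grep <pattern> <input.file> | sort | uniq</nowiki>}}.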
** Keep the first n line(s) (e.g. a header) unchanged and remove duplicate lines from the rest<ref>[https://stackoverflow.com/questions/14562423/is-there-a-way-to-ignore-header-lines-in-a-unix-sort sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow]</ref><ref>[http://linux.vbird.org/linux_basic/0320bash.php#redirect_com Criteria for command execution: ; , &&, ||]</ref><ref>[https://www.computerhope.com/unix/utail.htm Linux tail command help and examples]</ref>
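*** (1) keep the first line: {{kbd | key=<nowiki>(head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile</nowiki>}}
*** (2) keep the first two lines: {{kbd | key=<nowiki>(head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile</nowiki>}}. In general, to keep the first N lines use {{kbd | key=<nowiki>head -n N</nowiki>}} and {{kbd | key=<nowiki>tail -n +(N+1)</nowiki>}}.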
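A minimal sketch of the {{kbd | key=sort -us}} command above, run on a tiny sample file so the effect is visible. The file names {{kbd | key=input.txt}} and {{kbd | key=output_unique.txt}} and the sample words are hypothetical stand-ins for <input.file> and <output_unique.file>; setting {{kbd | key=LC_ALL=C}} is an added assumption so that the sort order is plain byte order and does not depend on the locale.

```shell
#!/bin/sh
# Sketch: dedupe a small sample with GNU sort.
# input.txt / output_unique.txt are hypothetical placeholders.
set -e
export LC_ALL=C            # assumption: byte-order comparison for a predictable order
workdir=$(mktemp -d)       # scratch directory for the sample files
cd "$workdir"

printf 'banana\napple\nbanana\ncherry\napple\n' > input.txt

# -u: keep one copy of each line; -s: stable sort; -o: write to this file
sort -us -o output_unique.txt input.txt

result=$(cat output_unique.txt)
printf '%s\n' "$result"    # apple, banana, cherry (one per line)
```

The duplicates are gone, but so is the original order (banana was first in the input). If the order matters, a different tool than {{kbd | key=sort}} is needed.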
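The grep pipeline above and its two-step form can be checked against each other with a short script. The pattern {{kbd | key=ERROR}}, the sample log lines, and the file names {{kbd | key=input.txt}} / {{kbd | key=tmp.txt}} are hypothetical stand-ins for <pattern>, <input.file> and <tmp.file>. The last variable shows what happens when {{kbd | key=sort}} is left out.

```shell
#!/bin/sh
# Sketch: grep | sort | uniq, its two-step equivalent, and the cat-free form.
# 'ERROR', input.txt and tmp.txt are hypothetical placeholders.
set -e
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir"

printf 'ERROR disk\nINFO ok\nERROR net\nERROR disk\nINFO ok\n' > input.txt

# One pipeline: keep matching lines, sort them, collapse adjacent duplicates
pipeline=$(cat input.txt | grep 'ERROR' | sort | uniq)

# The same work in two steps, via a temporary file
cat input.txt | grep 'ERROR' > tmp.txt
two_step=$(sort tmp.txt | uniq)

# grep can read the file itself, so cat is optional
no_cat=$(grep 'ERROR' input.txt | sort | uniq)

# Without sort, uniq keeps both "ERROR disk" lines because they are not adjacent
unsorted=$(grep 'ERROR' input.txt | uniq)

printf '%s\n' "$pipeline"   # ERROR disk, then ERROR net
```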
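The header-preserving commands above generalize to any number of leading lines. In this sketch the variable {{kbd | key=n}}, the file {{kbd | key=data.csv}} and its contents are hypothetical; the header lines were chosen so that they would ''not'' end up first if they were sorted, which makes it visible that they are kept in place.

```shell
#!/bin/sh
# Sketch: keep the first n lines unchanged and dedupe the rest.
# n, data.csv and the sample rows are hypothetical.
set -e
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir"

# Two header lines, then data containing a duplicate row
printf 'title: scores\nname,score\nbob,2\nalice,1\nbob,2\n' > data.csv

n=2   # number of leading lines to keep as-is
# head prints lines 1..n; tail -n +K prints from line K on, so K = n + 1.
# The ( ) subshell lets one redirect capture the output of both commands;
# && runs the tail | sort pipeline only if head succeeded.
(head -n "$n" data.csv && tail -n +"$((n + 1))" data.csv | sort -us) > newfile

result=$(cat newfile)
printf '%s\n' "$result"
```

With {{kbd | key=n=1}} the same line reduces to command (1) above, and with {{kbd | key=n=2}} to command (2).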




== Find and remove duplicates in MySQL ==