Find and remove duplicates
=== Remove duplicate values in Cygwin/BASH ===
[http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html GNU Coreutils: sort invocation] OS: {{Linux}}, Cygwin on {{Win}}.
Case 1: Remove duplicate lines from an entire file
* {{kbd | key=<nowiki>sort -us -o <output_unique.file> <input.file></nowiki>}} Sorts the file and writes one copy of each distinct line to the output file. Works on large (multi-GB) text files.<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref>
* {{kbd | key=<nowiki>cat <input.file> | grep <pattern> | sort | uniq</nowiki>}} Processes the text line by line and prints the '''unique''' lines that match the specified pattern. Equivalent to these two steps: (1) {{kbd | key=<nowiki>cat <input.file> | grep <pattern> > <tmp.file></nowiki>}} (2) {{kbd | key=<nowiki>sort <tmp.file> | uniq</nowiki>}}
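The two commands in Case 1 can be tried on a small sample file. This is a minimal sketch, not a definitive recipe: the scratch directory, the file names <code>input.txt</code> and <code>unique.txt</code>, and the pattern <code>p</code> are assumptions chosen for illustration only.

```shell
# Work in a scratch directory so no real file is overwritten.
tmpdir=$(mktemp -d)
printf 'pear\napple\npear\nbanana\napple\n' > "$tmpdir/input.txt"

# -u keeps one line from each group of equal lines; -o names the output file.
sort -us -o "$tmpdir/unique.txt" "$tmpdir/input.txt"
dedup_all=$(cat "$tmpdir/unique.txt")

# Keep only the lines matching the pattern, then sort and deduplicate them.
dedup_matching=$(cat "$tmpdir/input.txt" | grep 'p' | sort | uniq)

# Show both results, separated by a marker line.
printf '%s\n---\n%s\n' "$dedup_all" "$dedup_matching"
```

Note that <code>uniq</code> only collapses adjacent duplicate lines, which is why the input is sorted first.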
Case 2: Leave the first n line(s) (e.g. a header) in place and remove duplicate lines from the rest<ref>[https://stackoverflow.com/questions/14562423/is-there-a-way-to-ignore-header-lines-in-a-unix-sort sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow]</ref><ref>[http://linux.vbird.org/linux_basic/0320bash.php#redirect_com Criteria for conditional command execution: ; , &&, ||] (in Chinese)</ref><ref>[https://www.computerhope.com/unix/utail.htm Linux tail command help and examples]</ref>
* (1) Skip the first line: {{kbd | key=<nowiki>(head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile</nowiki>}}
* (2) Skip the first two lines: {{kbd | key=<nowiki>(head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile</nowiki>}}
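The pattern in Case 2 generalizes to any number of header lines: print the first n lines with <code>head -n n</code>, then sort the remainder starting at line n+1 with <code>tail -n +(n+1)</code>. The following is a sketch under stated assumptions: the function name <code>dedup_keep_header</code> and the sample file <code>data.csv</code> are hypothetical and do not come from the referenced pages.

```shell
# dedup_keep_header N FILE (hypothetical helper): print the first N lines
# unchanged, then the remaining lines sorted with duplicates removed.
dedup_keep_header() {
    n=$1
    file=$2
    # tail -n +K starts output at line K, so K = N + 1 skips the header.
    head -n "$n" "$file" && tail -n "+$((n + 1))" "$file" | sort -us
}

# Sample data: one header line followed by rows with a duplicate.
tmpdir=$(mktemp -d)
printf 'id,name\n2,bob\n1,amy\n2,bob\n' > "$tmpdir/data.csv"

result=$(dedup_keep_header 1 "$tmpdir/data.csv")
printf '%s\n' "$result"
```

Because of <code>&&</code>, the sort only runs if <code>head</code> succeeds, so a missing input file does not produce a partial result from the second command.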
Related pages:
* [[Alternative_Linux_commands#Merge_multiple_plain_text_files | Merge multiple plain text files]]
== Find and remove duplicates in MySQL ==