| Line 59: | Line 59: | ||
* (4) Execute the following command: {{kbd | key=<nowiki>sort <file.txt> | uniq -ic | sort -nr</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref><ref>[https://unix.stackexchange.com/questions/170043/sort-and-count-number-of-occurrence-of-lines Sort and count number of occurrence of lines - Unix & Linux Stack Exchange]</ref>
* (5) Remove the leading whitespace in the file: in a [[Text editor with support for regular expression | text editor]] that supports [[Regular expression|regular expressions]], replace {{kbd | key=<nowiki>^\s+(\d+)\s+</nowiki>}} with {{kbd | key=<nowiki>\1\t</nowiki>}}
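Steps (4) and (5) can also be combined into a single pipeline, so that no text editor is needed. The following is a minimal sketch, not the method from the cited answers: the function name <code>count_lines</code> is hypothetical, and <code>sed -E</code> with POSIX character classes is assumed to stand in for the editor's <code>\s</code> and <code>\d</code>.

```shell
# Hypothetical helper (illustrative name, not from the original page):
# count identical lines, then turn "   <count> <term>" into "<count><TAB><term>".
count_lines() {
  tab=$(printf '\t')   # literal tab; avoids relying on \t support in sed
  sort "$1" | uniq -ic | sort -nr |
    sed -E "s/^[[:space:]]*([0-9]+)[[:space:]]+/\1${tab}/"
}
```

Usage: <code>count_lines test.txt</code>. Only the whitespace around the leading count is rewritten, so whitespace inside a term is left untouched.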
=== Input Format A: One term per line ===
{{exclaim}} Each line contains only one term/keyword.
file: test.txt
| Line 70: | Line 73: | ||
</pre>
==== Output format I: count followed by keyword ====
{{exclaim}} Each term (line) in the input file may contain whitespace.
| Line 92: | Line 95: | ||
</pre>
==== Output format II: keyword followed by count ====
{{exclaim}} Terms in the input file should '''not''' contain whitespace.
| Line 115: | Line 117: | ||
</pre>
=== Input Format B: Multiple terms per line ===
{{exclaim}} Each line contains multiple terms/keywords separated by spaces.
file: input.txt
<pre>
電影 追劇 綜藝 | |||
藍芽 apple 電影 | |||
電影 綜藝 | |||
</pre>
==== Method using awk for word frequency counting ====
{{kbd | key=<nowiki>awk '{for(i=1;i<=NF;i++) count[$i]++} END {for(word in count) print count[word], word}' input.txt | sort -nr</nowiki>}}
Output:
<pre>
3 電影 | |||
2 綜藝 | |||
1 追劇 | |||
1 藍芽 | |||
1 apple | |||
</pre>
How it works:
* {{kbd | key=<nowiki>{for(i=1;i<=NF;i++) count[$i]++}</nowiki>}} - Loop through each field (word) in each line and increment its count
* {{kbd | key=<nowiki>END {for(word in count) print count[word], word}</nowiki>}} - After processing all lines, print the count and the word for each unique word
* {{kbd | key=<nowiki>sort -nr</nowiki>}} - Sort numerically in descending order
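The same awk approach can be adapted to print the keyword first and the count second, separated by a tab. This is a sketch under assumptions rather than part of the original method: the function name <code>word_freq</code> is hypothetical, and the secondary sort key on the word is added only so that equal counts come out in a stable order.

```shell
# Hypothetical helper (illustrative name, not from the original page):
# print "<word><TAB><count>", highest count first, ties ordered by word.
word_freq() {
  awk '
    # Tally every whitespace-separated field on every line.
    { for (i = 1; i <= NF; i++) count[$i]++ }
    # After the last line, emit one "word<TAB>count" row per unique word.
    END { for (word in count) printf "%s\t%d\n", word, count[word] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Usage: <code>word_freq input.txt</code>. The order in which awk walks the <code>count</code> array is unspecified, so the final <code>sort</code> is what determines the row order; the order among words with equal counts depends on the current locale.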
=== Verification of count occurrence ===