Count occurrences of a word in string

* (4) execute the following command {{kbd | key=<nowiki>sort <file.txt> | uniq -ic | sort -nr</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref><ref>[https://unix.stackexchange.com/questions/170043/sort-and-count-number-of-occurrence-of-lines Sort and count number of occurrence of lines - Unix & Linux Stack Exchange]</ref>
* (5) Remove the leading whitespace in the file: using a [[Text editor with support for regular expression | text editor]] with support for [[Regular expression|regular expressions]], replace {{kbd | key=<nowiki>^\s+(\d+)\s+</nowiki>}} with {{kbd | key=<nowiki>\1\t</nowiki>}}. This also turns the whitespace after the count into a single tab.
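The replacement in step (5) can also be done on the command line. The following is a minimal sketch, not the method described in this article: the function name <code>normalize_counts</code> is hypothetical, the POSIX classes <code>[[:space:]]</code> and <code>[0-9]</code> stand in for <code>\s</code> and <code>\d</code>, and a literal tab is passed to sed because <code>\t</code> in the replacement text is not understood by every sed implementation.

```shell
# Hypothetical helper (not from the article): strip the padding that
# `uniq -c` puts before the count and separate count and term with a tab.
normalize_counts() {
    tab=$(printf '\t')   # literal tab character
    # Same pattern as step (5): leading whitespace, digits, whitespace
    sed -E "s/^[[:space:]]+([0-9]+)[[:space:]]+/\1${tab}/"
}

# Example input shaped like `uniq -c` output
printf '      3 apple\n      1 banana split\n' | normalize_counts
```

Whitespace inside the term itself (as in <code>banana split</code>) is left untouched, because the pattern is anchored to the start of the line.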
=== Input Format A: One term per line ===
{{exclaim}} Each line contains only one term/keyword


file: test.txt
</pre>
</pre>


==== Output format I: count followed by keyword ====
{{exclaim}} Each term (one per line) in the input file may contain whitespace.


</pre>
</pre>


 
==== Output format II: keyword followed by count ====
{{exclaim}} Each term (one per line) in the input file must '''not''' contain whitespace.


</pre>
</pre>


=== Input Format B: Multiple terms per line ===
{{exclaim}} Each line contains multiple terms/keywords separated by spaces
file: input.txt
<pre>
電影 追劇 綜藝
藍芽 apple 電影
電影 綜藝
</pre>
==== Method using awk for word frequency counting ====
{{kbd | key=<nowiki>awk '{for(i=1;i<=NF;i++) count[$i]++} END {for(word in count) print count[word], word}' input.txt | sort -nr</nowiki>}}
Output:
<pre>
3 電影
2 綜藝
1 追劇
1 藍芽
1 apple
</pre>
How it works:
* {{kbd | key=<nowiki>{for(i=1;i<=NF;i++) count[$i]++}</nowiki>}} - Loop through each field (word) in each line and increment its count
* {{kbd | key=<nowiki>END {for(word in count) print count[word], word}</nowiki>}} - After processing all lines, print count and word for each unique word
* {{kbd | key=<nowiki>sort -nr</nowiki>}} - Sort numerically in descending order
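As an alternative to awk, input in Format B can first be split into one term per line, after which the <code>sort | uniq -c | sort -nr</code> pipeline applies. The following is a sketch under stated assumptions rather than part of the method above: the function name <code>count_terms</code> is hypothetical, terms are assumed to be separated by spaces or tabs, and <code>LC_ALL=C</code> is set so that sort and uniq compare raw bytes instead of using locale-dependent collation.

```shell
# Hypothetical helper (not from the article): count whitespace-separated
# terms in the file given as $1.
count_terms() {
    # 1. Turn every run of spaces/tabs into a single newline (one term per line)
    # 2. Drop the empty line that a leading space on the first line would leave
    # 3. Group identical terms, count them, and sort by count, descending
    tr -s ' \t' '\n\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | LC_ALL=C uniq -c \
        | LC_ALL=C sort -nr
}

# Demo with the sample data from input.txt, written to a temporary file
tmp=$(mktemp)
printf '電影 追劇 綜藝\n藍芽 apple 電影\n電影 綜藝\n' > "$tmp"
count_terms "$tmp"
rm -f "$tmp"
```

Unlike the awk command, the counts here keep the leading padding added by <code>uniq -c</code>, and the order of terms with equal counts depends on how sort breaks ties.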


=== Verification of count occurrence ===
