Count occurrences of a word in string: Difference between revisions

From LemonWiki共筆
(3 intermediate revisions by the same user not shown)
Line 43: Line 43:
== BASH ==
== BASH ==
data preparation
data preparation
* (1) separate each string by [[Return symbol | return_symbol]]  
* (1) separate each string by [[Return symbol | return_symbol]] <ref>[https://www.unix.com/shell-programming-and-scripting/83076-replacing-commas-newlines-using-sed.html replacing comma's with newlines using sed]</ref>
* (2) check the [https://www.computerhope.com/unix/uuniq.htm uniq command] is exists on Cygwin of {{Win}} or {{Linux}}
* (2) check the [https://www.computerhope.com/unix/uuniq.htm uniq command] is exists on Cygwin of {{Win}} or {{Linux}}
* (3) execute the following command {{kbd | key=<nowiki>sort <file.txt> | uniq -ic | sort -nr</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref>
* (3) execute the following command {{kbd | key=<nowiki>sort <file.txt> | uniq -ic | sort -nr</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref><ref>[https://unix.stackexchange.com/questions/170043/sort-and-count-number-of-occurrence-of-lines Sort and count number of occurrence of lines - Unix & Linux Stack Exchange]</ref>


file: test.txt
file: test.txt
Line 57: Line 57:
</pre>
</pre>


Result of the execution of command: {{kbd | key=<nowiki>sort test.txt | uniq -ic | sort -nr</nowiki>}}
Result of the execution of command: {{kbd | key=<nowiki>sort test.txt | uniq -ic | sort -nr</nowiki>}} {{exclaim}} insensitive
<pre>
<pre>
   2 #Apple
   2 #Apple
Line 66: Line 66:
</pre>
</pre>


Result of the execution of command: {{kbd | key=<nowiki>sort test.txt | uniq -c</nowiki>}}
Result of the execution of command: {{kbd | key=<nowiki>sort test.txt | uniq -c</nowiki>}} {{exclaim}} sensitive
<pre>
<pre>
   1 #Apple
   1 #Apple
Line 75: Line 75:
   1 #電影
   1 #電影
</pre>
</pre>
verification of count occurrence
<pre>
cat test.txt | grep -i "#apple$" | wc -l
# or
cat test.txt | grep -iw "#apple" | wc -l
</pre>
Options<ref>[https://en.wikibooks.org/wiki/Grep Grep - Wikibooks, open books for an open world]</ref>
* {{kbd | key=<nowiki>-i</nowiki>}} means {{kbd | key=<nowiki>Ignore uppercase vs. lowercase.</nowiki>}}
* {{kbd | key=<nowiki>-w</nowiki>}} means {{kbd | key=<nowiki>--word-regexp</nowiki>}}


== References ==
== References ==

Revision as of 12:16, 19 November 2019

Counting the number of occurrences (or frequency) of a word in a string

Excel

  1. Use the SUBSTITUTE and LEN functions (see the demo), or
  2. Use the COUNTIF function

MySQL way

SET @paragraph := 'an apple a day keeps the doctor away';
SET @term := 'apple';

SELECT FLOOR((LENGTH(@paragraph) - LENGTH(REPLACE(@paragraph, @term, ''))) / LENGTH(@term)) AS occurrences;

/* equivalent query using CHAR_LENGTH, which counts characters instead of bytes */
SELECT FLOOR((CHAR_LENGTH(@paragraph) - CHAR_LENGTH(REPLACE(@paragraph, @term, ''))) / CHAR_LENGTH(@term)) AS occurrences;

online example

-- Count occurrences of a string: .
SET @input = "www.google.com";
SET @separator = ".";
SELECT (LENGTH(@input ) - LENGTH(REPLACE(@input , @separator, ""))) / LENGTH(@separator) AS count_of_separator;
-- expected result: 2

-- Count occurrences of a string: og
SET @input = "www.google.com";
SET @separator = "og";
SELECT (LENGTH(@input ) - LENGTH(REPLACE(@input , @separator, ""))) / LENGTH(@separator) AS count_of_separator;
-- expected result: 1
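
The length-difference idea used in the queries above is not specific to MySQL. As a rough sketch, the same arithmetic can be written in Bash with parameter expansion. The function name count_occurrences is a hypothetical helper chosen for illustration, and the sketch assumes non-overlapping, case-sensitive matches.

```shell
#!/usr/bin/env bash
# count_occurrences is a hypothetical helper name, not part of any library.
count_occurrences() {
  local input=$1 term=$2
  # An empty term would cause a division by zero, so report 0 instead.
  [ -n "$term" ] || { echo 0; return; }
  # Remove every occurrence of the term; quoting keeps it a literal string.
  local stripped=${input//"$term"/}
  # Characters removed, divided by the term length, gives the occurrence count.
  echo $(( (${#input} - ${#stripped}) / ${#term} ))
}

count_occurrences 'www.google.com' '.'   # prints 2
count_occurrences 'www.google.com' 'og'  # prints 1
```

Bash arithmetic is integer-only, so no FLOOR step is needed here.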


PHP

Use the mb_substr_count (multibyte-aware) or substr_count function. See the demo for details.

$input = 'an apple a day keeps the doctor away';
$term = 'apple';

echo substr_count($input, $term);
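
For comparison, a similar substring count can be sketched on the command line. This is only an illustration using the same sample sentence. It assumes a grep that supports the -o option (GNU and BSD grep both do) and, like substr_count, it counts non-overlapping matches.

```shell
#!/usr/bin/env bash
input='an apple a day keeps the doctor away'
term='apple'

# -o prints each match on its own line and -F treats the term as a literal
# string, so counting the output lines counts the non-overlapping matches.
count=$(printf '%s\n' "$input" | grep -oF -- "$term" | wc -l)

# Arithmetic expansion trims any padding that some wc implementations add.
echo "$((count))"
```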

BASH

data preparation

  • (1) put each string on its own line, separated by a return symbol (newline) [1]
  • (2) check that the uniq command exists on Cygwin (Windows) or on Linux
  • (3) run the following command: sort <file.txt> | uniq -ic | sort -nr [2][3]
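
The three steps above can be sketched as one short script. The comma-separated input line is a hypothetical starting point, and tr is used here in place of the sed approach from the cited thread. The sketch also uses sort -f, which is an assumption on top of the original command: folding case while sorting keeps #Apple and #apple adjacent in any locale, which uniq -i needs in order to merge them.

```shell
#!/usr/bin/env bash
# Hypothetical input: hashtags separated by commas on a single line.
tags='#apple,#追劇,#電影,#綜藝,#Apple,#藍芽'

# Step 1: put each string on its own line (tr swaps every comma for a newline).
printf '%s\n' "$tags" | tr ',' '\n' > test.txt

# Step 2: confirm that uniq is available before relying on it.
command -v uniq >/dev/null || { echo 'uniq not found' >&2; exit 1; }

# Step 3: sort (folding case), count case-insensitively, largest counts first.
sort -f test.txt | uniq -ic | sort -nr
```

The order of the lines that share a count of 1 may differ between locales.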

file: test.txt

#apple
#追劇
#電影
#綜藝
#Apple
#藍芽

Result of running the command sort test.txt | uniq -ic | sort -nr (note: case-insensitive)

   2 #Apple
   1 #電影
   1 #追劇
   1 #藍芽
   1 #綜藝

Result of running the command sort test.txt | uniq -c (note: case-sensitive)

   1 #Apple
   1 #apple
   1 #綜藝
   1 #藍芽
   1 #追劇
   1 #電影

verification of the occurrence count

cat test.txt | grep -i "#apple$" | wc -l

# or
cat test.txt | grep -iw "#apple" | wc -l

Options[4]

  • -i (--ignore-case) ignores the difference between uppercase and lowercase.
  • -w (--word-regexp) matches only whole words.
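
The verification above can also be sketched without cat and wc, because grep's -c option prints the number of matching lines directly. The script recreates the sample file so that it is self-contained. The -x and -F variant is an addition for illustration: it matches the whole line as a literal string.

```shell
#!/usr/bin/env bash
# Recreate the sample file from this page so the sketch is self-contained.
printf '%s\n' '#apple' '#追劇' '#電影' '#綜藝' '#Apple' '#藍芽' > test.txt

# -c makes grep print the number of matching lines itself.
grep -ic '#apple$' test.txt     # pattern anchored at the end of the line
grep -icw '#apple' test.txt     # whole-word match
# -x requires the whole line to match and -F disables regex interpretation.
grep -icxF '#apple' test.txt
```

For the sample file, each of the three commands should report 2, which agrees with the case-insensitive uniq count.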

References