Count occurrences of a word in string: Difference between revisions
m (→BASH) |
m (→Excel) |
||
| Line 2: | Line 2: | ||
== Excel == | == Excel == | ||
# Using the function [https://support.office.com/zh-tw/article/substitute-%E5%87%BD%E6%95%B8-6434944e-a904-4336-a9b0-1e58df3bc332 SUBSTITUTE] & [https://support.office.com/zh-hk/article/len%E3%80%81lenb-%E5%87%BD%E6%95%B8-29236f94-cedc-429d-affd-b5e33d2c67cb LEN] functions. [https://docs.google.com/spreadsheets/d/1eg5XuGs8X7bS_RrFECx8ktuPgTwpV12UPfkdlpbHlMU/edit?usp=sharing demo]. Or | # Using the function [https://support.office.com/zh-tw/article/substitute-%E5%87%BD%E6%95%B8-6434944e-a904-4336-a9b0-1e58df3bc332 SUBSTITUTE] & [https://support.office.com/zh-hk/article/len%E3%80%81lenb-%E5%87%BD%E6%95%B8-29236f94-cedc-429d-affd-b5e33d2c67cb LEN] functions. [https://docs.google.com/spreadsheets/d/1eg5XuGs8X7bS_RrFECx8ktuPgTwpV12UPfkdlpbHlMU/edit?usp=sharing demo]<ref>[https://errerrors.blogspot.com/2022/02/excel-calculate-text-occurrence.html Excel 計算文字出現次數]</ref>. Or | ||
# Using the function [https://support.office.com/zh-hk/article/countif-%E5%87%BD%E6%95%B8-e0de10c6-f885-4e71-abb4-1f464816df34 COUNTIF] | # Using the function [https://support.office.com/zh-hk/article/countif-%E5%87%BD%E6%95%B8-e0de10c6-f885-4e71-abb4-1f464816df34 COUNTIF] | ||
Revision as of 17:54, 4 February 2022
Counting number of occurrences (or frequency) of a word in string
Excel
- Using the function SUBSTITUTE & LEN functions. demo[1]. Or
- Using the function COUNTIF
MySQL way
SET @paragraph := 'an apple a day keeps the doctor away'; SET @term := 'apple'; SELECT FLOOR((LENGTH(@paragraph) - LENGTH(REPLACE(@paragraph, @term, ''))) / LENGTH(@term)) AS occurrences; /* same with the following query */ SELECT FLOOR((CHAR_LENGTH(@paragraph) - CHAR_LENGTH(REPLACE(@paragraph, @term, ''))) / CHAR_LENGTH(@term)) AS occurrences;
-- Count occurrences of a string: . SET @input = "www.google.com"; SET @separator = "."; SELECT (LENGTH(@input ) - LENGTH(REPLACE(@input , @separator, ""))) / LENGTH(@separator) AS count_of_separator; -- expected result: 2 -- Count occurrences of a string: og SET @input = "www.google.com"; SET @separator = "og"; SELECT (LENGTH(@input ) - LENGTH(REPLACE(@input , @separator, ""))) / LENGTH(@separator) AS count_of_separator; -- expected result: 1
PHP
Using the mb_substr_count (binary safe) or substr_count functions. See details on demo.
<?php $input = 'an apple a day keeps the doctor away'; $term = 'apple'; echo substr_count($input, $term) . PHP_EOL; $input = '一天一蘋果,醫生遠離我'; $term = '蘋果'; echo mb_substr_count($input, $term, 'UTF-8') . PHP_EOL;
BASH
data preparation
- (1) separate each string by return_symbol [2]
- (2) check the uniq command is exists on Cygwin of Win
or Linux
- (3) (optional) input the command export LC_ALL='C' on Cygwin of Win
if met the error message "Invalid or incomplete multibyte or wide character" after input the following sort command - (4) execute the following command sort <file.txt> | uniq -ic | sort -nr[3][4]
- (5) Remove the leading whitespace in the file: Using the text editor with support for regular expression and replace ^\s+(\d+)\s+ with \1\t
file: test.txt
#apple #追劇 #電影 #綜藝 #Apple #藍芽
Output format I: occurrence & keyword
The term each line in the input file was allowed contains whitespaces.
Result of the execution of command: sort test.txt | uniq -ic | sort -nr
case insensitive
2 #Apple 1 #電影 1 #追劇 1 #藍芽 1 #綜藝
Result of the execution of command: sort test.txt | uniq -c
case sensitive
1 #Apple 1 #apple 1 #綜藝 1 #藍芽 1 #追劇 1 #電影
Output format II: keyword & occurrence
The term each line in the input file should not contains whitespaces.
Result of the execution of command: sort test.txt | uniq -ic | sort -nr | awk ' { t = $1; $1 = $2; $2 = t; print; } ' [5]
case insensitive
#Apple 2 #電影 1 #追劇 1 #藍芽 1 #綜藝 1
Result of the execution of command: sort test.txt | uniq -c | sort -nr | awk ' { t = $1; $1 = $2; $2 = t; print; } '
case sensitive
#電影 1 #追劇 1 #藍芽 1 #綜藝 1 #apple 1 #Apple 1
Verification of count occurrence
cat test.txt | grep -i "#apple$" | wc -l # or cat test.txt | grep -iw "#apple" | wc -l
Options[6]
- -i means Ignore uppercase vs. lowercase.
- -w means --word-regexp
References
- mysql - Count the number of occurences of a string in a VARCHAR field? - Stack Overflow [Last visited: 2016-09-21]
- ↑ Excel 計算文字出現次數
- ↑ replacing comma's with newlines using sed
- ↑ text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange
- ↑ Sort and count number of occurrence of lines - Unix & Linux Stack Exchange
- ↑ Swap two columns - awk, sed, python, perl - Stack Overflow
- ↑ Grep - Wikibooks, open books for an open world