Count number of characters

Latest revision as of 18:55, 1 October 2024

Counting the number of characters (or bytes) in a string with different tools and languages.

| String example | Number of characters | Number of bytes (UTF-8) |
| fox | 3 | 3 |
| The quick brown fox jumps over the lazy dog | 43 | 43 |
|  | 1 | 3 |
|  | 1 | 3 |
| 🐘 | 1 | 4 |
| 敏捷的棕毛狐狸從懶狗身上躍過 | 14 | 42 |
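
The two count columns can be computed with a few lines of code. The following is a minimal sketch in Python, assuming UTF-8 as the byte encoding; the helper name `count_chars_and_bytes` is made up for this illustration and is not part of any library.

```python
def count_chars_and_bytes(text, encoding="utf-8"):
    """Return (number of characters, number of bytes) for text."""
    # len() on a str counts Unicode code points;
    # encode() yields the byte representation, whose length is the byte count.
    return len(text), len(text.encode(encoding))


samples = [
    "fox",
    "The quick brown fox jumps over the lazy dog",
    "🐘",
    "敏捷的棕毛狐狸從懶狗身上躍過",
]

for sample in samples:
    chars, size = count_chars_and_bytes(sample)
    print(sample, chars, size)
```

Passing a different `encoding` argument changes only the byte count; the character count stays the same.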

How to count characters with PHP

Number of characters

echo mb_strlen("狐", 'UTF-8') . PHP_EOL; // return 1
echo mb_strlen("《王大文 Dawen》", 'UTF-8') . PHP_EOL; // return 11

String length (number of bytes)

echo strlen("狐") . PHP_EOL; // return 3
echo strlen("《王大文 Dawen》") . PHP_EOL; // return 21

Number of words. Note: the str_word_count function does not support Chinese characters.

echo str_word_count("The quick brown fox jumps over the lazy dog"); // return 9
echo str_word_count("敏捷的棕毛狐狸從懶狗身上躍過"); // return 0
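
A whitespace-based word count runs into a similar limitation. The sketch below is in Python rather than PHP and uses a hypothetical helper, `count_words`. It assumes that words are separated by whitespace, which does not hold for Chinese text, so the whole Chinese sentence is counted as a single word.

```python
def count_words(text):
    """Count whitespace-separated tokens in text."""
    # str.split() with no arguments splits on runs of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


print(count_words("The quick brown fox jumps over the lazy dog"))
print(count_words("敏捷的棕毛狐狸從懶狗身上躍過"))
```

A meaningful word count for Chinese would likely require a word segmenter, which is outside the scope of this sketch.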

How to count characters with MySQL

-- number of characters
SELECT CHAR_LENGTH("狐"); /* return 1 */
SELECT CHAR_LENGTH("《王大文 Dawen》"); /* return 11 */

-- number of bytes
SELECT LENGTH("狐"); /* return 3 */
SELECT LENGTH("《王大文 Dawen》"); /* return 21 */


How to count characters with SQLite

Using the LENGTH function, which returns the number of characters for text values:

SELECT LENGTH('狐'); /* return 1 */
SELECT LENGTH('《王大文 Dawen》'); /* return 11 */
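
To get the byte count as well, the text value can be cast to a BLOB, because LENGTH returns the number of bytes for BLOB values. The sketch below uses Python's standard `sqlite3` module with an in-memory database and assumes the default UTF-8 database encoding; the helper name `sqlite_lengths` is hypothetical.

```python
import sqlite3


def sqlite_lengths(text):
    """Return (characters, bytes) for text as computed by SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        # LENGTH on TEXT counts characters; LENGTH on a BLOB counts bytes.
        row = conn.execute(
            "SELECT LENGTH(?), LENGTH(CAST(? AS BLOB))", (text, text)
        ).fetchone()
    finally:
        conn.close()
    return row


print(sqlite_lengths("狐"))
print(sqlite_lengths("《王大文 Dawen》"))
```

Binding the string as a parameter avoids quoting issues with the string literal.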

How to count characters with Excel

// number of characters
=LEN("狐") // return 1
=LEN("《王大文 Dawen》") // return 11

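// number of bytes (LENB counts 2 bytes per double-byte character, so its results differ from the byte counts in other programming languages)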
=LENB("狐") // return 2
=LENB("《王大文 Dawen》") // return 16
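
The LENB results above differ from the UTF-8 byte counts in the other sections. One way to reason about them is a simplified double-byte model, sketched below in Python. This is an assumption made for illustration, not Excel's actual implementation: every ASCII character is counted as 1 byte and every other character as 2 bytes. The function name `lenb_dbcs_estimate` is hypothetical.

```python
def lenb_dbcs_estimate(text):
    """Estimate a double-byte (DBCS-style) byte count for text."""
    # Assumption: ASCII characters take 1 byte, all other characters take 2.
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


print(lenb_dbcs_estimate("狐"))
print(lenb_dbcs_estimate("《王大文 Dawen》"))
```

Under this model the second string has 5 double-byte characters and 6 single-byte characters, which is where a result of 16 comes from.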

How to count characters with BASH

Step 1: Use the Linux wc command

# Count the total number of characters in a file named "input.txt", while ignoring all whitespace characters (including spaces, tabs, newlines, etc.).
tr -d '\r\n[:space:]' < input.txt | wc -m


# print the character counts of txt files (including newline characters)
wc -m *.txt

# print the newline counts of txt files
wc -l *.txt

# print the total number of space characters in the txt files (grep -c would count matching lines, not individual spaces)
grep -o ' ' *.txt | wc -l

Step 2: Check the return symbol (line ending)

  • e.g. \r\n counts as 2 characters

Step 3: Final formula

Number of characters (excluding the return symbol) = result of wc -m *.txt - (result of wc -l *.txt * 2, since each \r\n line ending counts as 2 characters) - 1 (the last blank line costs 1 character) - number of whitespace characters
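
The whitespace-free count from Step 1 can also be sketched in Python, which avoids the manual arithmetic. The helper names below are hypothetical. One stated difference: `str.isspace()` also treats Unicode whitespace (for example the full-width space U+3000) as whitespace, whereas `tr` with `[:space:]` typically removes only the ASCII whitespace characters.

```python
from pathlib import Path


def count_non_whitespace(text):
    """Count characters in text, ignoring all whitespace."""
    # str.isspace() is true for spaces, tabs, \r, \n and other Unicode whitespace.
    return sum(1 for ch in text if not ch.isspace())


def count_non_whitespace_in_file(path, encoding="utf-8"):
    """Apply count_non_whitespace to the contents of a text file."""
    return count_non_whitespace(Path(path).read_text(encoding=encoding))


sample = "The quick\tbrown fox\r\n狐 狸\r\n"
print(count_non_whitespace(sample))
```

For a file, `count_non_whitespace_in_file("input.txt")` would play the role of the `tr ... | wc -m` pipeline, assuming the file is UTF-8 encoded.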

How to count characters with Python

Using the len() function[1]. Try it on replit.

Get the number of characters in a string in Python

string = "狐"
print(len(string))
# returns 1

Get the number of bytes in a string in Python

string = "狐"
print(len(string.encode('utf-8')))
# returns 3

print(len(string.encode('utf-16-le')))
# returns 2
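
As the two results show, the byte count depends on the encoding. The following sketch compares several encodings at once; the helper name `byte_lengths` and the chosen list of encodings are assumptions made for illustration.

```python
def byte_lengths(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return a dict mapping each encoding name to the encoded byte length."""
    # The "-le" variants are used so that no byte order mark is added.
    return {enc: len(text.encode(enc)) for enc in encodings}


for sample in ["狐", "🐘"]:
    print(sample, len(sample), byte_lengths(sample))
```

Characters outside the Basic Multilingual Plane, such as 🐘, need 4 bytes in UTF-16 because they are stored as a surrogate pair.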


How to count characters with JavaScript

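Using the length property and the Blob object[2]. Note that length counts UTF-16 code units, so a character outside the Basic Multilingual Plane (such as 🐘) counts as 2.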

Get the number of characters in a string in JavaScript

var string = "狐";
console.log(string.length);
// returns 1

Get the number of bytes in a string in JavaScript

var string = "狐";
console.log(new Blob([string]).size);
// returns 3
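
JavaScript strings are sequences of UTF-16 code units, and the Blob constructor encodes strings as UTF-8. The sketch below mirrors both counts in Python so they can be compared with the other sections; the function names are hypothetical and the code only approximates what a JavaScript engine reports.

```python
def utf16_code_units(text):
    """Number of UTF-16 code units, comparable to JavaScript's string length."""
    # Each UTF-16 code unit occupies 2 bytes.
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text):
    """Number of UTF-8 bytes, comparable to the size of a Blob built from text."""
    return len(text.encode("utf-8"))


for sample in ["狐", "🐘"]:
    print(sample, utf16_code_units(sample), utf8_bytes(sample))
```

For 🐘 the code unit count is 2 even though it is a single character, which is worth keeping in mind when validating input lengths.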

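References

1. Python : Get size of string in bytes - Stack Overflow: https://stackoverflow.com/questions/30686701/python-get-size-of-string-in-bytes
2. How many bytes in a JavaScript string? - Stack Overflow: https://stackoverflow.com/questions/2219526/how-many-bytes-in-a-javascript-string

Further reading

  • Unicode與JavaScript字串 (Unicode and JavaScript strings) | iThome: https://www.ithome.com.tw/voice/131688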