Count number of characters
Latest revision as of 15:28, 10 June 2026
Counting the number of characters (or bytes) in a string with different tools and programming languages
Character Count vs. Byte Count Comparison
| String example | Number of characters | Number of bytes (UTF-8) |
|---|---|---|
| fox | 3 | 3 |
| The quick brown fox jumps over the lazy dog | 43 | 43 |
| 狐 | 1 | 3 |
| 象 | 1 | 3 |
| 🐘 | 1 | 4 |
| 敏捷的棕毛狐狸從懶狗身上躍過 | 14 | 42 |
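In UTF-8, each Chinese character takes approximately 3 bytes. Common Chinese characters (CJK Unified Ideographs, U+4E00 ~ U+9FFF) all take exactly 3 bytes, for example "你", "好", "狐", and "象". Characters beyond U+FFFF (outside the Basic Multilingual Plane, BMP[1]) take 4 bytes. Among Chinese characters, these are mostly rare ones from CJK Extension B (U+20000 and up)[2], such as "𤆬" (U+241AC) and "𠮷" (U+20BB7). Emoji such as "🐘" (U+1F418) also fall in this range, which is why the table lists 4 bytes for it.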
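The byte counts in the table follow directly from the code point ranges of UTF-8. As a minimal Python sketch (the helper name `utf8_byte_length` is hypothetical, for illustration only), the byte length of each character can be derived from its code point and checked against Python's built-in UTF-8 encoder:

```python
def utf8_byte_length(ch: str) -> int:
    """Bytes UTF-8 needs for one code point, derived from its code point range."""
    cp = ord(ch)
    if cp < 0x80:
        return 1  # ASCII, e.g. "f", "o", "x"
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3  # rest of the BMP, including CJK U+4E00..U+9FFF
    return 4      # supplementary planes, e.g. U+20BB7 or U+1F418


examples = ["fox", "The quick brown fox jumps over the lazy dog",
            "狐", "象", "🐘", "敏捷的棕毛狐狸從懶狗身上躍過", "𠮷", "𤆬"]

for s in examples:
    chars = len(s)                                 # number of code points
    by_rule = sum(utf8_byte_length(c) for c in s)  # bytes from the range rule
    by_codec = len(s.encode("utf-8"))              # bytes from the real encoder
    assert by_rule == by_codec
    print(f"{s}\t{chars}\t{by_codec}")
```

Python's `len()` counts code points, which matches the "Number of characters" column for all of these examples.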
How to count characters with PHP
- PHP: strlen & PHP mb_strlen function
Number of characters
echo mb_strlen("狐", 'UTF-8') . PHP_EOL; // return 1
echo mb_strlen("《王大文 Dawen》", 'UTF-8') . PHP_EOL; // return 11
String length (number of bytes)
echo strlen("狐") . PHP_EOL; // return 3
echo strlen("《王大文 Dawen》") . PHP_EOL; // return 21
Number of words
The str_word_count function does not support Chinese characters: it only counts words made of alphabetic characters, so a Chinese sentence returns 0.
echo str_word_count("The quick brown fox jumps over the lazy dog"); // return 9
echo str_word_count("敏捷的棕毛狐狸從懶狗身上躍過"); // return 0
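To see why the Chinese sentence returns 0, here is a rough Python approximation of the word rule described in the PHP manual (a word is a run of letters that may contain, but not start with, `'` or `-`), assuming the default C locale where only ASCII letters count. `approx_str_word_count` is a hypothetical helper, not a PHP or Python API:

```python
import re

# Approximation of str_word_count()'s default rule in the C locale:
# an ASCII letter followed by letters, apostrophes, or hyphens.
WORD_PATTERN = re.compile(r"[A-Za-z][A-Za-z'-]*")


def approx_str_word_count(text: str) -> int:
    return len(WORD_PATTERN.findall(text))


print(approx_str_word_count("The quick brown fox jumps over the lazy dog"))  # 9
print(approx_str_word_count("敏捷的棕毛狐狸從懶狗身上躍過"))  # 0

# Splitting on whitespace does not help: the Chinese sentence contains no spaces.
print(len("敏捷的棕毛狐狸從懶狗身上躍過".split()))  # 1
```

Because Chinese text has no spaces between words, counting Chinese words requires word segmentation, which neither approach performs.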
How to count characters with MySQL
- MySQL: MySQL CHAR_LENGTH() function
-- number of characters
SELECT CHAR_LENGTH('狐'); /* returns 1 */
SELECT CHAR_LENGTH('《王大文 Dawen》'); /* returns 11 */
-- number of bytes
SELECT LENGTH('狐'); /* returns 3 */
SELECT LENGTH('《王大文 Dawen》'); /* returns 21 */
- MySQL :: MySQL 8.0 Reference Manual :: 11.4.1 The CHAR and VARCHAR Types: e.g. a VARCHAR(5) or CHAR(5) column can hold up to 5 characters (not 5 bytes).
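A minimal Python sketch of the characters-versus-bytes distinction for a `VARCHAR(5)` column, assuming the column and connection use the `utf8mb4` character set. `fits_varchar` is a hypothetical helper; it ignores details such as how MySQL handles over-length values, which depends on the SQL mode:

```python
def fits_varchar(value: str, max_chars: int) -> bool:
    """A VARCHAR(n) / CHAR(n) limit counts characters, not bytes."""
    return len(value) <= max_chars


for value in ["狐狸從懶狗", "《王大文 Dawen》"]:
    n_chars = len(value)                  # what CHAR_LENGTH() reports
    n_bytes = len(value.encode("utf-8"))  # what LENGTH() reports under utf8mb4
    print(value, n_chars, n_bytes, fits_varchar(value, 5))
```

"狐狸從懶狗" is 5 characters but 15 bytes, and it still fits in `VARCHAR(5)`.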
How to count characters with SQLite
The length() function returns the number of characters in a string (for a BLOB, it returns the number of bytes).
SELECT LENGTH('狐'); /* returns 1 */
SELECT LENGTH('《王大文 Dawen》'); /* returns 11 */
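The same check can be run from Python's built-in `sqlite3` module. This sketch assumes an in-memory database with SQLite's default UTF-8 text encoding; it passes the strings as parameters and casts them to BLOB to get a byte count, since `length()` counts bytes for BLOB values:

```python
import sqlite3

# In-memory database; its text encoding defaults to UTF-8.
conn = sqlite3.connect(":memory:")

results = {}
for text in ["狐", "《王大文 Dawen》"]:
    # length() on TEXT counts characters; after CAST(... AS BLOB) it counts bytes.
    # Bound parameters (?) also sidestep quoting: SQLite treats "..." as an identifier first.
    row = conn.execute(
        "SELECT length(?), length(CAST(? AS BLOB))", (text, text)
    ).fetchone()
    results[text] = row
    print(text, row)

conn.close()
```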
How to count characters with Excel
- Excel: LEN, LENB functions / LEN、LENB 函數
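The result of the LENB function differs from the byte counts in other programming languages: LENB counts each double-byte character as 2 bytes (not the 3 bytes UTF-8 uses for a Chinese character), and only when a DBCS language such as Chinese, Japanese, or Korean is set as the default language. Otherwise LENB behaves like LEN.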
// number of characters
=LEN("狐") // return 1
=LEN("《王大文 Dawen》") // return 11
// number of bytes
=LENB("狐") // return 2
=LENB("《王大文 Dawen》") // return 16
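A rough Python stand-in for LENB, assuming a DBCS default language: every ASCII character counts as 1 byte and every other character as 2. `approx_lenb` is a hypothetical heuristic, not Excel's actual algorithm; it does not model code-page details such as half-width katakana or characters missing from the code page:

```python
def approx_lenb(text: str) -> int:
    """Heuristic for Excel LENB under a DBCS default language:
    1 byte for ASCII characters, 2 bytes for every other character."""
    return sum(1 if ord(ch) < 0x80 else 2 for ch in text)


print(approx_lenb("狐"))              # 2
print(approx_lenb("《王大文 Dawen》"))  # 16
```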
- Calculate String Length Online
- string - How many bytes does one Unicode character take? - Stack Overflow
How to count characters with BASH
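Step 1: Use the Linux wc command. wc -m counts characters according to the current locale, so run it in a UTF-8 locale; otherwise a Chinese character may be counted as 3 bytes instead of 1 character.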
# Count the total number of characters in a file named "input.txt", while ignoring all whitespace characters (including spaces, tabs, newlines, etc.).
tr -d '\r\n[:space:]' < input.txt | wc -m
# print the character counts of txt files (including the return symbols)
wc -m *.txt
# print the newline counts of txt files
wc -l *.txt
# print the total number of space characters in txt files
# (grep -c counts matching lines, not individual spaces)
grep -o ' ' *.txt | wc -l
Step 2: Check the return symbol (line ending)
- e.g. a Windows line ending \r\n counts as 2 characters, while a Unix line ending \n counts as 1
Step 3: Final formula
Number of characters (not including the return symbols or spaces) = result of wc -m *.txt - result of wc -l *.txt * 2 (for \r\n line endings) - 1 (the last blank line costs 1 character) - number of spaces
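As a cross-check, the Python sketch below counts non-whitespace characters directly (similar to `tr -d '[:space:]' < input.txt | wc -m` in a UTF-8 locale) and compares the result with the Step 3 formula. It assumes a small hypothetical UTF-8 file in which every line ends with `\r\n` and spaces are the only other whitespace, and it leaves out the -1 term, which depends on how the file ends. Python's `str.isspace()` also treats Unicode spaces such as U+3000 (ideographic space) as whitespace, while GNU `tr`, which works byte by byte, typically removes only ASCII whitespace.

```python
import os
import tempfile


def read_text(path: str) -> str:
    # newline="" keeps CRLF line endings exactly as they are stored in the file
    with open(path, encoding="utf-8", newline="") as f:
        return f.read()


def count_non_whitespace(text: str) -> int:
    """Characters that are not whitespace, like `tr -d '[:space:]' | wc -m`."""
    return sum(1 for ch in text if not ch.isspace())


def formula_estimate(text: str) -> int:
    """Step 3 formula without the -1 term, assuming every line ends with CRLF."""
    wc_m = len(text)          # characters, like `wc -m`
    wc_l = text.count("\n")   # newlines, like `wc -l`
    spaces = text.count(" ")  # ASCII spaces only
    return wc_m - 2 * wc_l - spaces


# Hypothetical sample file with Windows (CRLF) line endings
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("狐 狸\r\n敏捷\r\n")
    sample_path = tmp.name

text = read_text(sample_path)
os.remove(sample_path)

direct = count_non_whitespace(text)
estimate = formula_estimate(text)
print(direct, estimate)  # 4 4
```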
How to count characters with Python
Using the len() function[3]. Try it on replit.
Get the number of characters in a string in Python
string = "狐"
print(len(string))  # returns 1
Get the number of bytes in a string in Python
string = "狐"
print(len(string.encode('utf-8')))      # returns 3
print(len(string.encode('utf-16-le')))  # returns 2
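The byte count depends on the encoding. A short Python sketch comparing three encodings for a BMP character and a character outside the BMP; the `-le` codecs are used so that no byte order mark (BOM) is added (plain `"utf-16"` would add 2 extra bytes):

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

sizes = {}
for text in ["狐", "🐘"]:
    # Byte length of the same one-character string under each encoding
    sizes[text] = {enc: len(text.encode(enc)) for enc in ENCODINGS}
    print(text, len(text), sizes[text])
```

Both strings have `len()` 1, but 🐘 needs 4 bytes even in UTF-16 because it is stored as a surrogate pair.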
How to count characters with JavaScript
Using the length property and the Blob object[4].
Get the number of characters in a string in JavaScript
var string = "狐";
console.log(string.length); // returns 1
Get the number of bytes in a string in JavaScript
var string = "狐";
console.log(new Blob([string]).size); // returns 3
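Note that JavaScript's `length` counts UTF-16 code units rather than characters, so `"🐘".length` is 2 even though 🐘 is a single character (`[..."🐘"].length` iterates by code point and gives 1). The Python sketch below mimics both JavaScript results for comparison; `js_length` and `blob_size` are hypothetical helper names, and the sketch relies on `Blob` encoding strings as UTF-8:

```python
def js_length(text: str) -> int:
    """UTF-16 code units, i.e. what JavaScript's string.length reports."""
    return len(text.encode("utf-16-le")) // 2


def blob_size(text: str) -> int:
    """UTF-8 bytes, i.e. what new Blob([text]).size reports for a string."""
    return len(text.encode("utf-8"))


for text in ["狐", "🐘"]:
    # code points, JavaScript length, Blob size
    print(text, len(text), js_length(text), blob_size(text))
```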