Count number of characters
Revision as of 09:25, 2 May 2022
Counting the number of characters (or bytes) with different approaches
| String example | Number of characters | Number of bytes (UTF-8) |
|---|---|---|
| fox | 3 | 3 |
| The quick brown fox jumps over the lazy dog | 43 | 43 |
| 狐 | 1 | 3 |
| 象 | 1 | 3 |
| 🐘 | 1 | 4 |
| 敏捷的棕毛狐狸從懶狗身上躍過 | 14 | 42 |
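The counts in the table can be reproduced with a short Python sketch. It assumes the byte column means the size of the UTF-8 encoding. For comparison it also prints the UTF-16-LE size. The helper name `count_row` is illustrative only.

```python
# Reproduce the character and byte counts for the example strings.
# Assumption: "bytes" means the size of the UTF-8 encoding.
examples = [
    "fox",
    "The quick brown fox jumps over the lazy dog",
    "狐",
    "象",
    "🐘",
    "敏捷的棕毛狐狸從懶狗身上躍過",
]


def count_row(text):
    """Return (characters, UTF-8 bytes, UTF-16-LE bytes) for one string."""
    return (
        len(text),                      # Unicode code points
        len(text.encode("utf-8")),      # bytes in UTF-8
        len(text.encode("utf-16-le")),  # bytes in UTF-16 (no BOM)
    )


for text in examples:
    chars, utf8_bytes, utf16_bytes = count_row(text)
    print(f"{text}\t{chars}\t{utf8_bytes}\t{utf16_bytes}")
```

For the last string this gives 14 characters, 42 UTF-8 bytes and 28 UTF-16 bytes. A byte count of 28 therefore corresponds to a 2-bytes-per-character encoding such as UTF-16, not to UTF-8.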
PHP
- PHP: strlen and mb_strlen functions
// number of characters
echo mb_strlen("狐", 'UTF-8') . PHP_EOL; // prints 1
echo mb_strlen("《王大文 Dawen》", 'UTF-8') . PHP_EOL; // prints 11
// string length (number of bytes)
echo strlen("狐") . PHP_EOL; // prints 3
echo strlen("《王大文 Dawen》") . PHP_EOL; // prints 21
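The following Python sketch shows where the 21 bytes reported by strlen come from. It is an illustration in Python, not PHP code, and prints the UTF-8 size of each character in the example string. It assumes the PHP source file is saved as UTF-8, since that is what makes strlen count UTF-8 bytes.

```python
text = "《王大文 Dawen》"

# UTF-8 size of each character: the CJK characters and the 《》 brackets
# take 3 bytes each, while the ASCII letters and the space take 1 byte each.
sizes = [(ch, len(ch.encode("utf-8"))) for ch in text]
for ch, size in sizes:
    print(repr(ch), size)

print("characters:", len(text))                  # like mb_strlen(..., 'UTF-8')
print("bytes:", sum(size for _, size in sizes))  # like strlen(...)
```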
MySQL
- MySQL: CHAR_LENGTH() function
-- number of characters
SELECT CHAR_LENGTH("狐"); /* returns 1 */
SELECT CHAR_LENGTH("《王大文 Dawen》"); /* returns 11 */
-- number of bytes (assuming a UTF-8 / utf8mb4 connection character set)
SELECT LENGTH("狐"); /* returns 3 */
SELECT LENGTH("《王大文 Dawen》"); /* returns 21 */
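- MySQL :: MySQL 8.0 Reference Manual :: 11.4.1 The CHAR and VARCHAR Types: the length in a declaration such as VARCHAR(5) or CHAR(5) is counted in characters, so the column can hold up to 5 characters regardless of how many bytes each character takes.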
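The declared VARCHAR length counts characters, so a column's capacity in bytes depends on the character set. The Python sketch below assumes a utf8mb4 column, which uses at most 4 bytes per character. `fits_varchar` and `max_utf8mb4_bytes` are hypothetical helpers, not MySQL APIs.

```python
def fits_varchar(value, max_chars):
    """Hypothetical check: does `value` fit in a VARCHAR(max_chars) column?
    The declared length is counted in characters, so compare len(value)."""
    return len(value) <= max_chars


def max_utf8mb4_bytes(max_chars):
    """Worst-case data size for utf8mb4 (excluding the length prefix):
    at most 4 bytes per character."""
    return 4 * max_chars


print(fits_varchar("狐狸從懶狗", 5))      # 5 characters -> True
print(len("狐狸從懶狗".encode("utf-8")))  # but 15 UTF-8 bytes
print(fits_varchar("狐狸從懶狗身", 5))    # 6 characters -> False
print(max_utf8mb4_bytes(5))              # 20
```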
SQLite
- SQLite: length() function (for text values it counts characters, not bytes)
SELECT LENGTH('狐'); /* returns 1 */
SELECT LENGTH('《王大文 Dawen》'); /* returns 11 */
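In SQLite, length() counts characters for text values and bytes for blobs. Casting a text value to BLOB first therefore yields its byte count. The sketch below uses Python's built-in sqlite3 module and assumes the default UTF-8 database encoding of an in-memory database.

```python
import sqlite3

# In-memory database; SQLite's default text encoding is UTF-8.
conn = sqlite3.connect(":memory:")

# length() on a TEXT value counts characters; CAST(... AS BLOB) exposes the
# stored bytes, so length() then counts bytes in the database encoding.
text = "《王大文 Dawen》"
row = conn.execute(
    "SELECT length(?), length(CAST(? AS BLOB))",
    (text, text),
).fetchone()
print(row)  # (11, 21)
conn.close()
```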
Excel
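- Excel: LEN, LENB functions
The result of LENB differs from the byte counts in the other languages above. When a double-byte character set (DBCS) language such as Chinese is the default language, LENB counts each double-byte character as 2 bytes, not the 3 bytes used by UTF-8. Otherwise LENB behaves like LEN.
// number of characters
=LEN("狐") // returns 1
=LEN("《王大文 Dawen》") // returns 11
// number of bytes
=LENB("狐") // returns 2
=LENB("《王大文 Dawen》") // returns 16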
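The LENB results can be approximated in Python. This is a rough sketch that assumes every non-ASCII character is a double-byte character in the active DBCS code page. `lenb_dbcs` is a hypothetical helper and does not reproduce Excel's exact algorithm.

```python
def lenb_dbcs(text):
    """Hypothetical approximation of Excel's LENB under a DBCS default language.

    Assumption: ASCII characters count as 1 byte and every other character
    counts as 2 bytes. Excel's real behavior depends on its code page.
    """
    return sum(1 if ord(ch) < 0x80 else 2 for ch in text)


print(lenb_dbcs("狐"))                # 2, like =LENB("狐")
print(lenb_dbcs("《王大文 Dawen》"))  # 16, like =LENB("《王大文 Dawen》")
```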
- Calculate String Length Online
- string - How many bytes does one Unicode character take? - Stack Overflow
BASH
Step 1: Use the Linux wc command
# print the character counts of the txt files (includes the return symbols)
wc -m *.txt
# print the newline counts of the txt files
wc -l *.txt
# print the total number of space characters (grep -c would count matching lines, not spaces)
grep -o ' ' *.txt | wc -l
Step 2: Check the return symbol
- e.g. a Windows line ending \r\n costs 2 characters, while a Unix line ending \n costs 1
Step 3: Final formula
Number of characters (excluding the return symbols and spaces) = result of wc -m *.txt - result of wc -l *.txt * 2 (for \r\n line endings) - 1 (the last blank line costs 1 character) - number of spaces
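The formula can be cross-checked with a Python sketch. The sketch reads a file without newline translation, counts the same pieces, and also counts directly the characters that are neither line endings nor spaces. It assumes UTF-8 files with \r\n line endings. `count_parts` and `count_file` are hypothetical helper names. For the sample below, which ends with \r\n, the terms without the extra - 1 already match the direct count. Whether that - 1 is needed depends on how a particular file ends.

```python
def count_parts(text):
    """Return (total, newlines, spaces, content) for one file's text.

    total    ~ wc -m (all characters, including line endings)
    newlines ~ wc -l (number of newline characters)
    spaces   ~ grep -o ' ' file | wc -l
    content  = characters that are neither line endings nor spaces
    """
    total = len(text)
    newlines = text.count("\n")
    spaces = text.count(" ")
    content = sum(1 for ch in text if ch not in "\r\n ")
    return total, newlines, spaces, content


def count_file(path):
    # newline="" turns off newline translation, so "\r\n" stays 2 characters
    with open(path, encoding="utf-8", newline="") as f:
        return count_parts(f.read())


sample = "The quick\r\nbrown fox\r\n"
total, newlines, spaces, content = count_parts(sample)
print(total, newlines, spaces, content)  # 22 2 2 16
print(total - newlines * 2 - spaces)     # 16 for this "\r\n" sample
```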
Python
Using the len() function[1]. Try it on replit.
Get the number of characters in a string in Python
string = "狐"
print(len(string))
# prints 1
Get the number of bytes in a string in Python
string = "狐"
print(len(string.encode('utf-8')))
# prints 3
print(len(string.encode('utf-16-le')))
# prints 2
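len() counts Unicode code points. That matches the character column for the examples above, but it can differ from what a reader sees as one character. It can also differ from lengths reported by environments that count UTF-16 code units, such as JavaScript's String length. The sketch below uses only the standard unicodedata module.

```python
import unicodedata

# len() counts Unicode code points, not bytes and not UTF-16 code units.
elephant = "🐘"
print(len(elephant))                           # 1 code point
print(len(elephant.encode("utf-16-le")) // 2)  # 2 UTF-16 code units

# A letter plus a combining accent is 2 code points until it is normalized.
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))          # 2 1
```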