Difference between revisions of "Count number of characters"

From LemonWiki共筆
(Created page with "Count number of characters in different approaches == BASH == Step1: Using [https://www.computerhope.com/unix/uwc.htm Linux wc command] <pre> # print the character counts of...")
 
m
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
Count number of characters in different approaches
+
Counting the number of characters using different approaches
 +
 
 +
<table border="1" class="wikitable sortable">
 +
<tr>
 +
<th>String example</th>
 +
<th>Number of characters</th>
 +
<th>Number of bytes</th>
 +
</tr>
 +
<tr>
 +
<td>fox</td>
 +
<td>3</td>
 +
<td>3</td>
 +
</tr>
 +
<tr>
 +
<td>The quick brown fox jumps over the lazy dog</td>
 +
<td>43</td>
 +
<td>43</td>
 +
</tr>
 +
<tr>
 +
<td>狐</td>
 +
<td>1</td>
 +
<td>3</td>
 +
</tr>
 +
<tr>
 +
<td>象</td>
 +
<td>1</td>
 +
<td>3</td>
 +
</tr>
 +
<tr>
 +
<td>🐘</td>
 +
<td>1</td>
 +
<td>4</td>
 +
</tr>
 +
<tr>
 +
<td>敏捷的棕毛狐狸從懶狗身上躍過</td>
 +
<td>14</td>
 +
<td>42</td>
 +
</tr>
 +
</table>
 +
 
 +
== PHP ==
 +
* PHP: [https://www.php.net/manual/en/function.strlen.php strlen] & [http://php.net/mb_strlen PHP mb_strlen function]
 +
<pre>
 +
// number of characters
 +
echo mb_strlen("狐", 'UTF-8') . PHP_EOL; // return 1
 +
echo mb_strlen("《王大文 Dawen》", 'UTF-8') . PHP_EOL; // return 11
 +
 
 +
// string length (number of bytes)
 +
echo strlen("狐") . PHP_EOL; // return 3
 +
echo strlen("《王大文 Dawen》") . PHP_EOL; // return 21
 +
</pre>
 +
 
 +
== MySQL ==
 +
* MySQL: [http://www.w3resource.com/mysql/string-functions/mysql-char_length-function.php MySQL CHAR_LENGTH() function]
 +
<PRE>
 +
-- number of characters
 +
SELECT CHAR_LENGTH("狐"); /* return 1 */
 +
SELECT CHAR_LENGTH("《王大文 Dawen》"); /* return 11 */
 +
 
 +
-- number of bytes
 +
SELECT LENGTH("狐"); /* return 3 */
 +
SELECT LENGTH("《王大文 Dawen》"); /* return 21 */
 +
</PRE>
 +
* [https://dev.mysql.com/doc/refman/8.0/en/char.html MySQL :: MySQL 8.0 Reference Manual :: 11.4.1 The CHAR and VARCHAR Types] e.g. {{kbd | key=<nowiki>VARCHAR(5)</nowiki>}} or {{kbd | key=<nowiki>CHAR(5)</nowiki>}} means the column can hold up to 5 characters.
 +
 
 +
 
 +
== SQLite ==
 +
[https://www.sqlitetutorial.net/sqlite-functions/sqlite-length/#targetText=SQLite%20Length,returns%20the%20number%20of%20bytes. Length] function
 +
<PRE>
 +
SELECT LENGTH("狐"); /* return 1 */
 +
SELECT LENGTH("《王大文 Dawen》"); /* return 11 */
 +
</PRE>
 +
 
 +
== Excel ==
 +
* Excel: [https://support.office.com/en-us/article/len-lenb-functions-29236f94-cedc-429d-affd-b5e33d2c67cb LEN, LENB functions] / [https://support.office.com/zh-tw/article/LEN%E3%80%81LENB-%E5%87%BD%E6%95%B8-29236f94-cedc-429d-affd-b5e33d2c67cb LEN、LENB 函數] {{exclaim}} The result of the function {{kbd | key=LENB}} differs from the byte counts returned by other programming languages.
 +
<pre>
 +
// number of characters
 +
=LEN("狐") // return 1
 +
=LEN("《王大文 Dawen》") // return 11
 +
 
 +
// number of bytes
 +
=LENB("狐") // return 2
 +
=LENB("《王大文 Dawen》") // return 16
 +
</pre>
 +
 
 +
* [http://string-functions.com/length.aspx Calculate String Length Online]
 +
* [https://stackoverflow.com/questions/5290182/how-many-bytes-does-one-unicode-character-take string - How many bytes does one Unicode character take? - Stack Overflow]
  
 
== BASH ==
 
== BASH ==
 
Step1: Using [https://www.computerhope.com/unix/uwc.htm Linux wc command]
 
Step1: Using [https://www.computerhope.com/unix/uwc.htm Linux wc command]
 
<pre>
 
<pre>
# print the character counts of txt files (include the count of new line)
+
# print the character counts of txt files (includes the return symbols, i.e. line-ending characters)
 
wc -m *.txt
 
wc -m *.txt
  
 
# print the newline counts of txt files
 
# print the newline counts of txt files
 
wc -l *.txt
 
wc -l *.txt
 +
 +
# print the total number of space characters in the txt files (grep -c would count matching lines, not spaces)
 +
grep -o ' ' *.txt | wc -l
 
</pre>
 
</pre>
  
Line 14: Line 103:
 
* e.g. {{kbd | key=<nowiki>\r\n</nowiki>}} costs 2 characters
 
* e.g. {{kbd | key=<nowiki>\r\n</nowiki>}} costs 2 characters
  
Step3:
+
Step3: final formula
number of characters = result of {{kbd | key=<nowiki>wc -m *.txt</nowiki>}} - result of {{kbd | key=<nowiki>wc -m *.txt</nowiki>}} * 2 - 1 (the last blank line costs 1 character)
+
 
 +
Number of characters (excluding the [[Return symbol | return symbol]]) = result of {{kbd | key=<nowiki>wc -m *.txt</nowiki>}} - result of {{kbd | key=<nowiki>wc -l *.txt</nowiki>}} * 2 (each <nowiki>\r\n</nowiki> line ending costs 2 characters) - 1 (the last blank line costs 1 character) - number of whitespace characters
  
 
[[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Text file processing]] [[Category:Data transformation]]
 
[[Category:Software]] [[Category:Programming]] [[Category:Data Science]] [[Category:Text file processing]] [[Category:Data transformation]]
 +
[[Category:Regular expression]] [[Category:PHP]] [[Category:MySQL]]

Latest revision as of 19:03, 6 November 2019

Counting the number of characters using different approaches

String example Number of characters Number of bytes
fox 3 3
The quick brown fox jumps over the lazy dog 43 43
狐 1 3
象 1 3
🐘 1 4
敏捷的棕毛狐狸從懶狗身上躍過 14 42
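
The distinction between characters and bytes in the table can be reproduced with a minimal Python sketch. The helper name `char_and_byte_counts` is hypothetical, and UTF-8 is assumed as the encoding because the table lists 狐 as 3 bytes. Note that Python's `len()` counts Unicode code points, which may differ from what a reader perceives as one character for combined emoji or accented sequences.

```python
def char_and_byte_counts(text: str, encoding: str = "utf-8") -> tuple:
    """Return (number of code points, number of encoded bytes) for text.

    Hypothetical helper for illustration; UTF-8 is an assumed default.
    """
    return len(text), len(text.encode(encoding))


for sample in ["fox", "狐", "象", "🐘"]:
    chars, size = char_and_byte_counts(sample)
    # Prints the sample, its character count and its byte count, tab-separated
    print(f"{sample}\t{chars}\t{size}")
```

ASCII letters take 1 byte each in UTF-8, common CJK characters take 3, and characters outside the Basic Multilingual Plane such as 🐘 take 4.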

PHP

// number of characters
echo mb_strlen("狐", 'UTF-8') . PHP_EOL; // return 1
echo mb_strlen("《王大文 Dawen》", 'UTF-8') . PHP_EOL; // return 11

// string length (number of bytes)
echo strlen("狐") . PHP_EOL; // return 3
echo strlen("《王大文 Dawen》") . PHP_EOL; // return 21

MySQL

-- number of characters
SELECT CHAR_LENGTH("狐"); /* return 1 */
SELECT CHAR_LENGTH("《王大文 Dawen》"); /* return 11 */

-- number of bytes
SELECT LENGTH("狐"); /* return 3 */
SELECT LENGTH("《王大文 Dawen》"); /* return 21 */


SQLite

Length function

SELECT LENGTH("狐"); /* return 1 */
SELECT LENGTH("《王大文 Dawen》"); /* return 11 */

Excel

// number of characters
=LEN("狐") // return 1
=LEN("《王大文 Dawen》") // return 11

// number of bytes
=LENB("狐") // return 2
=LENB("《王大文 Dawen》") // return 16
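
One plausible reading of the LENB values above is that Excel counts bytes in a double-byte character set (DBCS) code page rather than in UTF-8. The sketch below is an assumption made for illustration, not a description of Excel's internals: it uses Python's `big5` codec as a stand-in for a Traditional Chinese DBCS code page, and the helper name `dbcs_byte_length` is hypothetical.

```python
def dbcs_byte_length(text: str, codec: str = "big5") -> int:
    """Return the encoded length of text in a double-byte code page.

    Assumption: Big5 stands in for the DBCS code page that LENB would use
    on a Traditional Chinese system.
    """
    return len(text.encode(codec))


print(dbcs_byte_length("狐"))               # 2 (one double-byte character)
print(dbcs_byte_length("《王大文 Dawen》"))  # 16 (5 double-byte + 6 single-byte)
```

Under this assumption each CJK character or full-width bracket costs 2 bytes and each ASCII character costs 1, which matches the 2 and 16 shown for LENB, whereas UTF-8 gives 3 and 21.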

BASH

Step1: Using Linux wc command

# print the character counts of txt files (includes the return symbols, i.e. line-ending characters)
wc -m *.txt

# print the newline counts of txt files
wc -l *.txt

# print the total number of space characters in the txt files (grep -c would count matching lines, not spaces)
grep -o ' ' *.txt | wc -l

Step2: Check the Return symbol

  • e.g. \r\n costs 2 characters

Step3: final formula

Number of characters (excluding the return symbol) = result of wc -m *.txt - result of wc -l *.txt * 2 (each \r\n line ending costs 2 characters) - 1 (the last blank line costs 1 character) - number of whitespace characters
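
As a cross-check on the subtraction above, the same quantity can be counted directly. This is a minimal Python sketch under stated assumptions: the function name `count_visible_characters` is hypothetical, "whitespace" is taken to mean the plain space character only, and return symbols are taken to be carriage returns and line feeds.

```python
def count_visible_characters(text: str) -> int:
    """Count characters, skipping carriage returns, line feeds and spaces.

    Hypothetical helper; tabs and other whitespace are still counted.
    """
    excluded = {"\r", "\n", " "}
    return sum(1 for ch in text if ch not in excluded)


# Two lines with Windows-style line endings, as in Step2
sample = "a b\r\nc d\r\n"
print(count_visible_characters(sample))  # 4 (the letters a, b, c, d)
```

When reading from a file, opening it with `open(path, encoding="utf-8", newline="")` keeps the original line endings intact, although for this count either line-ending style gives the same result.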