== How to fix garbled message text ==
Ideas on how to fix garbled message text:
# Possible causes
#* Encoding issue: choose the correct language/encoding for the message text, or auto-detect the encoding with a tool
#* PHP [http://php.net/manual/en/function.utf8-encode.php utf8_encode()] & [http://php.net/manual/en/function.utf8-decode.php utf8_decode()]
# (optional) Convert the current encoding to UTF-8
# (optional) Make the text wrap to the window size

List of text that looks garbled (but is actually encoded) and the possible root cause:
<table border="1" style="width: 100%; table-layout: fixed;" class="wikitable sortable">
<tr>
<th>Feature</th>
<th>Example</th>
<th>Meaning</th>
<th>Restore to human readable ↔ encode text</th>
</tr>
<tr>
<td>String contains {{kbd | key=<nowiki>%2</nowiki>}} or {{kbd | key=<nowiki>%20</nowiki>}} sequences mixed with readable English characters</td>
<td style="word-wrap: break-word;">{{kbd | key=<nowiki>http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2F</nowiki>}}</td>
<td>"converts characters into a format that can be transmitted over the Internet ... " Cited from [http://www.w3schools.com/tags/ref_urlencode.asp w3schools]</td>
<td>URL decode ↔ URL encode</td>
</tr>
<tr>
<td>String starting with {{kbd | key=<nowiki>\u</nowiki>}}, {{kbd | key=<nowiki>\U</nowiki>}} or {{kbd | key=<nowiki>U+</nowiki>}} symbols</td>
<td style="word-wrap: break-word;">{{kbd | key=<nowiki>\u8c61</nowiki>}}, {{kbd | key=<nowiki>\U0001f418</nowiki>}} or {{kbd | key=<nowiki>U+1F418</nowiki>}}</td>
<td>Unicode number: "Unicode code point is referred to by writing "U+" followed by its hexadecimal number.<ref>[https://en.wikipedia.org/wiki/Unicode Unicode - Wikipedia]</ref>" (1) 16-bit or 32-bit hex value (2) "JSON representation of the supplied value"<ref>[http://php.net/manual/en/function.json-encode.php PHP: json_encode - Manual]</ref><ref>[http://www.faqs.org/rfcs/rfc7159.html RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format]</ref></td>
<td>JSON decode ↔ JSON encode</td>
</tr>
<tr>
<td>String starting with {{kbd | key=<nowiki>0x</nowiki>}} symbols</td>
<td style="word-wrap: break-word;">{{kbd | key=<nowiki>0x8c61</nowiki>}}</td>
<td>hexadecimal string<ref>[https://www.programiz.com/python-programming/methods/built-in/hex Python hex() - Python Standard Library]</ref></td>
<td></td>
</tr>
<tr>
<td>String starting with {{kbd | key=<nowiki>\x</nowiki>}} symbols</td>
<td style="word-wrap: break-word;">{{kbd | key=<nowiki>\xe8\xa8\xb1</nowiki>}}</td>
<td>"\x is a string escape code, which happens to use hex notation" (hexadecimal notation)<ref>[https://stackoverflow.com/questions/13123877/difference-between-different-hex-types-representations-in-python Difference between different hex types/representations in Python - Stack Overflow]</ref></td>
<td>hexadecimal to text ↔ text to hexadecimal</td>
</tr>
<tr>
<td>String starting with {{kbd | key=<nowiki>&#</nowiki>}} symbols</td>
<td style="word-wrap: break-word;">{{kbd | key=<nowiki>&#35937;</nowiki>}}</td>
<td>Unicode HTML code. "Unicode number in decimal, hex or octal"<ref>[http://www.amp-what.com/help.html &what Help]</ref></td>
<td>[https://www.php.net/manual/en/function.html-entity-decode.php PHP: html_entity_decode] ↔ (see the section on strings starting with &# below for how to encode)</td>
</tr>
<tr>
<td>HTML source code containing {{kbd | key=<nowiki>& ... ;</nowiki>}} entities</td>
<td style="word-wrap: break-word;">{{kbd | key=<nowiki>& a m p ;</nowiki>}} (without whitespace) is {{kbd | key=<nowiki>&</nowiki>}}</td>
<td>"all characters which have HTML character entity equivalents are translated into these entities"<ref>[https://www.php.net/manual/en/function.htmlentities.php PHP: htmlentities - Manual]</ref></td>
<td>[https://www.php.net/manual/en/function.htmlspecialchars-decode.php PHP: htmlspecialchars_decode] ↔ [https://www.php.net/manual/en/function.htmlentities.php PHP: htmlentities]</td>
</tr>
</table>

Possible approaches to encode the message text:
<table border="1" style="width: 100%; table-layout: fixed;" class="wikitable sortable">
<tr>
<th style="width: 20%;"> Approach </th>
<th style="width: 25%"> Goal </th>
<th style="width: 20%"> Is Chinese text garbled/encoded? </th>
<th style="width: 35%;"> Sample text before and after encoding </th>
</tr>
<tr>
<th> [https://www.w3schools.com/jsref/jsref_encodeURIComponent.asp JavaScript encodeURIComponent()] <br />↔<br /> [http://www.w3schools.com/jsref/jsref_decodeuricomponent.asp JavaScript decodeURIComponent()]<ref>[http://stackoverflow.com/questions/9901027/how-to-encode-url-contains-unicode-characters-with-php urlencode - How to Encode URL Contains Unicode Characters with PHP - Stack Overflow]</ref> </th>
<td> "converts characters into a format that can be transmitted over the Internet ... " Cited from [http://www.w3schools.com/tags/ref_urlencode.asp w3schools] </td>
<td> TRUE </td>
<td style="word-wrap: break-word;">
<ul><li>before: {{kbd | key=<nowiki>http://www.中文網址.tw/my test.asp?name=ståle&car=saab</nowiki>}}
<li>after: {{kbd | key=<nowiki>http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2Fmy%20test.asp%3Fname%3Dst%C3%A5le%26car%3Dsaab</nowiki>}}
</ul>
</td>
</tr>
<tr>
<th> [http://meyerweb.com/eric/tools/dencoder/ URL Decoder/Encoder]<ref>PHP [http://php.net/manual/en/function.urlencode.php urlencode()]</ref> </th>
<td> (same as above) </td>
<td> TRUE </td>
<td style="word-wrap: break-word;"> (same as above) </td>
</tr>
<tr>
<th> [http://php.net/manual/en/function.json-encode.php PHP: json_encode]<br />↔<br />[http://php.net/manual/en/function.json-decode.php PHP: json_decode] </th>
<td>Save array in mysql database </td>
<td> TRUE </td>
<td style="word-wrap: break-word;">
<ul><li>before: {{kbd | key=<nowiki>array("作者" => "馬克吐溫", "名言" => "\"To a man with a hammer, everything looks like a nail.\" He said.");</nowiki>}}
<li>after: {{kbd | key=<nowiki>{"\u4f5c\u8005":"\u99ac\u514b\u5410\u6eab","\u540d\u8a00":"\"To a man with a hammer, everything looks like a nail.\" He said."}</nowiki>}}</ul>
</td>
</tr>
<tr>
<th> [http://php.net/serialize PHP: serialize] <br />↔<br /> [http://php.net/manual/en/function.unserialize.php PHP: unserialize] </th>
<td>[http://stackoverflow.com/questions/10686333/save-array-in-mysql-database Save array in mysql database] </td>
<td> <span style="color: #999">FALSE</span> </td>
<td style="word-wrap: break-word;">
<ul><li>before: {{kbd | key=<nowiki>array("作者" => "馬克吐溫", "名言" => "\"To a man with a hammer, everything looks like a nail.\" He said.");</nowiki>}}
<li>after: {{kbd | key=<nowiki>a:2:{s:6:"作者";s:12:"馬克吐溫";s:6:"名言";s:64:""To a man with a hammer, everything looks like a nail." He said.";}</nowiki>}}</ul>
</td>
</tr>
<tr>
<th> [http://php.net/manual/en/function.htmlentities.php PHP: htmlentities][http://www.w3schools.com/html/html_entities.asp] <br />↔<br /> [http://php.net/manual/en/function.html-entity-decode.php PHP: html_entity_decode] </th>
<td> Replace reserved characters, e.g. the double quote symbol </td>
<td> <span style="color: #999">FALSE</span> </td>
<td style="word-wrap: break-word;">
<ul><li>before: {{kbd | key=<nowiki>馬克吐溫名言 "To a man with a hammer, everything looks like a nail."</nowiki>}}
<li>after: {{kbd | key=<nowiki>馬克吐溫名言 &quot;To a man with a hammer, everything looks like a nail.&quot;</nowiki>}}</ul>
</td>
</tr>
</table>

Other functions
* [https://www.w3schools.com/js/js_json_parse.asp JSON.parse()] or [http://api.jquery.com/jquery.parsejson/ jQuery.parseJSON() | jQuery API Documentation]

=== String contains {{kbd | key=<nowiki>%2</nowiki>}} or {{kbd | key=<nowiki>%20</nowiki>}} symbols ===
Use the following functions:
* PHP [http://php.net/manual/en/function.urlencode.php urlencode]
* JavaScript [https://www.w3schools.com/jsref/jsref_encodeuri.asp encodeURI() Function]
* Excel [https://support.microsoft.com/en-us/office/encodeurl-function-07c7fb90-7c60-4bff-8687-fac50fe33d0e ENCODEURL function]

=== String starting with \u, \U or U+ symbols ===
Using PHP. Type is string
<pre>
$encoded = <<<EOT
"\u8c61"
EOT;
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 象
echo "encoded string: " . json_encode("象") . PHP_EOL; // print "\u8c61"

$encoded = <<<EOT
"\ud83d\udc18"
EOT;
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 🐘
echo "encoded string: " . json_encode("🐘") . PHP_EOL; // print "\ud83d\udc18"
</pre>

When using the heredoc syntax (<<<EOT ... EOT;), stray whitespace or hidden characters at the beginning or end of the block may cause json_decode to fail to parse the string. Direct assignment avoids these whitespace and formatting issues.
<pre>
$encoded = '"\u8c61"';
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 象
echo "encoded string: " . json_encode("象") . PHP_EOL; // print "\u8c61"
</pre>

Using the PHP 7.0+ [https://wiki.php.net/rfc/unicode_escape Unicode Codepoint Escape Syntax]<ref>[https://secure.php.net/manual/en/migration70.new-features.php#migration70.new-features.unicode-codepoint-escape-syntax PHP: New features - Manual]</ref>
<pre>
echo "\u{8c61}" . PHP_EOL; // print 象
echo "\u{0001f418}" . PHP_EOL; // print 🐘
</pre>

Using Python. Type is string
<pre>
x = u'象'
x.encode('ascii', 'backslashreplace') # print b'\\u8c61'

x = u'🐘'
x.encode('ascii', 'backslashreplace') # print b'\\U0001f418'
</pre>

Using PHP. Type is array
<pre>
$input = <<<EOT
["\u8c61"]
EOT;
$input = trim($input);
var_dump(json_decode($input, true)); // print array("象")
var_dump(json_encode(array("象"))); // print ["\u8c61"]
</pre>

=== String starting with 0x symbols ===
Using Python [https://www.w3schools.com/python/ref_func_chr.asp chr() Function] ↔ [https://www.programiz.com/python-programming/methods/built-in/hex hex() function]
<pre>
int('0x8c61', 16) # print 35937 -- "An integer representing a valid Unicode code point" cited from w3schools
chr(int('0x8c61', 16)) # print '象' -- "returns the character that represents the specified unicode." cited from w3schools
hex(ord('象')) # print '0x8c61' -- "converts an integer number to the corresponding hexadecimal string." cited from programiz.com

chr(int('0x1f418', 16)) # print '🐘'
hex(ord('🐘')) # print '0x1f418'
</pre>

=== String starting with \x symbols ===
Using Python<ref>[https://docs.python.org/3/library/stdtypes.html#bytes.decode bytes.decode()]</ref><ref>[https://docs.python.org/3/library/stdtypes.html#str.encode str.encode()]</ref><ref>[https://stackoverflow.com/questions/33294213/how-to-decode-unicode-in-a-chinese-text python - How to decode unicode in a Chinese text - Stack Overflow]</ref>
<pre>
data = u"象"
data
hex_notation = data.encode('utf-8')
hex_notation # print b'\xe8\xb1\xa1'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)

data = u"🐘"
data
hex_notation = data.encode('utf-8')
hex_notation # print b'\xf0\x9f\x90\x98'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)

data = u"だいじょうぶ"
data
hex_notation = data.encode('utf-8')
hex_notation # print b'\xe3\x81\xa0\xe3\x81\x84\xe3\x81\x98\xe3\x82\x87\xe3\x81\x86\xe3\x81\xb6'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)
</pre>

Using PHP<ref>[https://stackoverflow.com/questions/7320516/how-to-convert-text-to-x-codes php - How to convert text to \x codes? - Stack Overflow]</ref>: [https://www.ideone.com/m58rEZ See it in action]
<pre>
echo preg_replace_callback("/./", function($matched) {
    return '\x'.dechex(ord($matched[0]));
}, '🐘');
# print \xf0\x9f\x90\x98
</pre>

=== String starting with &# symbols ===
Using PHP [https://www.w3schools.com/php/func_string_html_entity_decode.asp html_entity_decode() Function]<ref>[https://blog.longwin.com.tw/2011/06/php-html-unicode-convert-2011/ PHP 將 文字 轉換成 &#xxxxx; UNICODE 碼 | Tsung's Blog]</ref><ref>[http://hinablue.blogspot.com/2008/01/php-tech-unicode-html-convert.html [php tech.] unicode html convert | HINA::工程幼稚園] The Unicode HTML code is derived by converting the text from its original encoding to UCS-2, taking the binary value, converting it from base 16 to base 10, and then prefixing the result with &#.</ref>

To decode the text
<pre>
$unicode_html = '&#128024;';
echo html_entity_decode($unicode_html) . PHP_EOL; // print 🐘

$unicode_html = '&#128024;';
echo mb_convert_encoding($unicode_html, 'UTF-8', 'HTML-ENTITIES') . PHP_EOL; // print 🐘
</pre>

To encode the text
<pre>
$input = "🐘";
$unicode_html = base_convert(bin2hex(mb_convert_encoding($input, 'UTF-32', 'utf-8')), 16, 10);
$unicode_html = '&#' . $unicode_html . ';';
echo 'unicode_html: ' . $unicode_html . PHP_EOL; // print unicode_html: &#128024; (a browser renders the entity as 🐘)
</pre>
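
The PHP, JavaScript and Excel functions above have rough equivalents in the Python standard library. The following is a minimal sketch, not a definitive recipe: it assumes Python 3 and UTF-8 input, reuses the sample strings from this page, and the function names <code>decode_samples</code> and <code>encode_samples</code> are made up for illustration.

```python
import html
import json
from urllib.parse import quote, unquote


def decode_samples():
    """Restore the encoded samples from this page to readable text."""
    return {
        # %XX sequences: URL (percent) decoding
        "url": unquote("http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2F"),
        # \uXXXX escapes: JSON decoding (surrogate pairs are combined)
        "json": json.loads('"\\u8c61"'),
        "json_surrogates": json.loads('"\\ud83d\\udc18"'),
        # &#NNNNN; numeric character references: HTML unescaping
        "html": html.unescape("&#35937;"),
        # \x.. bytes: decode the raw bytes as UTF-8
        "bytes": b"\xe8\xb1\xa1".decode("utf-8"),
    }


def encode_samples(text):
    """Encode readable text into the forms listed in the first table."""
    return {
        "url": quote(text, safe=""),           # percent-encode every reserved character
        "json": json.dumps(text),              # ASCII-only JSON string with \u escapes
        "html": "".join("&#%d;" % ord(ch) for ch in text),  # decimal character references
        "bytes": text.encode("utf-8"),         # UTF-8 bytes, shown by Python with \x escapes
    }


if __name__ == "__main__":
    print(decode_samples())
    print(encode_samples("象"))
```

Each dictionary key names the kind of encoding handled, so the output can be compared row by row with the first table on this page.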