== How to fix garbled message text ==
Ideas on how to fix garbled message text:
# Possible cause
#* Encoding issue: choose the correct language/encoding for the message text, or auto-detect the encoding with a tool
#* PHP [http://php.net/manual/en/function.utf8-encode.php utf8_encode()] & [http://php.net/manual/en/function.utf8-decode.php utf8_decode()]
# (optional) Convert the current encoding to UTF-8
# (optional) Wrap the text to the window size

List of text that looks garbled (but is not) and its possible root cause:
<table border="1" style="width: 100%; table-layout: fixed;" class="wikitable sortable">
<tr> <th>Feature</th> <th>Example</th> <th>Meaning</th> <th>Restore to human-readable text ↔ encode text</th> </tr>
<tr> <td>String contains {{kbd | key=<nowiki>%2</nowiki>}} or {{kbd | key=<nowiki>%20</nowiki>}} symbols alongside meaningful English characters</td> <td style="word-wrap: break-word;">{{kbd | key=<nowiki>http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2F</nowiki>}}</td> <td>"converts characters into a format that can be transmitted over the Internet ... " Cited from [http://www.w3schools.com/tags/ref_urlencode.asp w3schools]</td> <td>URL decode ↔ URL encode</td> </tr>
<tr> <td>String starting from {{kbd | key=<nowiki>\u</nowiki>}}, {{kbd | key=<nowiki>\U</nowiki>}} or {{kbd | key=<nowiki>U+</nowiki>}} symbols</td> <td style="word-wrap: break-word;">{{kbd | key=<nowiki>\u8c61</nowiki>}}, {{kbd | key=<nowiki>\U0001f418</nowiki>}} or {{kbd | key=<nowiki>U+1F418</nowiki>}}</td> <td>Unicode number: "Unicode code point is referred to by writing "U+" followed by its hexadecimal number.<ref>[https://en.wikipedia.org/wiki/Unicode Unicode - Wikipedia]</ref>" (1) 16-bit or 32-bit hex value (2) "JSON representation of the supplied value"<ref>[http://php.net/manual/en/function.json-encode.php PHP: json_encode - Manual]</ref><ref>[http://www.faqs.org/rfcs/rfc7159.html RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format]</ref></td> <td>JSON decode ↔ JSON encode</td> </tr>
<tr> <td>String starting from {{kbd | key=<nowiki>0x</nowiki>}} symbols</td> <td style="word-wrap: break-word;">{{kbd | key=<nowiki>0x8c61</nowiki>}}</td> <td>hexadecimal string<ref>[https://www.programiz.com/python-programming/methods/built-in/hex Python hex() - Python Standard Library]</ref></td> <td></td> </tr>
<tr> <td>String starting from {{kbd | key=<nowiki>\x</nowiki>}} symbols</td> <td style="word-wrap: break-word;">{{kbd | key=<nowiki>\xe8\xa8\xb1</nowiki>}}</td> <td>"\x is a string escape code, which happens to use hex notation" (hexadecimal notation)<ref>[https://stackoverflow.com/questions/13123877/difference-between-different-hex-types-representations-in-python Difference between different hex types/representations in Python - Stack Overflow]</ref></td> <td>hexadecimal to text ↔ text to hexadecimal</td> </tr>
<tr> <td>String starting from {{kbd | key=<nowiki>&#</nowiki>}} symbols</td> <td style="word-wrap: break-word;">{{kbd | key=<nowiki>&#35937;</nowiki>}}</td> <td>Unicode HTML code. 
"Unicode number in decimal, hex or octal"<ref>[http://www.amp-what.com/help.html &what Help]</ref></td> <td>[https://www.php.net/manual/en/function.html-entity-decode.php PHP: html_entity_decode] ↔ (See the following section to understand how to encode)</td> </tr> <tr> <td>HTML source code starting from {{kbd | key=<nowiki>& ... ;</nowiki>}} symbols</td> <td style="word-wrap: break-word;">{{kbd | key=<nowiki>& a m p ;</nowiki>}} (without whitespace) is {{kbd | key=<nowiki>&</nowiki>}}</td> <td>"all characters which have HTML character entity equivalents are translated into these entities"<ref>[https://www.php.net/manual/en/function.htmlentities.php PHP: htmlentities - Manual]</ref></td> <td>[https://www.php.net/manual/en/function.htmlspecialchars-decode.php PHP: htmlspecialchars_decode] ↔ [https://www.php.net/manual/en/function.htmlentities.php PHP: htmlentities]</td> </tr> </table> Possible approaches to encode the message text: <table border="1" style="width: 100%; table-layout: fixed;" class="wikitable sortable"> <tr> <th style="width: 20%;"> Approach </th> <th style="width: 25%"> Goal </th> <th style="width: 20%"> Is Chinese text garbled/encoded? </th> <th style="width: 35%;"> Sample text before encoded or after encoded </th> </tr> <tr> <th> [https://www.w3schools.com/jsref/jsref_encodeURIComponent.asp JavaScript encodeURIComponent()] <br />↔<br /> [http://www.w3schools.com/jsref/jsref_decodeuricomponent.asp JavaScript decodeURIComponent()]<ref>[http://stackoverflow.com/questions/9901027/how-to-encode-url-contains-unicode-characters-with-php urlencode - How to Encode URL Contains Unicode Characters with PHP - Stack Overflow]</ref> </th> <td> "converts characters into a format that can be transmitted over the Internet ... 
" Cited from [http://www.w3schools.com/tags/ref_urlencode.asp w3schools] </td> <td> TRUE </td> <td style="word-wrap: break-word;"> <ul><li>before: {{kbd | key=<nowiki>http://www.中文網址.tw/my test.asp?name=ståle&car=saab</nowiki>}} <li>after: {{kbd | key=<nowiki>http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2Fmy%20test.asp%3Fname%3Dst%C3%A5le%26car%3Dsaab</nowiki>}} </ul> </td> </tr> <tr> <th> [http://meyerweb.com/eric/tools/dencoder/ URL Decoder/Encoder]<ref>PHP [http://php.net/manual/en/function.urlencode.php urlencode()]</ref> </th> <td> (same as above) </td> <td> TRUE </td> <td style="word-wrap: break-word;"> (same as above) </td> </tr> <tr> <th> [http://php.net/manual/en/function.json-encode.php PHP: json_encode]<br />↔<br />[http://php.net/manual/en/function.json-decode.php PHP: json_decode] </th> <td>Save array in mysql database </td> <td> TRUE </td> <td style="word-wrap: break-word;"> <ul><li>before: {{kbd | key=<nowiki>array("作者" => "馬克吐溫", "名言" => "\"To a man with a hammer, everything looks like a nail.\" He said.");</nowiki>}} <li>after: {{kbd | key=<nowiki>{"\u4f5c\u8005":"\u99ac\u514b\u5410\u6eab","\u540d\u8a00":"\"To a man with a hammer, everything looks like a nail.\" He said."}</nowiki>}}</ul> </td> </tr> <tr> <th> [http://php.net/serialize PHP: serialize] <br />↔<br /> [http://php.net/manual/en/function.unserialize.php PHP: unserialize] </th> <td>[http://stackoverflow.com/questions/10686333/save-array-in-mysql-database Save array in mysql database] </td> <td> <span style="color: #999">FALSE</span> </td> <td style="word-wrap: break-word;"> <ul><li>before: {{kbd | key=<nowiki>array("作者" => "馬克吐溫", "名言" => "\"To a man with a hammer, everything looks like a nail.\" He said.");</nowiki>}} <li>after: {{kbd | key=<nowiki>a:2:{s:6:"作者";s:12:"馬克吐溫";s:6:"名言";s:64:""To a man with a hammer, everything looks like a nail." 
He said.";}</nowiki>}}</ul> </td> </tr>
<tr> <th> [http://php.net/manual/en/function.htmlentities.php PHP: htmlentities][http://www.w3schools.com/html/html_entities.asp] <br />↔<br /> [http://php.net/manual/en/function.html-entity-decode.php PHP: html_entity_decode] </th> <td> Replace reserved characters, e.g. the double quote symbol </td> <td> <span style="color: #999">FALSE</span> </td> <td style="word-wrap: break-word;"> <ul><li>before: {{kbd | key=<nowiki>馬克吐溫名言 "To a man with a hammer, everything looks like a nail."</nowiki>}} <li>after: {{kbd | key=<nowiki> 馬克吐溫名言 &quot;To a man with a hammer, everything looks like a nail.&quot;</nowiki>}}</ul> </td> </tr>
</table>

Other functions
* [https://www.w3schools.com/js/js_json_parse.asp JSON.parse()] or [http://api.jquery.com/jquery.parsejson/ jQuery.parseJSON() | jQuery API Documentation]

=== String contains {{kbd | key=<nowiki>%2</nowiki>}} or {{kbd | key=<nowiki>%20</nowiki>}} symbols ===
Use the following functions:
* PHP [http://php.net/manual/en/function.urlencode.php urlencode]
* JavaScript [https://www.w3schools.com/jsref/jsref_encodeuri.asp encodeURI() Function]
* Excel [https://support.microsoft.com/en-us/office/encodeurl-function-07c7fb90-7c60-4bff-8687-fac50fe33d0e ENCODEURL function]

=== String starting from \u, \U or U+ symbol ===
Using PHP. Type is string
<pre>
$encoded = <<<EOT
"\u8c61"
EOT;
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 象
echo "encoded string: " . json_encode("象") . PHP_EOL; // print "\u8c61"

$encoded = <<<EOT
"\ud83d\udc18"
EOT;
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 🐘
echo "encoded string: " . json_encode("🐘") . PHP_EOL; // print "\ud83d\udc18"
</pre>
When using the heredoc syntax (<<<EOT ... EOT;), unintended whitespace or hidden characters at the beginning or end of the block might cause json_decode to fail to parse the string.
Direct assignment avoids potential whitespace or format issues from heredoc.
<pre>
$encoded = '"\u8c61"';
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 象
echo "encoded string: " . json_encode("象") . PHP_EOL; // print "\u8c61"
</pre>

Using PHP v. 7.0 or later: [https://wiki.php.net/rfc/unicode_escape Unicode Codepoint Escape Syntax]<ref>[https://secure.php.net/manual/en/migration70.new-features.php#migration70.new-features.unicode-codepoint-escape-syntax PHP: New features - Manual]</ref>
<pre>
echo "\u{8c61}" . PHP_EOL; // print 象
echo "\u{0001f418}" . PHP_EOL; // print 🐘
</pre>

Using Python. Type is string
<pre>
x = u'象'
x.encode('ascii', 'backslashreplace') # print b'\\u8c61'

x = u'🐘'
x.encode('ascii', 'backslashreplace') # print b'\\U0001f418'
</pre>

Using PHP. Type is array
<pre>
$input = <<<EOT
["\u8c61"]
EOT;
$input = trim($input);
var_dump(json_decode($input, true)); // print array("象")
var_dump(json_encode(array("象"))); // print ["\u8c61"]
</pre>

=== String starting from 0x symbol ===
Using Python [https://www.w3schools.com/python/ref_func_chr.asp chr() Function] ↔ [https://www.programiz.com/python-programming/methods/built-in/hex hex() function]
<pre>
int('0x8c61', 16) # print 35937 -- "An integer representing a valid Unicode code point" cited from w3schools
chr(int('0x8c61', 16)) # print '象' -- "returns the character that represents the specified unicode." cited from w3schools
hex(ord('象')) # print '0x8c61' -- "converts an integer number to the corresponding hexadecimal string."
# (the hex() description above is cited from programiz.com)
chr(int('0x1f418', 16)) # print '🐘'
hex(ord('🐘')) # print '0x1f418'
</pre>

=== String starting from \x symbol ===
Using Python<ref>[https://docs.python.org/3/library/stdtypes.html#bytes.decode bytes.decode()]</ref><ref>[https://docs.python.org/3/library/stdtypes.html#str.encode str.encode()]</ref><ref>[https://stackoverflow.com/questions/33294213/how-to-decode-unicode-in-a-chinese-text python - How to decode unicode in a Chinese text - Stack Overflow]</ref>
<pre>
data = u"象"
data
hex_notation = data.encode('utf-8')
hex_notation # print b'\xe8\xb1\xa1'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)

data = u"🐘"
data
hex_notation = data.encode('utf-8')
hex_notation # print b'\xf0\x9f\x90\x98'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)

data = u"だいじょうぶ"
data
hex_notation = data.encode('utf-8')
hex_notation # print b'\xe3\x81\xa0\xe3\x81\x84\xe3\x81\x98\xe3\x82\x87\xe3\x81\x86\xe3\x81\xb6'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)
</pre>

Using PHP<ref>[https://stackoverflow.com/questions/7320516/how-to-convert-text-to-x-codes php - How to convert text to \x codes? - Stack Overflow]</ref>: [https://www.ideone.com/m58rEZ See it in action]
<pre>
echo preg_replace_callback("/./", function($matched) {
    return '\x'.dechex(ord($matched[0]));
}, '🐘');
# print \xf0\x9f\x90\x98
</pre>

=== String starting from &# symbols ===
Using PHP [https://www.w3schools.com/php/func_string_html_entity_decode.asp html_entity_decode() Function]<ref>[https://blog.longwin.com.tw/2011/06/php-html-unicode-convert-2011/ PHP 將 文字 轉換成 &#xxxxx; UNICODE 碼 | Tsung's Blog]</ref><ref>[http://hinablue.blogspot.com/2008/01/php-tech-unicode-html-convert.html [php tech.] unicode html convert | HINA::工程幼稚園] The Unicode HTML code is obtained by converting the text from its original encoding to UCS-2, taking the binary value, converting it from base 16 to base 10, and then prefixing &# to the result.</ref>

To decode the text
<pre>
$unicode_html = '&#128024;';
echo html_entity_decode($unicode_html) . PHP_EOL; // print 🐘

$unicode_html = '&#128024;';
echo mb_convert_encoding($unicode_html, 'UTF-8', 'HTML-ENTITIES') . PHP_EOL; // print 🐘
</pre>

To encode the text (the input must be a single character)
<pre>
$input = "🐘";
$unicode_html = base_convert(bin2hex(mb_convert_encoding($input, 'UTF-32', 'utf-8')), 16, 10);
$unicode_html = '&#' . $unicode_html . ';';
echo 'unicode_html: ' . $unicode_html . PHP_EOL; // print unicode_html: &#128024; (rendered as 🐘 in a browser)
</pre>

== Ways to fix garbled message text ==
=== [http://www.softking.com.tw/soft/clickcount.asp?fid3=1763 ConvertZ] v.8.02 ===
* choose encode: manually (mainly Asian languages)
* convert to UTF-8: available
* convert to Big5 from UTF-8: available {{exclaim}} the software may change the wording, e.g. 余美人 -> 於美人
* allow to wrap long text: available

=== [http://www.emeditor.com/ EmEditor] v.14.3.1 ($) ===
* choose encode: manually and auto-detect {{Gd}}
* convert to UTF-8: available
* allow to wrap long text: available
* support command line: [https://www.emeditor.com/help/faq/file/file_convert.htm EmEditor FAQ: How can I convert file encodings with the command line?]
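The two steps these tools share (choose the source encoding, then convert to UTF-8) can also be scripted. The following is only a minimal Python sketch, assuming the garbled input is Big5-encoded; the function name <code>reencode_to_utf8</code> and the sample bytes are hypothetical and are not part of any tool listed here.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "big5") -> bytes:
    """Decode bytes using the assumed source encoding, then re-encode as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    which is a hint that the wrong source encoding was chosen.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Illustrative sample: the Big5 bytes of the character 象.
big5_bytes = "象".encode("big5")
utf8_bytes = reencode_to_utf8(big5_bytes)
print(utf8_bytes)                  # b'\xe8\xb1\xa1'
print(utf8_bytes.decode("utf-8"))  # 象
```

If the source encoding is unknown, it has to be guessed or detected first; this sketch does not attempt detection.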
=== [http://www.google.com/chrome Google Chrome] v.10 (viewer) ===
* choose encode: manually and auto-detect
* allow to wrap long text: available (auto) {{Gd}}

=== [http://sourceforge.net/projects/madedit/ MadEdit] v.0.2.9.1 ===
* choose encode: manually and auto-detect {{Gd}}
* convert to UTF-8: available
* allow to wrap long text: available

=== Microsoft Internet Explorer v.8 (viewer) ===
* choose encode: manually and auto-detect
* allow to wrap long text:

=== Microsoft notepad (記事本) for Windows ===
method 1: [https://errerrors.blogspot.com/2010/11/notepadtxt.html Err: 解決用記事本(notepad)開啟簡體字txt檔,出現亂碼的問題](2010): notepad + [http://notepad-plus-plus.org/ Notepad++ ]
* choose encode: manually
* convert to UTF-8: available by Notepad++
* allow to wrap long text: available

method 2: [http://www.microsoft.com/downloads/details.aspx?FamilyID=8c4e8e0d-45d1-4d9b-b7c0-8430c1ac89ab&displayLang=zh-tw Microsoft AppLocale Utility](patched: [http://ntu.csie.org/~piaip/papploc.msi piaip pAppLocale]) + notepad
* choose encode: manually
* convert to UTF-8: not available
* allow to wrap long text: available

=== Microsoft Office Word 2003 ($) ===
* choose encode: manually
* convert to UTF-8: available
* allow to wrap long text: available

=== [http://moztw.org/firefox/ Mozilla Firefox] v.3.6 (viewer) ===
* choose encode: manually and auto-detect
* allow to wrap long text: no, but you can copy the following code into the web address bar to wrap long text (Thanks, [http://returnofthesasquatch.blogspot.com/2007/03/word-wrap-for-firefox-bookmarklet_17.html Return of the Sasquatch: word wrap for Firefox bookmarklet]!)
<pre>
javascript:(function() { var D = document; F(D.body); function F(n) { var u, r, c, x; if (n.nodeType == 3) { u = n.data.search(/\S{45}/); if (u >= 0) { r = n.splitText(u + 45); n.parentNode.insertBefore(D.createElement('wbr'), r); } } else if ((n.tagName != 'STYLE') && (n.tagName != 'SCRIPT')) { for (c = 0; x = n.childNodes[c]; ++c) { F(x); } } } D.body.innerHTML += ' '; })();
</pre>

=== [http://notepad-plus-plus.org/ Notepad++] v.5.8 ===
* choose encode: manually
* convert to UTF-8: available
* allow to wrap long text: available

=== Not supported at this moment ===
* [http://www.libreoffice.org/ LibreOffice] 3.3.0 - Writer
* [http://www.openoffice.org/ OpenOffice.org] 3.3.0 - Writer is not supported but OpenOffice.org Calc is supported.

== Further reading ==
* [[Batch Process#簡繁體文件轉換 | Simplified/Traditional Chinese document conversion]]
* [http://en.wikipedia.org/wiki/Character_encoding Character encoding - Wikipedia, the free encyclopedia]
* [https://pjchender.blogspot.com/2018/06/guide-unicode-javascript.html (Guide) 瞭解網頁中看不懂的編碼:Unicode 在 JavaScript 中的使用 ~ PJCHENder 那些沒告訴你的小細節]
* [https://www.multiutil.com/base64-to-text-converter/ Base64 to Text Converter]
* [https://www.multiutil.com/gzip-to-text-decompress/ Gzip to Text Decompress using gzip, deflate and brotli algorithms]
* [[URL Encoding]]

Unicode table
* [https://unicode-table.com/en/ Unicode® Character Table]
* [http://www.amp-what.com/ &what: Discover Unicode & HTML Character Entities]
* [https://www.toptal.com/designers/htmlarrows/ HTML Symbols, Entities, Characters and Codes — HTML Arrows]
* [https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=0x Unicode/UTF-8-character table]

== References ==
<references />

[[Category:Software]] [[Category:Data Science]] [[Category:String manipulation]] [[Category:Programming]]