Fix garbled message text
Ideas on how to fix garbled message text
- Possible causes and remedies
- Encoding issue: choose the correct language/encoding for the message text, or let a tool auto-detect the encoding
- PHP utf8_encode() & utf8_decode() (these convert only between ISO-8859-1 and UTF-8, and are deprecated as of PHP 8.2)
- (optional) Convert the current encoding to UTF-8
- (optional) Wrap the text to the window size
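The first remedy above, choosing the right encoding, can be sketched in Python as a trial-and-error decoder. This is a minimal sketch and not a real detector: the function name `decode_with_fallback` and the candidate list are assumptions made for illustration, and a short byte string may decode without error under more than one encoding.

```python
def decode_with_fallback(raw, candidates=("utf-8", "big5", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    The candidate order is an assumption; put the most likely encoding first.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


big5_bytes = "中文".encode("big5")
text, used = decode_with_fallback(big5_bytes)
utf8_bytes = text.encode("utf-8")  # the optional "convert to UTF-8" step
print(used, text, utf8_bytes)
```

Strict decoding is used on purpose: a candidate is rejected as soon as it hits an invalid byte sequence, instead of silently producing replacement characters.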
Possible approaches to encode the message text:
Approach | Goal | Is Chinese text garbled/encoded? | Sample text before and after encoding |
---|---|---|---|
JavaScript encodeURIComponent() ↔ JavaScript decodeURIComponent()[1] | "converts characters into a format that can be transmitted over the Internet ... " Cited from w3schools | TRUE | |
URL Decoder/Encoder[2] | (same as above) | TRUE | (same as above) |
PHP: json_encode ↔ PHP: json_decode | Save array in MySQL database | TRUE | |
PHP: serialize ↔ PHP: unserialize | Save array in MySQL database | FALSE | |
PHP: htmlentities[1] ↔ PHP: html_entity_decode | Replace reserved characters, e.g. the double quote symbol | FALSE | |
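The "Is Chinese text garbled/encoded?" column can be checked with a short Python sketch. The Python functions here are stand-ins chosen for illustration, not the PHP or JavaScript functions named in the table: `urllib.parse.quote` plays the role of URL encoding, `json.dumps` the role of json_encode, and `html.escape` is only a rough counterpart of htmlentities (it escapes just `&`, `<`, `>` and quotes). The helper `is_garbled` is a hypothetical name.

```python
import html
import json
from urllib.parse import quote


def is_garbled(original, encoded):
    """True when the original characters no longer appear in the encoded form."""
    return original not in encoded


text = "象"
results = {
    "url": quote(text, safe=""),  # percent-encoded UTF-8 bytes
    "json": json.dumps(text),     # \uXXXX escape inside double quotes
    "html": html.escape(text),    # unchanged: no reserved characters
}
garbled = {name: is_garbled(text, encoded) for name, encoded in results.items()}
print(results)
print(garbled)
```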
Other functions
List of the garbled text and possible cause
Feature | Example | Meaning | Restore to human readable ↔ encode text |
---|---|---|---|
Website address contains % symbols followed by hex digits (e.g. %2F) | http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2F | "converts characters into a format that can be transmitted over the Internet ... " Cited from w3schools | URL decode ↔ URL encode |
Downloaded JSON or JavaScript file whose content contains \u symbols | \u4f5c | (1) 16-bit or 32-bit hex value (2) "JSON representation of the supplied value"[3][4] | JSON decode ↔ JSON encode |
String contains \x symbols | b'\xe8\xa8\xb1' | "\x is a string escape code, which happens to use hex notation" (hexadecimal notation)[5] | hexadecimal to text ↔ text to hexadecimal |
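The three features in the table can be told apart with simple pattern checks. The following is a minimal sketch, assuming the input is a Python `str` that contains only one escape style; the function name `restore` is hypothetical, and the `\x` branch assumes the escaped bytes are UTF-8.

```python
import json
import re
from urllib.parse import unquote


def restore(text):
    """Guess the escape style from the feature table and undo it."""
    if re.search(r"%[0-9A-Fa-f]{2}", text):
        return unquote(text)  # URL decode
    if re.search(r"\\u[0-9A-Fa-f]{4}", text):
        # Treat the text as the body of a JSON string; this breaks if the
        # text itself contains an unescaped double quote.
        return json.loads('"' + text + '"')
    if re.search(r"\\x[0-9A-Fa-f]{2}", text):
        # Turn each \xNN into one byte, then read the bytes as UTF-8
        # (UTF-8 is an assumption about how the bytes were produced).
        raw = text.encode("ascii").decode("unicode_escape").encode("latin-1")
        return raw.decode("utf-8")
    return text


print(restore("http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2F"))
print(restore(r"\u4f5c"))
print(restore(r"\xe8\xa8\xb1"))
```

The order of the checks is a design choice: percent-encoding is tested first because a decoded URL can itself contain backslashes.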
Text starting with the \u symbol
Using PHP. Type is string
    $encoded = json_encode("象");
    echo "encoded string: " . $encoded . PHP_EOL;                    // print "\u8c61"
    echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 象

    $encoded = json_encode("🐘");
    echo "encoded string: " . $encoded . PHP_EOL;                    // print "\ud83d\udc18"
    echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 🐘
Using Python. Type is string
    x = u'象'
    x.encode('ascii', 'backslashreplace')  # print b'\\u8c61'

    x = u'🐘'
    x.encode('ascii', 'backslashreplace')  # print b'\\U0001f418'
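The reverse direction, from backslash-escaped ASCII bytes back to readable text, can be sketched with Python's `unicode_escape` codec. The helper name `unescape_backslashes` is an assumption for illustration. The codec reads any non-escaped byte as Latin-1, so this sketch is only safe for pure ASCII input such as the output of `backslashreplace`.

```python
def unescape_backslashes(escaped):
    """Turn b'\\u8c61'-style ASCII bytes back into readable text."""
    return escaped.decode("unicode_escape")


for original in ("象", "🐘"):
    escaped = original.encode("ascii", "backslashreplace")
    restored = unescape_backslashes(escaped)
    print(escaped, "->", restored, restored == original)
```

Text that already contained literal backslashes before escaping will not round-trip, because `backslashreplace` leaves those backslashes as they are.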
Using PHP. Type is array
    $input = <<<EOT
    ["\u8c61"]
    EOT;
    $input = trim($input);
    var_dump(json_decode($input, true));   // print array("象")
    var_dump(json_encode(array("象")));    // print ["\u8c61"]
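For comparison, the same array round trip in Python shows how the `\u` escapes can be avoided at encoding time. This is a sketch using the standard `json` module; PHP's json_encode has a comparable `JSON_UNESCAPED_UNICODE` flag.

```python
import json

escaped = json.dumps(["象", "🐘"])                       # ASCII-only output
readable = json.dumps(["象", "🐘"], ensure_ascii=False)  # keeps the characters

print(escaped)
print(readable)
print(json.loads(escaped) == json.loads(readable))
```

Both strings decode to the same list, so the escaped form is not corrupted data; it is only harder to read.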
Text starting with the \x symbol
    data = u"象"
    data
    hex_notation = data.encode('utf-8')
    hex_notation  # print b'\xe8\xb1\xa1'
    for each_unicode_character in hex_notation.decode('utf-8'):
        print(each_unicode_character)

    data = u"🐘"
    data
    hex_notation = data.encode('utf-8')
    hex_notation  # print b'\xf0\x9f\x90\x98'
    for each_unicode_character in hex_notation.decode('utf-8'):
        print(each_unicode_character)

    data = u"だいじょうぶ"
    data
    hex_notation = data.encode('utf-8')
    hex_notation  # print b'\xe3\x81\xa0\xe3\x81\x84\xe3\x81\x98\xe3\x82\x87\xe3\x81\x86\xe3\x81\xb6'
    for each_unicode_character in hex_notation.decode('utf-8'):
        print(each_unicode_character)
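Two related cases are worth a sketch: plain hexadecimal strings without the `\x` prefix, and text that was decoded with the wrong codec. The function names are hypothetical, and the codec pair in `repair_mojibake` (Latin-1 used by mistake, UTF-8 intended) is an assumption that has to be adjusted to the actual situation.

```python
def text_to_hex(text, encoding="utf-8"):
    """Encode text and show the bytes as a plain hex string."""
    return text.encode(encoding).hex()


def hex_to_text(hex_string, encoding="utf-8"):
    """Parse a plain hex string into bytes and decode them."""
    return bytes.fromhex(hex_string).decode(encoding)


def repair_mojibake(garbled, wrong="latin-1", right="utf-8"):
    """Undo a decode that used the wrong codec; both codec names are guesses."""
    return garbled.encode(wrong).decode(right)


print(text_to_hex("象"))      # e8b1a1
print(hex_to_text("e8b1a1"))  # 象

garbled = "象".encode("utf-8").decode("latin-1")  # simulate the wrong decode
print(repair_mojibake(garbled))                   # 象
```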
Ways to fix garbled message text
ConvertZ v.8.02
- choose encoding: manual (mainly Asian languages)
- convert to UTF-8: available
- convert to Big5 from UTF-8: available, but the software may change the wording, e.g. 余美人 -> 於美人
- wrap long text: available
EmEditor v.14.3.1 ($)
- choose encoding: manual and auto-detect
- convert to UTF-8: available
- wrap long text: available
- command-line support: see EmEditor FAQ: How can I convert file encodings with the command line?
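When no editor is at hand, a batch conversion to UTF-8 can also be scripted. The following Python sketch is not EmEditor's command line; the function name `convert_to_utf8` and the file names are made up for illustration, and the source encoding must already be known.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding):
    """Rewrite src_path as UTF-8; source_encoding must be supplied by the caller."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo with throwaway files; the file names are made up for illustration.
with tempfile.TemporaryDirectory() as folder:
    src_path = os.path.join(folder, "big5.txt")
    dst_path = os.path.join(folder, "utf8.txt")
    with open(src_path, "wb") as handle:
        handle.write("中文\r\n".encode("big5"))
    convert_to_utf8(src_path, dst_path, "big5")
    with open(dst_path, "rb") as handle:
        converted = handle.read()

print(converted == "中文\r\n".encode("utf-8"))
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.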
Google Chrome v.10 (viewer)
- choose encoding: manual and auto-detect
- wrap long text: available (automatic)
MadEdit v.0.2.9.1
- choose encoding: manual and auto-detect
- convert to UTF-8: available
- wrap long text: available
Microsoft Internet Explorer v.8 (viewer)
- choose encoding: manual and auto-detect
- wrap long text:
Microsoft Notepad for Windows
method 1: Notepad + Notepad++ (see "Fixing garbled text when opening Simplified Chinese txt files in Notepad", 2010)
- choose encoding: manual
- convert to UTF-8: available through Notepad++
- wrap long text: available
method 2: Microsoft AppLocale utility (patched: piaip pAppLocale) + Notepad
- choose encoding: manual
- convert to UTF-8: not available
- wrap long text: available
Microsoft Office Word 2003 ($)
- choose encoding: manual
- convert to UTF-8: available
- wrap long text: available
Mozilla Firefox v.3.6 (viewer)
- choose encoding: manual and auto-detect
- wrap long text: not built in, but you can paste the following code into the address bar to wrap long text (thanks to Return of the Sasquatch: word wrap for Firefox bookmarklet!)
javascript:(function() { var D = document; F(D.body); function F(n) { var u, r, c, x; if (n.nodeType == 3) { u = n.data.search(/\S{45}/); if (u >= 0) { r = n.splitText(u + 45); n.parentNode.insertBefore(D.createElement('wbr'), r); } } else if ((n.tagName != 'STYLE') && (n.tagName != 'SCRIPT')) { for (c = 0; x = n.childNodes[c]; ++c) { F(x); } } } D.body.innerHTML += ' '; })();
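The idea behind the bookmarklet, giving every run of 45 non-whitespace characters a place to break, can be sketched outside the browser as well. This Python version only loosely mirrors the JavaScript: it works on a plain string instead of DOM text nodes, and the function name, the `limit` default and the newline separator are assumptions for illustration.

```python
import re


def add_break_opportunities(text, limit=45, separator="\n"):
    """Insert separator after every run of `limit` non-whitespace characters
    that is followed by more non-whitespace."""
    pattern = r"(\S{%d})(?=\S)" % limit
    return re.sub(pattern, lambda match: match.group(1) + separator, text)


wrapped = add_break_opportunities("a" * 100)
print([len(line) for line in wrapped.split("\n")])  # [45, 45, 10]
```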
Notepad++ v.5.8
- choose encoding: manual
- convert to UTF-8: available
- wrap long text: available
Not supported at this moment
- LibreOffice 3.3.0 Writer
- OpenOffice.org 3.3.0 Writer is not supported, but OpenOffice.org Calc is supported.
Further reading
- Converting documents between Simplified and Traditional Chinese
- Character encoding - Wikipedia, the free encyclopedia
- Regular extract url from text
- URL Encoding
References
- [1] urlencode - How to Encode URL Contains Unicode Characters with PHP - Stack Overflow
- [2] PHP urlencode()
- [3] PHP: json_encode - Manual
- [4] RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format
- [5] Difference between different hex types/representations in Python - Stack Overflow
- [6] bytes.decode()
- [7] str.encode()
- [8] python - How to decode unicode in a Chinese text - Stack Overflow