Fix garbled message text: Difference between revisions

Revision as of 08:55, 2 February 2019

Ideas on how to fix garbled message text

Possible cause
- Encoding issue: choose the correct language/encoding for the message text, or let a tool auto-detect the encoding
- PHP utf8_encode() & utf8_decode()
(optional) Convert the current encoding to UTF-8
(optional) Wrap the text to the window size
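
The "choose the correct encoding, then convert to UTF-8" idea can be sketched in Python as a simple trial-and-error decoder. This is only a rough heuristic, not a real detector: the function name guess_decode and the candidate list are assumptions made for illustration, and a clean decode does not prove that the guessed encoding is the right one.

```python
# Hypothetical candidate list: adjust it to the languages you expect.
CANDIDATE_ENCODINGS = ("utf-8", "big5", "gb18030", "shift_jis")


def guess_decode(raw: bytes, candidates=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError("none of the candidate encodings fit the data")


# Bytes as they might arrive from a legacy Big5 system.
raw_message = "象".encode("big5")
text, used = guess_decode(raw_message)

# Optional step: convert the text to UTF-8 bytes.
utf8_bytes = text.encode("utf-8")
print(used, text, utf8_bytes)
```

Putting "utf-8" first is a deliberate choice: UTF-8 is strict, so text in another encoding rarely decodes as UTF-8 by accident, whereas the legacy codecs accept many byte sequences.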

Possible approaches to encode the message text:

Approach	Goal	Is Chinese text escaped (no longer human readable)?	Sample text before and after encoding
JavaScript encodeURIComponent() ↔ JavaScript decodeURIComponent()^[1]	"converts characters into a format that can be transmitted over the Internet ... " Cited from w3schools	TRUE	before: http://www.中文網址.tw/my test.asp?name=ståle&car=saab after: http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2Fmy%20test.asp%3Fname%3Dst%C3%A5le%26car%3Dsaab
URL Decoder/Encoder^[2]	(same as above)	TRUE	(same as above)
PHP: json_encode ↔ PHP: json_decode	Save array in mysql database	TRUE	before: array("作者" => "馬克吐溫", "名言" => "\"To a man with a hammer, everything looks like a nail.\" He said."); after: {"\u4f5c\u8005":"\u99ac\u514b\u5410\u6eab","\u540d\u8a00":"\"To a man with a hammer, everything looks like a nail.\" He said."}
PHP: serialize ↔ PHP: unserialize	Save array in mysql database	FALSE	before: array("作者" => "馬克吐溫", "名言" => "\"To a man with a hammer, everything looks like a nail.\" He said."); after: a:2:{s:6:"作者";s:12:"馬克吐溫";s:6:"名言";s:64:""To a man with a hammer, everything looks like a nail." He said.";}
PHP: htmlentities [1] ↔ PHP: html_entity_decode	Replace reserved characters e.g. double quote symbol	FALSE	before: 馬克吐溫名言 "To a man with a hammer, everything looks like a nail." after: 馬克吐溫名言 &quot;To a man with a hammer, everything looks like a nail.&quot;
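
For comparison, the URL-encoding row can be reproduced in Python with the standard library. This is a sketch, not an exact port of encodeURIComponent(): urllib.parse.quote() with safe="" gives the same result for this sample, but the two functions differ on a few characters such as ! * ' ( ), which JavaScript leaves unescaped.

```python
from urllib.parse import quote, unquote

# Sample URL from the table; \u00e5 is the letter å written as an escape.
url = "http://www.中文網址.tw/my test.asp?name=st\u00e5le&car=saab"

encoded = quote(url, safe="")  # percent-encode everything except letters, digits and _ . - ~
decoded = unquote(encoded)     # restore the human-readable text

print(encoded)
print(decoded == url)
```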

Other functions

JSON.parse() or jQuery.parseJSON() | jQuery API Documentation

List of the garbled text and possible cause

Feature	Example	Meaning	Restore to human readable ↔ encode text
Website address contains % symbols followed by two hex digits (e.g. %2F)	http%3A%2F%2Fwww.%E4%B8%AD%E6%96%87%E7%B6%B2%E5%9D%80.tw%2F	"converts characters into a format that can be transmitted over the Internet ... " Cited from w3schools	URL decode ↔ URL encode
Downloaded JSON or JavaScript file whose content contains \u or \U symbols	\u8c61 or \U0001f418	(1) 16-bit or 32-bit hex value (2) "JSON representation of the supplied value"^[3]^[4]	JSON decode ↔ JSON encode
String starting from 0x symbols	0x8c61	hexadecimal string^[5]
String starting from \x symbols	\xe8\xa8\xb1	"\x is a string escape code, which happens to use hex notation" (hexadecimal notation)^[6]	hexadecimal to text ↔ text to hexadecimal

string starting from \u or \U symbol

Using PHP. Type is string

$encoded = json_encode("象");
echo "encoded string: " . $encoded . PHP_EOL; // print "\u8c61"
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 象

$encoded = json_encode("🐘");
echo "encoded string: " . $encoded . PHP_EOL; // print "\ud83d\udc18"
echo "decoded string: " . json_decode($encoded, true) . PHP_EOL; // print 🐘

Using Python. Type is string

x = u'象'
x.encode('ascii', 'backslashreplace') 
# print b'\\u8c61'

x = u'🐘'
x.encode('ascii', 'backslashreplace') 
# print b'\\U0001f418'
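
The snippets above only go from text to escapes. A minimal sketch of the reverse direction, assuming the escaped text is pure ASCII, uses Python's unicode_escape codec; the helper name unescape_unicode is made up for this example.

```python
def unescape_unicode(escaped: str) -> str:
    r"""Turn literal \uXXXX or \UXXXXXXXX escapes back into characters.

    Assumes the input is pure ASCII; anything else raises UnicodeEncodeError.
    """
    return escaped.encode("ascii").decode("unicode_escape")


print(unescape_unicode(r"\u8c61"))
print(unescape_unicode(r"\U0001f418"))
```

JSON-style surrogate pairs such as \ud83d\udc18 are better handed to a JSON parser, which joins the two halves into one character.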

Using PHP. Type is array

$input = <<<EOT

["\u8c61"]

EOT;

$input = trim($input);
var_dump(json_decode($input, true)); // print array("象")
var_dump(json_encode(array("象"))); // print ["\u8c61"]
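
A Python counterpart of the PHP array example, using only the standard json module. The variable names are illustrative. Passing ensure_ascii=False keeps the characters readable instead of escaping them.

```python
import json

# Decode a JSON array whose strings are \u-escaped (the emoji is a surrogate pair).
decoded = json.loads('["\\u8c61", "\\ud83d\\udc18"]')

# Encode again: the default escapes every non-ASCII character.
escaped = json.dumps(decoded)

# Keep the text human readable instead.
readable = json.dumps(decoded, ensure_ascii=False)

print(decoded)
print(escaped)
print(readable)
```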

string starting from 0x symbol

Using Python chr() Function ↔ hex() function

int('0x8c61', 16)
# print 35937 -- "An integer representing a valid Unicode code point" cited from w3schools
chr(int('0x8c61', 16))
# print '象' -- "returns the character that represents the specified unicode." cited from w3schools
hex(ord('象'))
# print '0x8c61' -- "converts an integer number to the corresponding hexadecimal string." cited from programiz.com

chr(int('0x1f418', 16))
# print '🐘'
hex(ord('🐘'))
# print '0x1f418'
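
The chr()/hex() calls above handle one character at a time. As a small sketch, they can be wrapped to convert a whole run of code points. The helper names and the space-separated input format are assumptions for illustration.

```python
def hex_to_text(codes: str) -> str:
    """Convert space-separated hex code points such as '0x8c61 0x1f418' to text."""
    return "".join(chr(int(token, 16)) for token in codes.split())


def text_to_hex(text: str) -> str:
    """Convert text to space-separated hex code points."""
    return " ".join(hex(ord(character)) for character in text)


print(hex_to_text("0x8c61 0x1f418"))
print(text_to_hex("象🐘"))
```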

string starting from \x symbol

Using Python^[7]^[8]^[9]

data = u"象"
data
hex_notation = data.encode('utf-8')
hex_notation
# print b'\xe8\xb1\xa1'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)


data = u"🐘"
data
hex_notation = data.encode('utf-8')
hex_notation
# print b'\xf0\x9f\x90\x98'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)


data = u"だいじょうぶ"
data
hex_notation = data.encode('utf-8')
hex_notation 
# print b'\xe3\x81\xa0\xe3\x81\x84\xe3\x81\x98\xe3\x82\x87\xe3\x81\x86\xe3\x81\xb6'
for each_unicode_character in hex_notation.decode('utf-8'):
    print(each_unicode_character)
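
The examples above start from real text. When the \x escapes arrive as literal characters inside a string (for example, copied from a log), one possible way to restore them is sketched below. The helper name unescape_hex_bytes is hypothetical, and the input is assumed to be pure ASCII holding UTF-8 byte escapes.

```python
def unescape_hex_bytes(escaped: str, encoding: str = "utf-8") -> str:
    r"""Restore text from literal \xNN escapes, e.g. the 12-character string \xe8\xb1\xa1."""
    # unicode_escape maps each \xNN to the code point U+00NN;
    # latin-1 then maps those code points back to the raw bytes 0xNN.
    raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode(encoding)


print(unescape_hex_bytes(r"\xe8\xb1\xa1"))
print(unescape_hex_bytes(r"\xf0\x9f\x90\x98"))
```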

Ways to fix garbled message text

ConvertZ v.8.02

choose encode: manually (mainly Asian languages)
convert to UTF-8: available
convert to Big5 from UTF-8: available, but the software may change the wording, e.g. 余美人 -> 於美人
allow to wrap long text: available

EmEditor v.14.3.1 ($)

choose encode: manually and auto-detect
convert to UTF-8: available
allow to wrap long text: available
support command line: EmEditor FAQ: How can I convert file encodings with the command line?

Google Chrome v.10 (viewer)

choose encode: manually and auto-detect
allow to wrap long text: available (auto)

MadEdit v.0.2.9.1

choose encode: manually and auto-detect
convert to UTF-8: available
allow to wrap long text: available

Microsoft Internet Explorer v.8 (viewer)

choose encode: manually and auto-detect
allow to wrap long text:

Microsoft notepad (記事本) for Windows

method 1: Notepad + Notepad++, following Err: 解決用記事本(notepad)開啟簡體字txt檔，出現亂碼的問題 ("Fixing garbled text when opening a Simplified Chinese txt file in Notepad", 2010)

choose encode: manually
convert to UTF-8: available by Notepad++
allow to wrap long text: available

method 2: Microsoft AppLocale utility (patched: piaip pAppLocale) + Notepad

choose encode: manually
convert to UTF-8: not available
allow to wrap long text: available

Microsoft Office Word 2003 ($)

choose encode: manually
convert to UTF-8: available
allow to wrap long text: available

Mozilla Firefox v.3.6 (viewer)

choose encode: manually and auto-detect
allow to wrap long text: no, but you can paste the following code into the address bar to wrap long text (thanks to Return of the Sasquatch: word wrap for Firefox bookmarklet)

javascript:(function() { var D = document; F(D.body); function F(n) { var u, r, c, x; if (n.nodeType == 3) { u = n.data.search(/\S{45}/); if (u >= 0) { r = n.splitText(u + 45); n.parentNode.insertBefore(D.createElement('wbr'), r); } } else if ((n.tagName != 'STYLE') && (n.tagName != 'SCRIPT')) { for (c = 0; x = n.childNodes[c]; ++c) { F(x); } } } D.body.innerHTML += ' '; })();

Notepad++ v.5.8

choose encode: manually
convert to UTF-8: available
allow to wrap long text: available

not supported at this moment

LibreOffice 3.3.0 - Writer
OpenOffice.org 3.3.0 - Writer is not supported but OpenOffice.org Calc is supported.
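
The "convert to UTF-8" step that several of the tools above offer can also be scripted. The following is a minimal sketch under the assumption that the source encoding is already known. The function name convert_file_to_utf8 and the file names are made up for this example.

```python
import tempfile
from pathlib import Path


def convert_file_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")


# Demonstration with a temporary Big5 file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "big5.txt"
    dst = Path(tmp) / "utf8.txt"
    src.write_bytes("馬克吐溫".encode("big5"))
    convert_file_to_utf8(src, dst, "big5")
    result = dst.read_bytes().decode("utf-8")

print(result)
```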

References

[1] urlencode - How to Encode URL Contains Unicode Characters with PHP - Stack Overflow

[2] PHP urlencode()

[3] PHP: json_encode - Manual

[4] RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format

[5] Python hex() - Python Standard Library

[6] Difference between different hex types/representations in Python - Stack Overflow

[7] bytes.decode()

[8] str.encode()

[9] python - How to decode unicode in a Chinese text - Stack Overflow


@@ Line 102: / Line 102: @@
 </tr>
 <tr>
-<td>String contains {{kbd | key=<nowiki>\x</nowiki>}} symbols</td>
+<td>String starting from {{kbd | key=<nowiki>\x</nowiki>}} symbols</td>
-<td style="word-wrap: break-word;">{{kbd | key=<nowiki>b'\xe8\xa8\xb1'</nowiki>}}</td>
+<td style="word-wrap: break-word;">{{kbd | key=<nowiki>\xe8\xa8\xb1</nowiki>}}</td>
 <td>"\x is a string escape code, which happens to use hex notation" (hexadecimal notation)<ref>[https://stackoverflow.com/questions/13123877/difference-between-different-hex-types-representations-in-python Difference between different hex types/representations in Python - Stack Overflow]</ref></td>
 <td>hexadecimal to text ↔ text to hexadecimal</td>
