Byte order mark: Difference between revisions

1,107 bytes added, 17 September 2023
 
(7 intermediate revisions by the same user not shown)
Line 33: Line 33:
<td>31323334353637383930</td>
<td>31323334353637383930</td>
<td>1234567890</td>
<td>1234567890</td>
<td>no BOM</td>
<td>UTF-8 without BOM</td>
</tr>
</tr>
<tr>
<tr>
Line 40: Line 40:
<td>EFBBBF31323334353637383930</td>
<td>EFBBBF31323334353637383930</td>
<td>1234567890</td>
<td>1234567890</td>
<td>BOM</td>
<td>UTF-8 with BOM</td>
</tr>
</tr>
<tr>
<tr>
Line 55: Line 55:


=== PHP way ===
=== PHP way ===
PHP code<ref>[https://stackoverflow.com/questions/14674834/php-convert-string-to-hex-and-hex-to-string PHP convert string to hex and hex to string - Stack Overflow]</ref><ref>[https://www.w3schools.com/php/func_misc_unpack.asp PHP unpack() Function]</ref>:
PHP code<ref>[https://stackoverflow.com/questions/14674834/php-convert-string-to-hex-and-hex-to-string PHP convert string to hex and hex to string - Stack Overflow]</ref><ref>[https://www.w3schools.com/php/func_misc_unpack.asp PHP unpack() Function]</ref><ref>[https://stackoverflow.com/questions/10290849/how-to-remove-multiple-utf-8-bom-sequences-before-doctype php - How to remove multiple UTF-8 BOM sequences before "<!DOCTYPE>"? - Stack Overflow]</ref>:
<pre>
<pre>


Line 83: Line 83:
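* {{kbd | key=<nowiki>=CODE(A1)</nowiki>}} in Google Sheets returns {{kbd | key=65279}} (the code point U+FEFF) when the cell text starts with a BOM, or another numeric value, e.g. {{kbd | key=28201}}, when it does not
* {{kbd | key=<nowiki>=CODE(A1)</nowiki>}} in Google Sheets returns {{kbd | key=65279}} (the code point U+FEFF) when the cell text starts with a BOM, or another numeric value, e.g. {{kbd | key=28201}}, when it does not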


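The value 65279 reported by the spreadsheet formula is the decimal form of the code point U+FEFF, and its UTF-8 encoding is the EF BB BF prefix shown in the hex table above. A minimal sketch of that relationship, written in Python purely for illustration (the variable names are hypothetical and not part of the original page):

```python
import codecs

BOM_CHAR = "\ufeff"  # the byte order mark as a Unicode character

# Decimal code point: the number a spreadsheet CODE() call reports for a leading BOM
bom_code_point = ord(BOM_CHAR)

# UTF-8 encoding of the BOM, written as uppercase hex
bom_utf8_hex = BOM_CHAR.encode("utf-8").hex().upper()

# Hex string from the table: a BOM followed by the ASCII digits 1234567890
sample = bytes.fromhex("EFBBBF31323334353637383930")
has_bom = sample.startswith(codecs.BOM_UTF8)

# The "utf-8-sig" codec drops one leading BOM while decoding
text = sample.decode("utf-8-sig")

print(bom_code_point, bom_utf8_hex, has_bom, text)
```

Decoding with the plain `utf-8` codec instead would keep the BOM as an invisible first character, which is why the two table rows look identical as text but differ in bytes.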
=== File command way ===
Using [https://en.wikipedia.org/wiki/File_(command) file (command)]: {{kbd | key=<nowiki>file filename.txt</nowiki>}} on {{Linux}}, {{Mac}}<ref>[https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/file.1.html file(1) Mac OS X Manual Page]</ref> & Cygwin on {{Win}}

<table border="1" class="wikitable" >
<tr>
<th>File content</th>
<th>Example result returned by file command</th>
</tr>
<tr>
<td>File contains BOM</td>
<td>UTF-8 Unicode (with BOM) text</td>
</tr>
<tr>
<td>File does NOT contain BOM</td>
<td>UTF-8 Unicode text</td>
</tr>
</table>

=== BASH command ===
Check whether a UTF-8 encoded file contains a BOM: if the first line of the {{kbd | key=hexdump}} output shows ef bb bf, the file starts with a BOM.

<pre>
% hexdump -n 3 -C filename
00000000  ef bb bf                                          |...|
00000003
</pre>

=== File command ===
Using [https://en.wikipedia.org/wiki/File_(command) file (command)]: {{kbd | key=<nowiki>file filename.txt</nowiki>}} on {{Linux}}, {{Mac}}<ref>[https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/file.1.html file(1) Mac OS X Manual Page]</ref> & Cygwin on {{Win}}. See details on [[Text file encoding]].

=== Hex editor ===
Open the text file in a hex editor. [https://zh.wikipedia.org/wiki/%E4%BD%8D%E5%85%83%E7%B5%84%E9%A0%86%E5%BA%8F%E8%A8%98%E8%99%9F Byte order mark - Wikipedia, the free encyclopedia (Chinese)]

== How to remove Byte order mark ==
PHP
* [https://stackoverflow.com/questions/10290849/how-to-remove-multiple-utf-8-bom-sequences php - How to remove multiple UTF-8 BOM sequences - Stack Overflow]
* [https://stackoverflow.com/questions/22600235/remove-or-match-a-unicode-zero-width-space-php replace - Remove or match a Unicode Zero Width Space PHP - Stack Overflow]

<pre>
// Raw data: a code editor displays the BOM as ZWNBSP, but it is invisible in an ordinary editor
$text = "\xef\xbb\xbf" . "單位名稱"; // sample UTF-8 text meaning "unit name"

// Remove the BOM (U+FEFF) and the zero-width characters U+200B to U+200D
$text = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $text);
</pre>
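
The same removal can be sketched outside PHP. The Python example below is an illustration only, not the page author's method, and the function names `strip_leading_bom` and `strip_zero_width` are hypothetical. It shows two variants: removing only the BOM bytes at the start of the data, and removing the same character class as the PHP pattern anywhere in the text.

```python
import codecs
import re

# Same character class as the PHP pattern: U+200B to U+200D plus U+FEFF
ZERO_WIDTH = re.compile("[\u200b-\u200d\ufeff]")


def strip_leading_bom(data: bytes) -> bytes:
    """Remove every UTF-8 BOM found at the start of the byte string."""
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data


def strip_zero_width(text: str) -> str:
    """Remove BOM and zero-width characters anywhere in the text."""
    return ZERO_WIDTH.sub("", text)


# Sample data mirroring the PHP snippet: BOM bytes followed by UTF-8 text
raw = b"\xef\xbb\xbf" + "單位名稱".encode("utf-8")

clean_bytes = strip_leading_bom(raw)
clean_text = strip_zero_width(raw.decode("utf-8"))

print(clean_bytes.decode("utf-8") == clean_text)
```

The byte-level variant is the narrower choice: the character-class variant also deletes zero-width joiners and non-joiners in the middle of the text, which can be meaningful in some scripts and in emoji sequences.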


== References ==
== References ==
Line 109: Line 116:
<references />
<references />


[[Category:Programming]] [[Category:Data Science]]
[[Category:Programming]] [[Category:Data Science]] [[Category:String manipulation]]
Anonymous user
