Byte order mark
Byte order mark (BOM; some editors call it a "signature")
How to see the byte order mark
MySQL way
Use the MySQL HEX() function, which "returns a string representation of a hexadecimal value of a decimal or string value specified as an argument."
Run the SQL on sqlfiddle.com or download the SQL file directly.
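```sql
CREATE TABLE `articles` (
  `id` varchar(50) NOT NULL,
  `notes` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- The second id is prefixed with the three BOM bytes (EF BB BF).
-- They are written with UNHEX() here because a BOM typed directly
-- into a string literal is invisible.
INSERT INTO `articles` (`id`, `notes`) VALUES
  ('1234567890', 'UTF-8 without BOM'),
  (CONCAT(CONVERT(UNHEX('EFBBBF') USING utf8), '1234567890'), 'UTF-8 with BOM');

ALTER TABLE `articles` ADD UNIQUE KEY `id` (`id`) USING BTREE;

SELECT HEX(`id`), `id`, `notes` FROM `articles`;
```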
HEX(id) | id | notes |
---|---|---|
31323334353637383930 | 1234567890 | UTF-8 without BOM |
EFBBBF31323334353637383930 | 1234567890 | UTF-8 with BOM |
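If the `id` column should contain only digits, the following SQL query finds the records that contain a BOM. The hex form of an ASCII digit (30 to 39) consists of digits only, whereas the BOM bytes EFBBBF contain the letters A to F. Note that the pattern matches any byte whose hex form includes A to F, so it does not catch every non-digit character (for example, `a` is 61):

```sql
SELECT * FROM `articles` WHERE HEX(`id`) REGEXP '[^0-9]+';
```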
PHP way

```php
<?php
// Convert a string to the hexadecimal representation of its bytes.
function str2hex($string)
{
    $hexstr = unpack('H*', $string);
    return array_shift($hexstr);
}

$string = "1234567890";
echo $string . " NOT contains BOM --> after str2hex: " . str2hex($string) . PHP_EOL;

// Prefix the same digits with the UTF-8 BOM bytes.
$string = "\xEF\xBB\xBF" . "1234567890";
echo $string . " contains BOM --> after str2hex: " . str2hex($string) . PHP_EOL;
```
Result:

```
1234567890 NOT contains BOM --> after str2hex: 31323334353637383930
1234567890 contains BOM --> after str2hex: efbbbf31323334353637383930
```
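For comparison, the same check can be sketched in Python (this is not part of the original PHP example). Here `bytes.hex()` plays the role of `str2hex`, and `codecs.BOM_UTF8` is the standard-library constant for the three BOM bytes. The helper name `str_to_hex` is hypothetical.

```python
import codecs


def str_to_hex(text: str) -> str:
    """Return the lowercase hex of the UTF-8 bytes of text (hypothetical helper)."""
    return text.encode("utf-8").hex()


plain = "1234567890"
# The plain "utf-8" codec keeps the BOM, so this yields U+FEFF followed by the digits.
with_bom = codecs.BOM_UTF8.decode("utf-8") + plain

print(str_to_hex(plain))     # 31323334353637383930
print(str_to_hex(with_bom))  # efbbbf31323334353637383930
```

The two strings look identical when printed, which is why comparing their hex forms is the reliable way to spot the difference.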
Excel / Google Sheets way
Use the CODE function to check the "numeric code for the first character in a text string". If cell A1 starts with a BOM:
- =CODE(A1) returns 63 in Excel 2016 for Windows [4]
- =CODE(A1) returns 95 in Excel 2016 for Mac
- =CODE(A1) returns 65279 or another numeric value, e.g. 28201, in Google Sheets
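The value 65279 is the decimal code point of U+FEFF, the character that the BOM bytes decode to. The 63 reported by Excel on Windows is the code of `?`, which would be consistent with a fallback for a character that an ANSI code page cannot represent; that explanation is an assumption, not something the source confirms. A small Python sketch of both numbers, assuming Windows-1252 as the code page:

```python
bom_char = "\ufeff"

# Decimal code point of the BOM character
code_point = ord(bom_char)
print(code_point)  # 65279

# Assumption: a code page without U+FEFF substitutes "?" for it.
fallback = bom_char.encode("cp1252", errors="replace")
print(fallback[0])  # 63
```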
Bash command
Check whether a UTF-8 encoded file starts with a BOM. If the first line of the hexdump output shows ef bb bf, the file contains a BOM:

```
% hexdump -n 3 -C filename
00000000  ef bb bf                                          |...|
00000003
```
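The same three-byte check can be scripted. The following is a minimal Python sketch, assuming the file is small enough to open normally; the function name `has_utf8_bom` and the temporary demo files are hypothetical and only serve the illustration.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the bytes EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def write_temp(data: bytes) -> str:
    """Write data to a throwaway file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


bom_path = write_temp(codecs.BOM_UTF8 + b"1234567890")
plain_path = write_temp(b"1234567890")

bom_result = has_utf8_bom(bom_path)
plain_result = has_utf8_bom(plain_path)
print(bom_result, plain_result)  # True False

# Clean up the demo files
os.remove(bom_path)
os.remove(plain_path)
```

Reading in binary mode matters here: a text-mode read could decode or drop the BOM before the comparison takes place.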
File command
Use the file command, `file filename.txt`, on Linux, Mac [5], and Cygwin on Windows. See Text file encoding for details.
Hex editor
Open the text file in a hex editor. See "Byte order mark" on Chinese Wikipedia (位元組順序記號 - 維基百科).
How to remove the byte order mark
PHP
- php - How to remove multiple UTF-8 BOM sequences - Stack Overflow
- replace - Remove or match a Unicode Zero Width Space PHP - Stack Overflow
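```php
// Original data: code editors display the BOM as ZWNBSP, but ordinary editors do not show it
$text = "\xef\xbb\xbf" . "單位名稱";
// Remove the BOM (and the zero-width characters U+200B to U+200D)
$text = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $text);
```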
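For readers working outside PHP, the removal can be sketched in Python in two ways. The first relies on the standard `utf-8-sig` codec, which drops a single leading BOM while decoding. The second mirrors the regular expression above and strips the characters wherever they occur. The variable names are illustrative only.

```python
import re

# Same character class as the PHP pattern: U+200B to U+200D and U+FEFF
ZERO_WIDTH = re.compile("[\u200b-\u200d\ufeff]")

raw = b"\xef\xbb\xbf" + "單位名稱".encode("utf-8")

# Option 1: utf-8-sig removes one BOM at the start of the data
text_sig = raw.decode("utf-8-sig")

# Option 2: decode normally, then strip the characters everywhere
text_re = ZERO_WIDTH.sub("", raw.decode("utf-8"))

print(text_sig == "單位名稱", text_re == "單位名稱")  # True True
```

Option 1 leaves any BOM that appears later in the text untouched, so Option 2 is the closer match when multiple BOM sequences have to be removed.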