Byte order mark

From LemonWiki共筆

Byte order mark (BOM; 位元組順序記號 in Chinese; some editors call it a "signature")

How to detect a byte order mark

MySQL way

Use the MySQL HEX() function, which "returns a string representation of a hexadecimal value of a decimal or string value specified as an argument."


Run the SQL on sqlfiddle.com or download the SQL file directly.

CREATE TABLE `articles` (
  `id` varchar(50) NOT NULL,
  `notes` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

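-- The second id starts with the UTF-8 BOM bytes EF BB BF, written explicitly so the invisible character is not lost.
INSERT INTO `articles` (`id`, `notes`) VALUES
('1234567890', 'UTF-8 without BOM'),
(CONCAT(_utf8 X'EFBBBF', '1234567890'), 'UTF-8 with BOM');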

ALTER TABLE `articles`
  ADD UNIQUE KEY `id` (`id`) USING BTREE;

SELECT HEX(`id`), `id`, `notes` FROM `articles`;

Result:

HEX(id)                     id          notes
31323334353637383930        1234567890  UTF-8 without BOM
EFBBBF31323334353637383930  1234567890  UTF-8 with BOM


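If the column `id` should contain only digits, the following query finds the records that contain a BOM. The digits 0 to 9 are stored as the bytes 30 to 39, so the hex string of a clean value consists of decimal digits only; the BOM bytes EF BB BF add letters, which the pattern matches:

SELECT *
FROM `articles`
WHERE HEX(`id`) REGEXP '[^0-9]+';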
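
The logic of the query above can be reproduced outside the database. The following is a minimal Python sketch, not part of MySQL; the helper names `to_hex` and `looks_suspicious` are hypothetical and only mirror what HEX() and the REGEXP condition do.

```python
import re

def to_hex(text: str) -> str:
    """Return the uppercase hex of the UTF-8 bytes, similar to MySQL HEX()."""
    return text.encode("utf-8").hex().upper()

def looks_suspicious(text: str) -> bool:
    """Mirror HEX(id) REGEXP '[^0-9]+': true if the hex contains any letter A-F."""
    return re.search(r"[^0-9]+", to_hex(text)) is not None

clean = "1234567890"
with_bom = "\ufeff" + "1234567890"  # U+FEFF encodes to EF BB BF in UTF-8

print(to_hex(clean))              # 31323334353637383930
print(to_hex(with_bom))           # EFBBBF31323334353637383930
print(looks_suspicious(clean))    # False
print(looks_suspicious(with_bom)) # True
```

Note that this check only flags bytes whose hex form contains a letter. A value such as "ABC" (bytes 41 42 43) would pass unnoticed, so the approach is suitable only for columns that should hold digits.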

PHP way

PHP code[1][2][3]:


// Return the hexadecimal representation of every byte in $string.
function str2hex($string) {
	$hexstr = unpack('H*', $string);
	return array_shift($hexstr);
}

$string = "1234567890";
echo $string . " does NOT contain a BOM --> after str2hex: " . str2hex($string) . PHP_EOL;

// "\xEF\xBB\xBF" is the UTF-8 encoding of the BOM (U+FEFF).
$string = "\xEF\xBB\xBF" . "1234567890";
echo $string . " contains a BOM --> after str2hex: " . str2hex($string) . PHP_EOL;

Result (the second line begins with an invisible BOM):

1234567890 does NOT contain a BOM --> after str2hex: 31323334353637383930
1234567890 contains a BOM --> after str2hex: efbbbf31323334353637383930

Excel / Google sheet way

Use the CODE function, which returns the "numeric code for the first character in a text string". If cell A1 begins with a BOM:

  • =CODE(A1) returns 63 in Excel 2016 on Windows [4]
  • =CODE(A1) returns 95 in Excel 2016 on macOS
  • =CODE(A1) returns 65279, or another numeric value such as 28201, in Google Sheets
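
The value 65279 is the decimal form of U+FEFF, the code point that the UTF-8 bytes EF BB BF decode to. The values 63 and 95 are the codes of "?" and "_", which suggests (an assumption, not confirmed by the source) that Excel substitutes a placeholder character when it cannot represent the BOM. A short Python sketch to check the arithmetic:

```python
import codecs

# codecs.BOM_UTF8 is the byte string EF BB BF; decoding it yields one character.
bom_char = codecs.BOM_UTF8.decode("utf-8")

print(len(bom_char))       # 1
print(ord(bom_char))       # 65279
print(hex(ord(bom_char)))  # 0xfeff
print(chr(63), chr(95))    # ? _
```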

BASH command

Check whether a UTF-8 encoded file starts with a BOM. If the first line of the hexdump output shows ef bb bf, the file contains a BOM.

% hexdump -n 3 -C filename

00000000  ef bb bf                                          |...|
00000003
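
The same three-byte check can be scripted. The following is a minimal Python sketch, assuming the file is UTF-8 encoded; the function name `has_utf8_bom` and the demo file names are hypothetical.

```python
import codecs
import os
import tempfile

def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the bytes EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8

# Demo: write one file with a BOM and one without, then check both.
with tempfile.TemporaryDirectory() as tmp:
    with_bom = os.path.join(tmp, "with_bom.txt")
    without_bom = os.path.join(tmp, "without_bom.txt")
    with open(with_bom, "wb") as handle:
        handle.write(codecs.BOM_UTF8 + b"1234567890")
    with open(without_bom, "wb") as handle:
        handle.write(b"1234567890")

    print(has_utf8_bom(with_bom))     # True
    print(has_utf8_bom(without_bom))  # False
```

Opening the file in binary mode ("rb") matters here: a text-mode read would decode the bytes first and hide the raw prefix.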

File command

Run the file command, file filename.txt, on Linux, macOS [5], or Cygwin on Windows. For a UTF-8 file that starts with a BOM, the reported type includes "(with BOM)". See details on Text file encoding

Hex editor

Open the text file in a hex editor. See Byte order mark (位元組順序記號) on Chinese Wikipedia, the free encyclopedia.

How to remove a byte order mark

PHP

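// Original data: a code editor shows the BOM as ZWNBSP, but it is invisible in ordinary editors
$text = "\xef\xbb\xbf" . "單位名稱";

// Remove the BOM (U+FEFF) and the zero-width characters U+200B to U+200D anywhere in the string
$text = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $text);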
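
For comparison, a minimal Python sketch that removes only a leading BOM and leaves the rest of the data untouched, unlike the PHP pattern above, which also strips zero-width characters wherever they occur. The function name `strip_utf8_bom` is hypothetical.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop the bytes EF BB BF if, and only if, they start the data."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

raw = b"\xef\xbb\xbf" + "單位名稱".encode("utf-8")

cleaned = strip_utf8_bom(raw)
print(cleaned.decode("utf-8"))   # 單位名稱
print(len(raw) - len(cleaned))   # 3

# The built-in "utf-8-sig" codec skips a leading BOM while decoding.
print(raw.decode("utf-8-sig"))   # 單位名稱
```

Working on bytes keeps the check independent of how the text is later decoded; the "utf-8-sig" codec is the shorter route when the data is decoded to text anyway.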

References