Byte order mark: Difference between revisions

From LemonWiki共筆


=== File command way ===
Using [https://en.wikipedia.org/wiki/File_(command) file (command)]: {{kbd | key=<nowiki>file filename.txt</nowiki>}} on {{Linux}}, {{Mac}}<ref>[https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/file.1.html file(1) Mac OS X Manual Page]</ref> & Cygwin on {{Win}}. See details on [[Text file encoding]]
 
 
<table border="1" class="wikitable" >
<tr>
<th>File content</th>
<th>Example result returned by file command</th>
</tr>
 
<tr>
<td>File contains BOM</td>
<td>UTF-8 Unicode (with BOM) text</td>
</tr>
<tr>
 
<td>File does NOT contain BOM</td>
<td>UTF-8 Unicode text</td>
</tr>
 
</table>


== References ==

Revision as of 15:55, 21 October 2018

Byte order mark (BOM; Chinese: 位元組順序記號; some editors call it a "signature")

How to see Byte order mark

MySQL way

Use the MySQL HEX() function, which "returns a string representation of a hexadecimal value of a decimal or string value specified as an argument."

Run the SQL on sqlfiddle.com or download the SQL file directly.

CREATE TABLE `articles` (
  `id` varchar(50) NOT NULL,
  `notes` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `articles` (`id`, `notes`) VALUES
('1234567890', 'no BOM'),
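-- The second id is prefixed with the UTF-8 BOM bytes EF BB BF, written explicitly because the character is invisible
(CONCAT(CONVERT(UNHEX('EFBBBF') USING utf8), '1234567890'), 'BOM');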

ALTER TABLE `articles`
  ADD UNIQUE KEY `id` (`id`) USING BTREE;

SELECT HEX(`id`), `id`, `notes` FROM `articles`;

Result:

HEX(id)                       id            notes
31323334353637383930          1234567890    no BOM
EFBBBF31323334353637383930    1234567890    BOM


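If the column `id` is only allowed to hold digits, you can use the following SQL query to find the records that contain a BOM. The ASCII digits 0-9 are encoded as the hex values 30-39, so the hex string of a clean value consists of decimal digits only; the BOM bytes EF BB BF introduce the letters A-F, which the pattern matches: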

SELECT * 
FROM `articles`
WHERE HEX(`id`) REGEXP '[^0-9]+'
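
The same check can be reproduced outside the database. The following is a minimal Python sketch, assuming the IDs are available as strings that would be stored as UTF-8; the helper name `hex_has_non_digit` is hypothetical and does not come from the original page.

```python
import re


def hex_has_non_digit(value: str) -> bool:
    """Mimic HEX(value) REGEXP '[^0-9]+' for a UTF-8 encoded string."""
    hex_string = value.encode("utf-8").hex().upper()  # comparable to MySQL HEX()
    return re.search(r"[^0-9]", hex_string) is not None


# Hypothetical sample data: the second ID starts with a BOM (U+FEFF).
ids = ["1234567890", "\ufeff1234567890"]
for value in ids:
    print(value.encode("utf-8").hex().upper(), hex_has_non_digit(value))
```

Note that this pattern flags any byte whose hex form contains A-F, not only a BOM, so it is only a reliable BOM indicator when the column is otherwise limited to digits.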

PHP way

PHP code[1][2]:


// Plain ASCII digits, no BOM
$string = "1234567890";
echo $string . " does NOT contain BOM --> after str2hex: " . str2hex($string) . PHP_EOL;

// Same digits prefixed with the UTF-8 BOM bytes EF BB BF
$string = "\xEF\xBB\xBF" . "1234567890";
echo $string . " contains BOM --> after str2hex: " . str2hex($string) . PHP_EOL;

// Return the hexadecimal representation of every byte in $string
function str2hex($string) {
	$hexstr = unpack('H*', $string); // array with a single element
	return array_shift($hexstr);
}

Result:

1234567890 does NOT contain BOM --> after str2hex: 31323334353637383930
1234567890 contains BOM --> after str2hex: efbbbf31323334353637383930

Excel / Google sheet way

Use the CODE function to check the "numeric code for the first character in a text string". If cell A1 starts with a BOM:

  • =CODE(A1) returns 63 on Excel 2016 for Windows [3]
  • =CODE(A1) returns 95 on Excel 2016 for macOS
  • =CODE(A1) returns 65279 or another numeric value, e.g. 28201, on Google Sheets
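
The value 65279 is the decimal form of the BOM code point U+FEFF, and 63 and 95 are the ASCII codes of "?" and "_". A short Python sketch makes the numbers easier to relate; reading the two Excel results as placeholder characters substituted for the BOM is an assumption, not something the source confirms.

```python
BOM = "\ufeff"  # U+FEFF, the byte order mark code point

print(ord(BOM))             # 65279, the value Google Sheets can report
print(BOM.encode("utf-8"))  # b'\xef\xbb\xbf', the three bytes seen in hex dumps
print(chr(63), chr(95))     # ? _
```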

File command way

Use the file command: run file filename.txt on Linux, macOS [4], or Cygwin on Windows. See details on Text file encoding.
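
Where the file command is not available, the first three bytes of a file can be inspected directly. The following Python sketch is one possible approach under that assumption, not what the file command does internally; the function name `starts_with_utf8_bom` and the temporary file names are hypothetical.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` begins with the bytes EF BB BF."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Demonstration on two temporary files, one written with a BOM and one without.
with tempfile.TemporaryDirectory() as folder:
    with_bom = os.path.join(folder, "with_bom.txt")
    without_bom = os.path.join(folder, "without_bom.txt")
    with open(with_bom, "w", encoding="utf-8-sig") as handle:  # utf-8-sig prepends the BOM
        handle.write("1234567890")
    with open(without_bom, "w", encoding="utf-8") as handle:
        handle.write("1234567890")
    print(starts_with_utf8_bom(with_bom))     # True
    print(starts_with_utf8_bom(without_bom))  # False
```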

References