Difference between revisions of "Data cleaning"

From LemonWiki共筆
Jump to navigation Jump to search
(counting)
 
(11 intermediate revisions by the same user not shown)
Line 290: Line 290:
 
</pre>
 
</pre>
  
=== Time data ===
+
=== Time data: Verify the data format ===
 +
Verify the value by using MySQL functions:
  
==== Condition: Data was generated in 10 years ====
+
Verify the value should be year-month-day format e.g. {{Template:Today}}
 +
<pre>
 +
SELECT `my_date_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y-%m-%d'))
 +
FROM `my_table`
 +
WHERE
 +
UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y-%m-%d')) IS NULL;
 +
 
 +
</pre>
 +
 
 +
Verify the value should be year/month/day format
 +
<pre>
 +
SELECT `my_date_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y/%m/%d'))
 +
FROM `my_table`
 +
WHERE
 +
UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y/%m/%d')) IS NULL;
 +
 
 +
</pre>
 +
 
 +
Verify the value should be hour:minute:second format e.g. {{CURRENTTIME}}:06
 +
<pre>
 +
SELECT `my_time_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%H:%i:%S'))
 +
FROM `my_table`
 +
WHERE
 +
UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%H:%i:%S')) IS NULL;
 +
 
 +
</pre>
 +
 
 +
Verify the value should be year-month-day hour:minute:second format e.g. {{Template:Today}} {{CURRENTTIME}}:06
 +
<pre>
 +
SELECT `my_time_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%Y-%m-%d %H:%i:%S'))
 +
FROM `my_table`
 +
WHERE
 +
UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%Y-%m-%d %H:%i:%S')) IS NULL;
 +
 
 +
</pre>
 +
 
 +
Alternative PHP solution: [https://www.php.net/manual/en/function.strtotime.php strtotime]
 +
 
 +
=== Time data: Data was generated in 10 years ===
 
Definition of abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series]) if they
 
Definition of abnormal values of the time data ([http://en.wikipedia.org/wiki/Time_series time series]) if they
 
* were generated 10 years before or
 
* were generated 10 years before or
Line 317: Line 356:
 
* PHP: [http://stackoverflow.com/questions/19271381/correctly-determine-if-date-string-is-a-valid-date-in-that-format php - Correctly determine if date string is a valid date in that format - Stack Overflow]
 
* PHP: [http://stackoverflow.com/questions/19271381/correctly-determine-if-date-string-is-a-valid-date-in-that-format php - Correctly determine if date string is a valid date in that format - Stack Overflow]
  
==== Condition: Human birth year (age) data ====
+
=== Time data: Human birth year (age) data ===
 
Based on the existing record, the longest-living person who lived to 122<ref>[https://en.wikipedia.org/wiki/Maximum_life_span Maximum life span - Wikipedia]</ref>.  
 
Based on the existing record, the longest-living person who lived to 122<ref>[https://en.wikipedia.org/wiki/Maximum_life_span Maximum life span - Wikipedia]</ref>.  
  
Line 445: Line 484:
 
== Counting ==
 
== Counting ==
  
=== Counting number of occurrences (or frequency) of string ===
+
* [[Count occurrences of a word in string]]
Cygwin
+
* Count number of unique values
* (1) separate each string by [[Return symbol | return_symbol]] (2) [https://www.computerhope.com/unix/uuniq.htm uniq command] on Cygwin of {{Win}} or {{Linux}}: {{kbd | key=<nowiki>sort <file.txt> | uniq -c</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref>
+
** Excel: [https://www.excel-easy.com/examples/count-unique-values.html Count Unique Values in Excel - Easy Excel Tutorial]
 +
** Google sheet: [https://support.google.com/docs/answer/3093405?hl=zh-Hant COUNTUNIQUE - 文件編輯器說明] & [https://infoinspired.com/google-docs/spreadsheet/unique-function-in-horizontal-data-range-in-google-sheets/ How to Use UNIQUE Function in Horizontal Data Range in Google Sheets]
  
 
== Outlier / Anomaly detection ==
 
== Outlier / Anomaly detection ==
Line 492: Line 532:
 
# ASCII Backspace {{kbd | key=<nowiki>\b</nowiki>}}
 
# ASCII Backspace {{kbd | key=<nowiki>\b</nowiki>}}
 
# [https://en.wikipedia.org/wiki/Non-breaking_space Non-breaking space] ({{kbd | key=<nowiki>nbsp;</nowiki>}}) Replace Non-breaking space with one whitespace using PHP: {{kbd | key=<nowiki>$result = str_replace("\xc2\xa0", ' ', $original_string);</nowiki>}}<ref>[https://stackoverflow.com/questions/40724543/how-to-replace-decoded-non-breakable-space-nbsp php - How to replace decoded Non-breakable space (nbsp) - Stack Overflow]</ref>
 
# [https://en.wikipedia.org/wiki/Non-breaking_space Non-breaking space] ({{kbd | key=<nowiki>nbsp;</nowiki>}}) Replace Non-breaking space with one whitespace using PHP: {{kbd | key=<nowiki>$result = str_replace("\xc2\xa0", ' ', $original_string);</nowiki>}}<ref>[https://stackoverflow.com/questions/40724543/how-to-replace-decoded-non-breakable-space-nbsp php - How to replace decoded Non-breakable space (nbsp) - Stack Overflow]</ref>
 +
 +
Sentence spacing
 +
# [https://en.wikipedia.org/wiki/Sentence_spacing_in_digital_media Sentence spacing in digital media - Wikipedia] e.g. {{kbd | key=<nowiki>&Nbsp; &Ensp; &Emsp;</nowiki>}}
  
 
How to display the Non-breaking space In PHP?
 
How to display the Non-breaking space In PHP?

Latest revision as of 13:26, 26 July 2020

check if field value was not fulfilled[edit]

By purpose[edit]

Purpose Method (MySQL query syntax) Value1:
Fulfilled value what I want
Value2:
Fulfilled value NOT I want
Value3:
0
Value4:
NULL value
Value5:
Empty or white-spaces characters
values were not fulfilled or empty
(not contains 0)
WHERE column_name IS NULL
OR LENGTH(TRIM( column_name )) = 0
V V
values were not fulfilled or empty
(contains 0)
V V V
values were fulfilled and non-empty
(not contains 0)
V V
values were fulfilled and non-empty
(contains 0)
WHERE LENGTH(TRIM( column_name )) > 0 V V V
values (1) were not fulfilled or empty values (2) NOT I want
(not contains 0)
WHERE column_name IS NULL
OR LENGTH(TRIM( column_name )) = 0
OR column_name LIKE 'values NOT I want'
V V V
values (1) were not fulfilled or empty values (2) NOT I want
(contains 0)
V V V V

by datatype[edit]

VARCHAR and allows NULL value[edit]

data type of column possible column values method1:
find not fulfilled or empty values
method2:
find fulfilled and non-empty values
method3:
find NULL values
method4:
find not NULL values
VARCHAR and allows NULL fulfilled value ex:123 V V
NULL type:null V V
'NULL' type:string
0 V V
EMPTY ex: '' or space(s) ' ' V V

symbol V: means the column value will be able to find by means of the method


  • method1:
    • SELECT * FROM `my_table` WHERE COALESCE(column_name, '') = ''[1]
    • SELECT * FROM `my_table` WHERE column_name IS NULL OR LENGTH(TRIM( column_name )) = 0
    • SELECT * FROM `my_table` WHERE column_name IS NULL OR column_name = ''[2]
  • method2:
    • SELECT * FROM `my_table` WHERE column_name > ''
    • SELECT * FROM `my_table` WHERE LENGTH(TRIM( column_name )) > 0
    • SELECT * FROM `my_table` WHERE LENGTH(TRIM( column_name )) != 0
  • method3: SELECT * FROM `my_table` WHERE column_name IS NULL
  • method4: SELECT * FROM `my_table` WHERE column_name IS NOT NULL

VARCHAR or numeric[edit]

data type of column possible column values method5:
find values within the range
method6:
find values out of the range, empty & NULL values
VARCHAR or numeric values within the range ex: min ≤ value ≤ max V
values out of range V
NULL V
EMPTY ex: '' or space ' ' V
  • method5: SELECT * FROM `my_table` WHERE column_name BETWEEN min AND max
  • method6: SELECT * FROM `my_table` WHERE ( (COALESCE(column_name, '') = '') OR (column_name NOT BETWEEN min AND max) )

datetime and allows NULL value[edit]

possible column values

  1. 2020-08-04
  2. 00-00-00 00:00:00
  3. NULL

is null[edit]

Fill 0 if the value is NA or NULL

  • MySQL SQL syntax: COALESCE(): SELECT COALESCE(column_name, 0) or SELECT COALESCE(column_name, 'other_filled_value')
    • (1) Using COALESCE() function to replace the NULL value with 0.
    • (2) The case: 0/0 = null should be handled.
  • MySQL SQL syntax: combined IF() & ISNULL():SELECT IF(ISNULL((column_name), 0, column_name) or SELECT IF(ISNULL((column_name), 'other_filled_value', column_name)
  • python: pandas.DataFrame.fillna — pandas 0.16.0 documentation "Fill NA/NaN values using the specified method"

Find whether a variable is NULL. online demo

  • PHP is_null to find type:null null NOT type:string 'null' Icon exclaim.gif
  • Google spreadsheet / Excel:
    • ISERR(value) " value - The value to be verified as an error type other than #N/A." ex: #NULL!
    • If the cell value is exactly NULL not #NULL!, You may use COUNTIF(value, "NULL") or EXACT(value, "NULL")
  • MySQL SQL syntax: SELECT * FROM table WHERE column IS NULL;[3]
  • R: is.null(): R: The Null Object
  • Excel[4]

Find whether a variable is NOT NULL

  • MySQL SQL syntax: SELECT * FROM table WHERE column IS NOT NULL;

Find whether a variable is NOT #N/A

  • Excel: =NOT(ISERROR(cell_value))

javascript

check if field value was not fulfilled: NULL, empty value[edit]

Icon exclaim.gif NOT include those data which its field value fulfilled with default value automatically (demo on sqlfiddle)

  1. Good! quick solution: find records with NULL value OR empty, space value
    • MySQL solution: SELECT * FROM table_name WHERE column_name IS NULL OR LENGTH(TRIM( column_name )) = 0;
  2. find records with NULL value: (note: not #NULL!)
    • MySQL solution: SELECT * FROM table_name WHERE column_name IS NULL;
    • EXCEL: =EXACT(A2, "NULL")
  3. find records with empty value: (not contains NULL value)
    • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) = 0; Icon exclaim.gif SQL query SELECT * FROM table_name WHERE column_name IS NOT NULL includes empty value
    • MS SQL Server: SELECT * FROM table_name WHERE LEN( LTRIM(RTRIM(column_name)) ) = 0; [5]
  4. Excel starting date: 1900/1/0 (converted time formatted value from 0), 1900/1/1 (converted time formatted value from 1), 1900/1/2 ...
    • solution: step1: Replace the year > 100 from this year with empty value at EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)<1914, "", A2)) (this formula also handle empty value and non well-formatted column value ex: 0000-12-31 ) ; step2: change the format of cell to time format
    • trivial approach : EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)-YEAR(NOW())>100, "", A2)) Icon exclaim.gif this formula could not handle empty value because it return 0. If I change the format of cell to time format, 0 will become 1900/1/0.
  5. Using PHP
    • empty() function to find 0, null, false, empty string, empty array values.
    • if(empty($var) && $var !== 0 && $var !== "0"){ .. } to find null, false, empty string, empty array values BUT not 0.
  6. check if field value was NULL & not equal to some value

check if field value was fulfilled[edit]

length of string > 0

  • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0; demo[1]

column value is not null or 0

  • Excel: COUNTIFS(criteria_range1, "<>NULL", criteria_range1, "<>0")[6]

find if number or cell value is positive integer

  • EXCEL: =IFERROR(IF(AND(INT( value )= value, value>0), TRUE, FALSE), FALSE)[7] online demo

check numeric range

  • MySQL: SELECT * FROM table_name WHERE column_name BETWEEN min_number AND max_number; the value >= min_number AND value <= max_number ( min_number ≤ value ≤ max_number )

find NOT empty records means records without NULL or empty value:

  • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0;
  • MySQL: SELECT * FROM table_name WHERE column_name != '' AND column_name IS NOT NULL;

Verify the format of field value[edit]

related page: Regular expression

Email contains @ symbol[edit]

  • EXCEL: =IF(ISERR(FIND("@", A2, 1)), FALSE, TRUE) only check the field if contains @ symbol or not
    • result: (1) normal condition: return TRUE; (2) exceptional condition: return FALSE if @ symbol was not found
  • EXCEL: =FIND("@", A2, 2) only check the field if contains @ symbol or not
    • syntax: FIND(find_text, with_text, [start_num]) the start_num is 2 because the position of @ symbol should be larger than 1 (position of first char is 1)
    • result: (1) normal condition: return the number larger than 1; (2) exceptional condition: return #VALUE! if @ symbol was not found
  • PHP: PHP FILTER_VALIDATE_EMAIL Filter
    • "Returns the filtered data, or FALSE if the filter fails." quoted from PHP.net

Number precision in Excel[edit]

Number precision: 15 digits (Excel中最多的有效位數為15位)[8][9]

raw data: 1234567890123456 ->

  • (numeric format 數值格式) 1234567890123450.00 Icon exclaim.gif losing precision
  • (general format 通用格式) 1.23457E+15 Icon exclaim.gif losing precision
  • (text format 文字格式) 1234567890123456

large numbers

  • If the data was imported from Excel, you should notice the 15 digit precision issue.

Numeric[edit]

List of the possible abnormal values:

  • All numeric values are odd or even if the data were generated by user naturally.

PHP:

MySQL:

  • Find the records which the value of `my_column` is numeric values entirely SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9]+$'[10]
  • Find the records which the value of `my_column` is NOT numeric values entirely SELECT * FROM `my_table` WHERE `my_column` NOT REGEXP '^[0-9]+$'

If the digit of number is known, the SQL syntax could be more specific

  • The tax_id column is 8 digits only. Find the well-formatted tax_id records by using SELECT * FROM `tax_id` WHERE `tax_id` REGEXP '^[0-9]{8}$'

Excel & Google Sheets:

  • Using ISNUMBER Function: =INT(ISNUMBER(A1))
    • Return 1 if the cell value is (1) Numbers (2) Numbers in scientific (exponential) notation e.g. 1.36184E+14 (3) Decimal numbers e.g. 3.141592654 (4) Negative numbers
    • Return 0 if the cell value is (1) Text (2) Numbers that are stored as text e.g. ="5"
  • Google Sheets only: Using REGEXMATCH, TRIM & CONCAT[11] functions: =IF(REGEXMATCH(CONCAT("", TRIM(A1)), "^\d+$"), 1, 0)
    • Return 1 if the cell value is (1) Numbers (2) Numbers that are stored as text e.g. ="5"
    • Return 0 if the cell value is (1) Text (2) Numbers in scientific (exponential) notation e.g. 1.23E+16 (3) Decimal numbers e.g. 3.141592654 (4) Negative numbers
test data:
3.141592654
1.36184E+14
20740199601
346183773390240
="5"

Time data: Verify the data format[edit]

Verify the value by using MySQL functions:

Verify the value should be year-month-day format e.g. 2020-08-04

SELECT `my_date_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y-%m-%d')) 
FROM `my_table`
WHERE
UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y-%m-%d')) IS NULL;

Verify the value should be year/month/day format

SELECT `my_date_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y/%m/%d')) 
FROM `my_table`
WHERE
UNIX_TIMESTAMP(STR_TO_DATE(`my_date_column`, '%Y/%m/%d')) IS NULL;

Verify the value should be hour:minute:second format e.g. 17:17:06

SELECT `my_time_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%H:%i:%S')) 
FROM `my_table`
WHERE
UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%H:%i:%S')) IS NULL;

Verify the value should be year-month-day hour:minute:second format e.g. 2020-08-04 17:17:06

SELECT `my_time_column`, UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%Y-%m-%d %H:%i:%S')) 
FROM `my_table`
WHERE
UNIX_TIMESTAMP(STR_TO_DATE(`my_time_column`, '%Y-%m-%d %H:%i:%S')) IS NULL;

Alternative PHP solution: strtotime

Time data: Data was generated in 10 years[edit]

Definition of abnormal values of the time data (time series) if they

  • were generated 10 years before or
  • newer than today

List of the possible abnormal values:

  • 0001-01 00:00:00 occurred in MySQL datetime type
  • 1900/1/0 (converted time formatted value from 0), 1900/1/1 (converted time formatted value from 1), 1900/1/2 ... occurred in MS Excel
  • future data: the date after today

Find the normal values:

  • MySQL: Assume the data was generated in recent 10 years & not newer than today
    • SELECT * FROM `my_table` WHERE ( `my_time_column` >= CURDATE() - INTERVAL 10 YEAR ) AND ( `my_time_column` < CURDATE() + 1);
      • Icon exclaim.gif NOT `my_time_column` < CURDATE()。 ex: CURDATE() is 2020-08-04. Which is the same with 2020-08-04 00:00:00
    • SELECT * FROM `my_table` WHERE ( YEAR( CURDATE() ) - YEAR( `my_time_column`) <= 10 ) AND ( `my_time_column` < CURDATE() + 1);
  • MySQL: Assume the data was generated in recent 10 years & not newer than current timestamp. More precision to second compared with the above approach.
    • SELECT * FROM `my_table` WHERE ( `my_time_column` >= CURDATE() - INTERVAL 10 YEAR ) AND ( `my_time_column` <= CURRENT_TIMESTAMP);
      • You need to check the SELECT CURRENT_TIMESTAMP); if correct or not before you delete the abnormal data (timezone issue)

Check if the date valid

Time data: Human birth year (age) data[edit]

Based on the existing record, the longest-living person who lived to 122[13].

MySQL query is as follows[14] where the column `birthday` is date type.

WHERE TIMESTAMPDIFF(YEAR, `birthday`, CURDATE()) <= 122

Using UNIX_TIMESTAMP() function to check the abnormality of birthday data is not appropriate. Because the birthdays which are earlier 1970-01-01 00:00:00 UTC will all become zero.


String contains special characters[edit]

Duplicate data[edit]

Find duplicate data[edit]

EXCEL[edit]

Finding duplicate rows that differ in one column[edit]
Finding duplicate rows that differ in multiple columns[edit]

Cygwin[edit]

  • uniq command on Cygwin of Win Os windows.png or Linux Os linux.png : uniq -d <file.txt> > <duplicated_items.txt>[15]

MySQL[edit]

Finding duplicate rows that differ in one column[edit]

Find the duplicated data for one column[16]

-- Generate test data.
CREATE TABLE `table_name` (
  `id` int(11) NOT NULL,
  `content` varchar(5) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `table_name` (`id`, `content`) VALUES
(1, 'apple'),
(2, 'lemon'),
(3, 'apple');

ALTER TABLE `table_name`
  ADD PRIMARY KEY (`id`);

-- Find duplicated data
SELECT `content`, COUNT(*) count 
FROM `table_name` 
GROUP BY `content` 
HAVING count > 1;

SELECT tmp.* FROM 
( 
  SELECT `content`, count(*) count FROM `table_name` GROUP BY `content` 
) tmp 
WHERE tmp.count >1;
Finding duplicate rows that differ in multiple columns[edit]

Using CONCAT for multiple columns ex: column_1, column_2

SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
	FROM `table_name`
	GROUP BY CONCAT(  `column_1`, `column_2`  )
HAVING count > 1;

or

SELECT tmp.key FROM
(
	SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
	FROM `table_name`
	GROUP BY CONCAT(  `column_1`, `column_2`  )
) tmp
WHERE tmp.count >=2
other cases[edit]

For counting purpose: find the count of repeated id (type: int) between table_a and table_b

SELECT count(DISTINCT(id)) FROM table_a WHERE id IN
(
   SELECT DISTINCT(id) FROM table_b
)

Google Spreadsheet[edit]

Deduplicate[edit]

  • GNU Coreutils: sort invocation OS: Linux Os linux.png , cygwin of Win Os windows.png . More details on Merge multiple plain text files.
    • To remove duplicate lines:
      • sort -us -o <output_unique.file> <input.file> in a large text file (GB)[18]
      • cat <input.file> | grep <pattern> | sort | uniq Processes text line by line and prints the unique lines which match a specified pattern. Equal to these steps: (1) cat <input.file> | grep <pattern> > <tmp.file> (2) sort <tmp.file> | uniq
    • Ignore first n line(s) & remove duplicate lines[19][20][21]
      • (1) ignore first one line: (head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile
      • (2) ignore first two lines: (head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile

Counting number of duplicate occurrence[edit]

MySQL: find the number of duplicate occurrence between list_a & list_b which using the same primary key: column name id

  • SELECT count(DISTINCT(`id`)) FROM `list_a` WHERE `id` IN (SELECT DISTINCT(`id`) FROM `list_b`) ;

Excel:

Other[edit]

  • symbol e.g. data-mining or data_mining

Counting[edit]

Outlier / Anomaly detection[edit]

Anomaly detection of numeric data

  • Median
  • Range Checks
  • All values is event
  • The values are the same even the column is totally different

Anomaly detection of categorical data (qualitative variable)

  • Normal distribution e.g. The interest of audiences should be very different NOT coherent

Anomaly detection for time series data

  • Trend
  • Dramatically Increase or decrease of rows count for each time period

More on: Outlier - Wikipedia

unique number of data values[edit]

If the data values were generated by different users, the unique number of data values should be larger than ____

Data handling[edit]

Remove first, last or certain characters from text[edit]

  • Excel: using RIGHT[22] + LEN[23] functions [24]
  • Excel: if the length of text was fixed after removed, you may try to use REPLACE[25] + LEN functions (demo)

Remove leading and trailing spaces from text[edit]

UPDATE `table` 
SET `column` = TRIM( `column` ) 
WHERE LENGTH(TRIM( `column` )) != LENGTH( `column` );

Remove other string look like whitespace[edit]

Whitespace character

  1. IDEOGRAPHIC SPACE (全形空白、全型空白, U+3000)[26]:
    • diaplay:
      <?php $string = "111" . json_decode('"\u3000"') . "222"; echo $string;?>
    • replace with space:
      <?php echo str_replace(json_decode('"\u3000"'), " ", $string);?>
  2. ASCII Vertical Tab \v
  3. ASCII Horizontal Tab (TAB) \t
  4. ASCII Backspace \b
  5. Non-breaking space (nbsp;) Replace Non-breaking space with one whitespace using PHP: $result = str_replace("\xc2\xa0", ' ', $original_string);[27]

Sentence spacing

  1. Sentence spacing in digital media - Wikipedia e.g. &Nbsp; &Ensp; &Emsp;

How to display the Non-breaking space In PHP?

$input = '12345678' . hex2bin('c2a0');
echo $input . PHP_EOL;
## Result of above script: '12345678 ' (one whitespace at the end)

echo bin2hex($input) . PHP_EOL;
## Result of above script: 3132333435363738c2a0

echo bin2hex('12345678') . PHP_EOL;
## Result of above script: 3132333435363738 (You mat notice the difference of script result is C2A0)

How to display the Non-breaking space In MySQL?

SELECT CONCAT('12345678', UNHEX('C2A0'))
-- Result of above query: '12345678 ' (one whitespace at the end)

SELECT HEX(CONCAT('12345678', UNHEX('C2A0')))
-- Result of above query: 3132333435363738C2A0

SELECT HEX('12345678')
-- Result of above query: 3132333435363738 (You mat notice the difference of query result is C2A0)

SELECT LENGTH('12345678')
-- Result of above query: 8

SELECT LENGTH(CONCAT('12345678', UNHEX('C2A0')))
-- Result of above query: 10

Remove control character[edit]

Control character - Wikipedia Using PHP to clean control character:

$input = 'some string may contains control characters';
$replacement = '';
$result = preg_replace('/[\x00-\x1F]/', $replacement, $input);

Fix garbled message text[edit]

Fix garbled message text

Further reading[edit]

references[edit]

  1. MySQL COALESCE() function - w3resource
  2. How to check if field is null or empty mysql? - Stack Overflow
  3. MySQL :: MySQL 5.0 Reference Manual :: 3.3.4.6 Working with NULL Values
  4. 如何判斷 Excel 儲存格的欄位值是 NULL
  5. SQL TRIM 函數 - 1Keydata SQL 語法教學
  6. Excel COUNTIFS and COUNTIF with multiple criteria – examples of usage
  7. Check if number is an Integer
  8. Excel specifications and limits
  9. A2
  10. regex - Mysql REGEXP with . and numbers only - Stack Overflow
  11. GOOGLE 試算表: 數字轉成文字
  12. Check if a valid date?
  13. Maximum life span - Wikipedia
  14. sql - Calculate Age in MySQL (InnoDb) - Stack Overflow
  15. shell - How to print only the duplicate values from a text file? - Unix & Linux Stack Exchange
  16. Finding duplicate values in MySQL - Stack Overflow
  17. sql - MySQL SELECT DISTINCT multiple columns - Stack Overflow
  18. linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange
  19. sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow
  20. 命令執行的判斷依據: ; , &&, ||
  21. Linux tail command help and examples
  22. RIGHT、RIGHTB 函數 - Excel - Office.com
  23. LEN、LENB 函數 - Excel - Office.com
  24. How to remove first, last or certain characters from text in Excel?
  25. REPLACE、REPLACEB 函數 - Excel - Office.com
  26. Re: 請益 mysql空百=? - 看板 PHP - 批踢踢實業坊
  27. php - How to replace decoded Non-breakable space (nbsp) - Stack Overflow