Data cleaning: Difference between revisions

Revision as of 17:28, 8 December 2021

Check list

  • row count
  • duplicate data

Check if field value was not fulfilled

By purpose

The kinds of value a column may hold:

  • Value1: fulfilled value that I want
  • Value2: fulfilled value that I do NOT want
  • Value3: 0
  • Value4: NULL value
  • Value5: empty or whitespace-only characters

Which kinds of value each purpose finds (V = found):

Purpose                                                     Value1  Value2  Value3  Value4  Value5
(a) not fulfilled or empty, excluding 0                                               V       V
(b) not fulfilled or empty, including 0                                       V       V       V
(c) fulfilled and non-empty, excluding 0                      V       V
(d) fulfilled and non-empty, including 0                      V       V       V
(e) not fulfilled, empty, or NOT what I want, excluding 0             V               V       V
(f) not fulfilled, empty, or NOT what I want, including 0             V       V       V       V

Method (MySQL query syntax):

  • (a): WHERE column_name IS NULL OR LENGTH(TRIM( column_name )) = 0
  • (d): WHERE LENGTH(TRIM( column_name )) > 0
  • (e): WHERE column_name IS NULL OR LENGTH(TRIM( column_name )) = 0 OR column_name LIKE 'values NOT I want'

by datatype

VARCHAR and allows NULL value

Data type of column: VARCHAR and allows NULL

  • method1: find not fulfilled or empty values
  • method2: find fulfilled and non-empty values
  • method3: find NULL values
  • method4: find not NULL values

Possible column value             method1  method2  method3  method4
fulfilled value, e.g. 123                     V                 V
NULL (type: null)                    V                 V
'NULL' (type: string)
0                                             V                 V
EMPTY, e.g. '' or space(s) ' '       V                          V

Symbol V: the column value can be found by that method.


  • method1:
    • SELECT * FROM `my_table` WHERE COALESCE(column_name, '') = ''[1]
    • SELECT * FROM `my_table` WHERE column_name IS NULL OR LENGTH(TRIM( column_name )) = 0
    • SELECT * FROM `my_table` WHERE column_name IS NULL OR column_name = ''[2]
  • method2:
    • SELECT * FROM `my_table` WHERE column_name > ''
    • SELECT * FROM `my_table` WHERE LENGTH(TRIM( column_name )) > 0
    • SELECT * FROM `my_table` WHERE LENGTH(TRIM( column_name )) != 0
  • method3: SELECT * FROM `my_table` WHERE column_name IS NULL
  • method4: SELECT * FROM `my_table` WHERE column_name IS NOT NULL
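
The four methods above can also be mirrored outside the database. The following is a minimal Python sketch, assuming the column values have already been loaded into a list and that NULL arrives as None; the helper names are illustrative and not from the source. Note that str.strip() removes every kind of leading and trailing whitespace, while MySQL's TRIM() removes spaces only.

```python
def is_unfilled(value):
    """method1: True for None (NULL), '' and whitespace-only strings."""
    return value is None or str(value).strip() == ""


def is_filled(value):
    """method2: True for any non-NULL value with visible content, including 0."""
    return not is_unfilled(value)


# Sample column values taken from the table above.
rows = ["123", None, "NULL", 0, "", "   "]

unfilled = [v for v in rows if is_unfilled(v)]
filled = [v for v in rows if is_filled(v)]
nulls = [v for v in rows if v is None]          # method3
not_nulls = [v for v in rows if v is not None]  # method4

print(unfilled)  # [None, '', '   ']
print(filled)    # ['123', 'NULL', 0]
```

The string 'NULL' counts as a filled value here, which is why it deserves its own check when data was exported from another system.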

VARCHAR or numeric

Data type of column: VARCHAR or numeric

  • method5: find values within the range
  • method6: find values out of the range, empty & NULL values

Possible column value                             method5  method6
values within the range, e.g. min ≤ value ≤ max      V
values out of range                                           V
NULL                                                          V
EMPTY, e.g. '' or space ' '                                   V
  • method5: SELECT * FROM `my_table` WHERE column_name BETWEEN min AND max
  • method6: SELECT * FROM `my_table` WHERE ( (COALESCE(column_name, '') = '') OR (column_name NOT BETWEEN min AND max) )
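
A rough Python equivalent of method5 and method6, as a sketch only. It assumes values arrive as strings, numbers or None, and it treats non-numeric text as out of range (MySQL would instead coerce such text to a number, so results can differ for dirty data). The function name is hypothetical.

```python
def in_range(value, minimum, maximum):
    """method5: True only when value is a number with minimum <= value <= maximum."""
    if value is None:
        return False
    text = str(value).strip()
    if text == "":
        return False
    try:
        number = float(text)
    except ValueError:
        # Assumption of this sketch: non-numeric text is reported as abnormal.
        return False
    return minimum <= number <= maximum


values = ["5", 11, None, " ", "abc", 1]
normal = [v for v in values if in_range(v, 1, 10)]        # method5
abnormal = [v for v in values if not in_range(v, 1, 10)]  # method6
print(normal)    # ['5', 1]
print(abnormal)  # [11, None, ' ', 'abc']
```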

datetime and allows NULL value

possible column values

  1. 2024-03-28
  2. 0000-00-00 00:00:00
  3. NULL

is null

Fill 0 if the value is NA or NULL

  • MySQL SQL syntax: COALESCE(): SELECT COALESCE(column_name, 0) or SELECT COALESCE(column_name, 'other_filled_value')
    • (1) Using COALESCE() function to replace the NULL value with 0.
    • (2) Division by zero, e.g. 0/0, returns NULL in MySQL and should be handled as well.
  • MySQL SQL syntax: combined IF() & ISNULL(): SELECT IF(ISNULL(column_name), 0, column_name) or SELECT IF(ISNULL(column_name), 'other_filled_value', column_name)
  • python: pandas.DataFrame.fillna — pandas 0.16.0 documentation "Fill NA/NaN values using the specified method"
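
For readers without pandas, the COALESCE() idea can be sketched in plain Python. This is an illustration only: coalesce and safe_divide are hypothetical helpers written for this example, with None standing in for NULL.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE()."""
    for value in values:
        if value is not None:
            return value
    return None


def safe_divide(numerator, denominator):
    """Return None instead of raising on division by zero, similar to MySQL returning NULL."""
    if denominator == 0:
        return None
    return numerator / denominator


column = [3, None, 0, None]
filled = [coalesce(v, 0) for v in column]
print(filled)                           # [3, 0, 0, 0]
print(coalesce(safe_divide(0, 0), 0))   # 0
```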

Find whether a variable is NULL. online demo

  • PHP: is_null finds null (type: null), NOT the string 'null' (type: string)
  • Google spreadsheet / Excel:
    • ISERR(value) " value - The value to be verified as an error type other than #N/A." ex: #NULL!
    • If the cell value is exactly the text NULL rather than the error #NULL!, you may use COUNTIF(value, "NULL") or EXACT(value, "NULL")
  • MySQL SQL syntax: SELECT * FROM table WHERE column IS NULL;[3]
  • R: is.null(): R: The Null Object
  • Excel[4]

Find whether a variable is NOT NULL

  • MySQL SQL syntax: SELECT * FROM table WHERE column IS NOT NULL;

Find whether a variable is NOT #N/A

  • Excel: =NOT(ISERROR(cell_value))

javascript

check if field value was not fulfilled: NULL, empty value

Note: this does NOT include records whose field value was filled in automatically with a default value (demo on sqlfiddle)

  1. Quick solution: find records with a NULL value OR an empty or space-only value
    • MySQL solution: SELECT * FROM table_name WHERE column_name IS NULL OR LENGTH(TRIM( column_name )) = 0;
  2. find records with NULL value: (note: not #NULL!)
    • MySQL solution: SELECT * FROM table_name WHERE column_name IS NULL;
    • EXCEL: =EXACT(A2, "NULL")
  3. find records with empty value: (not contains NULL value)
    • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) = 0; Note: the SQL query SELECT * FROM table_name WHERE column_name IS NOT NULL includes empty values
    • MS SQL Server: SELECT * FROM table_name WHERE LEN( LTRIM(RTRIM(column_name)) ) = 0; [5]
  4. Excel starting date: 1900/1/0 (date-formatted value converted from 0), 1900/1/1 (date-formatted value converted from 1), 1900/1/2 ...
    • Solution: step 1: in Excel, replace dates more than 100 years old with an empty value (here the cutoff year is hard-coded as 1914): =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)<1914, "", A2)) (this formula also handles empty values and badly formatted values, e.g. 0000-12-31); step 2: change the cell format to a date/time format
    • Trivial approach: EXCEL: =IF(ISERR(YEAR(A2)), "", IF(YEAR(A2)-YEAR(NOW())>100, "", A2)) Note: this formula cannot handle empty values because it returns 0. If the cell format is then changed to a date/time format, 0 is displayed as 1900/1/0.
  5. Using PHP
    • empty() function to find 0, null, false, empty string, empty array values.
    • if(empty($var) && $var !== 0 && $var !== "0"){ .. } to find null, false, empty string, empty array values BUT not 0.
  6. check if field value was NULL & not equal to some value

check if field value was fulfilled

length of string > 0

  • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0; demo[1]

column value is not null or 0

  • Excel: COUNTIFS(criteria_range1, "<>NULL", criteria_range1, "<>0")[6]

find if number or cell value is positive integer

  • EXCEL: =IFERROR(IF(AND(INT( value )= value, value>0), TRUE, FALSE), FALSE)[7] online demo
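
The same positive-integer test can be sketched in Python. The function name is hypothetical; like the Excel formula, the sketch returns False instead of raising an error for text, empty or missing input, and it accepts numeric text such as "5".

```python
def is_positive_integer(value):
    """True when value converts to a number that is whole and greater than 0."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Mirrors the IFERROR(..., FALSE) wrapper: bad input is simply not valid.
        return False
    return number > 0 and number.is_integer()


print(is_positive_integer(3))      # True
print(is_positive_integer(3.5))    # False
print(is_positive_integer(-2))     # False
print(is_positive_integer("abc"))  # False
```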

check numeric range

  • MySQL: SELECT * FROM table_name WHERE column_name BETWEEN min_number AND max_number; the value >= min_number AND value <= max_number ( min_number ≤ value ≤ max_number )

find NOT empty records means records without NULL or empty value:

  • MySQL: SELECT * FROM table_name WHERE LENGTH(TRIM( column_name )) != 0;
  • MySQL: SELECT * FROM table_name WHERE column_name != '' AND column_name IS NOT NULL;

Data Validation

Validate the format of field value. Related page: Regular expression

Email contains @ symbol

  • EXCEL: =IF(ISERR(FIND("@", A2, 1)), FALSE, TRUE) only checks whether the field contains the @ symbol
    • result: (1) normal condition: returns TRUE; (2) exceptional condition: returns FALSE if the @ symbol was not found
  • EXCEL: =FIND("@", A2, 2) only checks whether the field contains the @ symbol after the first character
    • syntax: FIND(find_text, within_text, [start_num]). The start_num is 2 because the position of the @ symbol should be larger than 1 (the position of the first character is 1)
    • result: (1) normal condition: returns a number larger than 1; (2) exceptional condition: returns #VALUE! if the @ symbol was not found
  • PHP: PHP FILTER_VALIDATE_EMAIL Filter
    • "Returns the filtered data, or FALSE if the filter fails." quoted from PHP.net
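
A Python sketch of the second Excel check, for illustration. It only tests that an @ symbol exists somewhere after the first character, which is the same weak check as FIND("@", A2, 2) and not a full email validation. The function name is an assumption of this example.

```python
def has_at_after_first_char(text):
    """True when '@' occurs at the second character or later (index 1 or higher)."""
    # str.find returns -1 when nothing is found; the second argument is the
    # zero-based start index, so 1 here corresponds to Excel's start_num of 2.
    return isinstance(text, str) and text.find("@", 1) >= 1


print(has_at_after_first_char("user@example.com"))  # True
print(has_at_after_first_char("@example.com"))      # False
print(has_at_after_first_char("no-at-symbol"))      # False
```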

Number precision in Excel

Number precision: 15 digits (the maximum number of significant digits in Excel is 15)[8][9]

raw data: 1234567890123456 ->

  • (number format) 1234567890123450.00 (loses precision)
  • (general format) 1.23457E+15 (loses precision)
  • (text format) 1234567890123456

large numbers

  • If the data was imported from Excel, watch for the 15-digit precision issue: digits beyond the 15th significant digit are lost.
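
Before sending data through Excel, values at risk can be flagged by counting significant digits. The Python sketch below is one way to do that, under the assumption that values are plain decimal numerals stored as text (an exponent part, if present, is ignored). The helper names and the limit parameter are illustrative.

```python
def significant_digits(text):
    """Count significant digits of a decimal numeral given as text."""
    mantissa = text.upper().split("E")[0]           # drop an exponent such as E+14
    digits = "".join(ch for ch in mantissa if ch in "0123456789")
    # Leading zeros never count; trailing zeros are dropped because replacing
    # them with zeros again loses nothing.
    return len(digits.lstrip("0").rstrip("0"))


def loses_precision_in_excel(text, limit=15):
    """True when the numeral has more significant digits than Excel keeps."""
    return significant_digits(text) > limit


print(loses_precision_in_excel("1234567890123456"))  # True
print(loses_precision_in_excel("346183773390240"))   # False
print(loses_precision_in_excel("20740199601"))       # False
```

Values that are flagged are better kept in a text-formatted column.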

Check if the column value is numeric

Possible values

test data:
3.141592654
1.36184E+14
123,456.789
20740199601
346183773390240
="5"

MySQL:

  • Check if a value is an integer e.g. 1234567
    • Find the records where the value of `my_column` consists entirely of digits: SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9]+$'[10][11]
  • Check if a value is a number that may contain comma and dot symbols e.g. 1,234.567 or 3.414
    • SELECT * FROM `my_table` WHERE `my_column` REGEXP '^[0-9,\.]+$'[12]
  • Check if a value is NOT an integer
    • Find the records where the value of `my_column` does NOT consist entirely of digits: SELECT * FROM `my_table` WHERE `my_column` NOT REGEXP '^[0-9]+$'


If the number of digits is known, the SQL syntax can be more specific

  • The tax_id column is 8 digits only. Find the well-formatted tax_id records by using SELECT * FROM `tax_id` WHERE `tax_id` REGEXP '^[0-9]{8}$'
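
To see how the three patterns above behave on the test data, they can be tried in Python's re module. This is a sketch for experimentation only; the pattern names are made up for this example, and fullmatch() is used in place of the ^...$ anchors.

```python
import re

INTEGER = re.compile(r"[0-9]+")                 # digits only
WITH_SEPARATORS = re.compile(r"[0-9,.]+")       # digits, commas and dots
TAX_ID = re.compile(r"[0-9]{8}")                # exactly 8 digits

test_data = [
    "3.141592654",
    "1.36184E+14",
    "123,456.789",
    "20740199601",
    "346183773390240",
    '="5"',
]

integers = [v for v in test_data if INTEGER.fullmatch(v)]
separated = [v for v in test_data if WITH_SEPARATORS.fullmatch(v)]

print(integers)   # ['20740199601', '346183773390240']
print(separated)  # ['3.141592654', '123,456.789', '20740199601', '346183773390240']
print(bool(TAX_ID.fullmatch("12345678")))  # True
```

The separator pattern is permissive: it also accepts strings such as "1..2" or ",", so it flags obviously non-numeric values rather than proving a value is a well-formed number.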


PHP:

  • is_numeric function
  • is_int function


Excel & Google Sheets:

  • Using ISNUMBER Function: =INT(ISNUMBER(A1))
    • Return 1 if the cell value is (1) Numbers (2) Numbers in scientific (exponential) notation e.g. 1.36184E+14 (3) Decimal numbers e.g. 3.141592654 (4) Negative numbers
    • Return 0 if the cell value is (1) Text (2) Numbers that are stored as text e.g. ="5"
  • Google Sheets only: Using REGEXMATCH, TRIM & CONCAT[13] functions: =IF(REGEXMATCH(CONCAT("", TRIM(A1)), "^\d+$"), 1, 0)
    • Return 1 if the cell value is (1) Numbers (2) Numbers that are stored as text e.g. ="5"
    • Return 0 if the cell value is (1) Text (2) Numbers in scientific (exponential) notation e.g. 1.23E+16 (3) Decimal numbers e.g. 3.141592654 (4) Negative numbers

Time data: Validate the data format

Validate the datetime value

Time data: Data was generated in N years

Define the abnormal values of the time data (time series)

  • Verify the data were generated within N years. Possible abnormal values: 0001-01 00:00:00 occurring in the MySQL datetime type.
  • Verify the data are not newer than today
  • Verify the year of the data is not 1900 if the data were imported from a Microsoft Excel file. DATEVALUE[14] starts from the year 1900 e.g.
    • 1900/1/0 (date-formatted value converted from 0),
    • 1900/1/1 (date-formatted value converted from 1)
  • Verify the values are not 0000-00-00 00:00:00
  • Verify the diversity of the data values e.g. Variance

Find the normal values:

  • MySQL: Assume the data was generated in the most recent 10 years & is not newer than today
    • SELECT * FROM `my_table` WHERE ( `my_time_column` >= CURDATE() - INTERVAL 10 YEAR ) AND ( `my_time_column` < CURDATE() + INTERVAL 1 DAY );
      • Note: do NOT use `my_time_column` < CURDATE(). e.g. if CURDATE() is 2024-03-28, it is treated as 2024-03-28 00:00:00, so records created during today would be excluded.
    • SELECT * FROM `my_table` WHERE ( YEAR( CURDATE() ) - YEAR( `my_time_column`) <= 10 ) AND ( `my_time_column` < CURDATE() + INTERVAL 1 DAY );
  • MySQL: Assume the data was generated in the most recent 10 years & is not newer than the current timestamp. This is precise to the second, unlike the approach above.
    • SELECT * FROM `my_table` WHERE ( `my_time_column` >= CURDATE() - INTERVAL 10 YEAR ) AND ( `my_time_column` <= CURRENT_TIMESTAMP);
      • Check whether the result of SELECT CURRENT_TIMESTAMP; is correct before you delete the abnormal data (time zone issue)
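
The same window check can be sketched in Python when the timestamps are validated outside MySQL. The function names are hypothetical, and the current time is passed in explicitly so the time zone question raised above stays visible to the caller.

```python
from datetime import datetime


def years_before(moment, years):
    """Midnight of the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return datetime(moment.year - years, moment.month, moment.day)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return datetime(moment.year - years, moment.month, 28)


def is_recent(value, now, years=10):
    """True when value lies between midnight `years` years ago and `now`, inclusive."""
    return years_before(now, years) <= value <= now


now = datetime(2024, 3, 28, 12, 0, 0)
print(is_recent(datetime(2020, 1, 1), now))            # True
print(is_recent(datetime(1900, 1, 1), now))            # False
print(is_recent(datetime(2024, 3, 28, 12, 0, 1), now)) # False
```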

Check if the date valid

Time data: Human birth year (age) data

Based on existing records, the longest-living person lived to 122[16].

MySQL query is as follows[17] where the column `birthday` is date type.

WHERE TIMESTAMPDIFF(YEAR, `birthday`, CURDATE()) <= 122

Using the UNIX_TIMESTAMP() function to check birthday data for abnormal values is not appropriate, because all birthdays earlier than 1970-01-01 00:00:00 UTC become zero.
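
For comparison, the age check can be written in Python with date arithmetic, which has no 1970 lower limit. This is a sketch: the function names are illustrative, and the extra lower bound (a birthday must not be in the future) is an addition of this example, not part of the query above.

```python
from datetime import date


def age_in_years(birthday, today):
    """Completed years between birthday and today."""
    years = today.year - birthday.year
    # Subtract one year when this year's birthday has not been reached yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def is_plausible_birthday(birthday, today, max_age=122):
    """True when the implied age is between 0 and max_age, inclusive."""
    return 0 <= age_in_years(birthday, today) <= max_age


today = date(2024, 3, 28)
print(age_in_years(date(1990, 5, 17), today))           # 33
print(is_plausible_birthday(date(1900, 1, 1), today))   # False
print(is_plausible_birthday(date(2025, 1, 1), today))   # False
```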


String contains special characters

Duplicate data

Find duplicate data

EXCEL

Finding duplicate rows that differ in one column
Finding duplicate rows that differ in multiple columns

Cygwin

  • uniq command on Cygwin (Windows) or Linux: uniq -d <file.txt> > <duplicated_items.txt>[18] (uniq compares adjacent lines only, so sort the file first if it is not already sorted)

MySQL

Finding duplicate rows that differ in one column

Find the duplicated data for one column[19]

-- Generate test data.
CREATE TABLE `table_name` (
  `id` int(11) NOT NULL,
  `content` varchar(5) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `table_name` (`id`, `content`) VALUES
(1, 'apple'),
(2, 'lemon'),
(3, 'apple');

ALTER TABLE `table_name`
  ADD PRIMARY KEY (`id`);

-- Find duplicated data
SELECT `content`, COUNT(*) count 
FROM `table_name` 
GROUP BY `content` 
HAVING count > 1;

SELECT tmp.* FROM 
( 
  SELECT `content`, count(*) count FROM `table_name` GROUP BY `content` 
) tmp 
WHERE tmp.count >1;

Finding duplicate rows that differ in multiple columns

Using CONCAT for multiple columns ex: column_1, column_2

SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
	FROM `table_name`
	GROUP BY CONCAT(  `column_1`, `column_2`  )
HAVING count > 1;

or

SELECT tmp.key FROM
(
	SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
	FROM `table_name`
	GROUP BY CONCAT(  `column_1`, `column_2`  )
) tmp
WHERE tmp.count >=2;

other cases

For counting purpose: find the count of repeated id (type: int) between table_a and table_b

SELECT count(DISTINCT(id)) FROM table_a WHERE id IN
(
   SELECT DISTINCT(id) FROM table_b
)
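
The duplicate searches above can also be sketched in Python with collections.Counter, assuming the rows are available as dictionaries. The function name is hypothetical. Using a tuple of column values as the key avoids an ambiguity of plain concatenation, where ('ab', 'c') and ('a', 'bc') would produce the same key.

```python
from collections import Counter


def find_duplicates(rows, columns):
    """Return {key tuple: count} for column combinations that occur more than once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Same test data as the MySQL example.
rows = [
    {"id": 1, "content": "apple"},
    {"id": 2, "content": "lemon"},
    {"id": 3, "content": "apple"},
]
print(find_duplicates(rows, ["content"]))        # {('apple',): 2}
print(find_duplicates(rows, ["id", "content"]))  # {}
```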

Google Spreadsheet

Deduplicate

  • GNU Coreutils: sort invocation. OS: Linux, or Cygwin on Windows. More details on Merge multiple plain text files.
    • To remove duplicate lines:
      • sort -us -o <output_unique.file> <input.file> works on a large (multi-GB) text file[21]
      • cat <input.file> | grep <pattern> | sort | uniq processes the text line by line and prints the unique lines that match the specified pattern. It is equal to these steps: (1) cat <input.file> | grep <pattern> > <tmp.file> (2) sort <tmp.file> | uniq
    • Ignore first n line(s) & remove duplicate lines[22][23][24]
      • (1) ignore first one line: (head -n 1 <file> && tail -n +2 <file> | sort -us) > newfile
      • (2) ignore first two lines: (head -n 2 <file> && tail -n +3 <file> | sort -us) > newfile
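
The header-preserving variant can be sketched in Python as well, assuming the file fits in memory and its lines are already read into a list. The function name is hypothetical. Python's sorted() orders by code point, which may differ from the locale-aware order used by sort.

```python
def dedupe_keep_header(lines, header_lines=1):
    """Keep the first header_lines untouched, then return the remaining lines sorted and unique."""
    header = lines[:header_lines]
    body = sorted(set(lines[header_lines:]))
    return header + body


lines = ["id,name", "2,lemon", "1,apple", "2,lemon"]
print(dedupe_keep_header(lines))
# ['id,name', '1,apple', '2,lemon']
```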

Counting number of duplicate occurrence

MySQL: count the id values that occur in both list_a & list_b, which use the same primary key column named id

  • SELECT count(DISTINCT(`id`)) FROM `list_a` WHERE `id` IN (SELECT DISTINCT(`id`) FROM `list_b`) ;

Excel:

Other

  • symbol e.g. data-mining or data_mining

Counting

Outlier / Anomaly detection

Anomaly detection

unique number of data values

If the data values were generated by different users, the unique number of data values should be larger than ____

Data handling

Remove first, last or certain characters from text

  • Excel: using RIGHT[25] + LEN[26] functions [27]
  • Excel: if the length of text was fixed after removed, you may try to use REPLACE[28] + LEN functions (demo)

Remove leading and trailing spaces from text

UPDATE `table` 
SET `column` = TRIM( `column` ) 
WHERE LENGTH(TRIM( `column` )) != LENGTH( `column` );

Remove other string look like whitespace

Whitespace character

  1. IDEOGRAPHIC SPACE (full-width space, U+3000)[29]:
    • display:
      <?php $string = "111" . json_decode('"\u3000"') . "222"; echo $string;?>
    • replace with space:
      <?php echo str_replace(json_decode('"\u3000"'), " ", $string);?>
  2. ASCII Vertical Tab \v
  3. ASCII Horizontal Tab (TAB) \t
  4. ASCII Backspace \b
  5. Non-breaking space (&nbsp;): replace a non-breaking space with one whitespace using PHP: $result = str_replace("\xc2\xa0", ' ', $original_string);[30]
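
All five characters in the list above can be handled in one pass with a translation table. The following Python sketch is one possible approach; the mapping is an assumption of this example (in particular, dropping the backspace instead of replacing it with a space), so adjust it to the data at hand.

```python
# Map each whitespace-like character to its replacement.
WHITESPACE_LIKE = str.maketrans({
    "\u3000": " ",  # IDEOGRAPHIC SPACE
    "\u00a0": " ",  # NO-BREAK SPACE
    "\v": " ",      # vertical tab
    "\t": " ",      # horizontal tab
    "\b": "",       # backspace: removed entirely (choice made for this sketch)
})


def normalize_spaces(text):
    """Replace whitespace look-alikes with an ordinary space."""
    return text.translate(WHITESPACE_LIKE)


print(normalize_spaces("111\u3000222"))  # 111 222
```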

Sentence spacing

  1. Sentence spacing in digital media - Wikipedia e.g. &nbsp; &ensp; &emsp;

How to display the Non-breaking space In PHP?

$input = '12345678' . hex2bin('c2a0');
echo $input . PHP_EOL;
## Result of above script: '12345678 ' (one whitespace at the end)

echo bin2hex($input) . PHP_EOL;
## Result of above script: 3132333435363738c2a0

echo bin2hex('12345678') . PHP_EOL;
## Result of above script: 3132333435363738 (You may notice that the difference between the two results is C2A0)

How to display the Non-breaking space In MySQL?

SELECT CONCAT('12345678', UNHEX('C2A0'))
-- Result of above query: '12345678 ' (one whitespace at the end)

SELECT HEX(CONCAT('12345678', UNHEX('C2A0')))
-- Result of above query: 3132333435363738C2A0

SELECT HEX('12345678')
-- Result of above query: 3132333435363738 (You may notice that the difference between the two results is C2A0)

SELECT LENGTH('12345678')
-- Result of above query: 8

SELECT LENGTH(CONCAT('12345678', UNHEX('C2A0')))
-- Result of above query: 10
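
The same inspection in Python, as a short sketch. It shows why the MySQL LENGTH() above reports 10: the non-breaking space is one character but two bytes in UTF-8. The variable names are illustrative.

```python
# Build the same test string: eight digits followed by a non-breaking space.
text = "12345678" + bytes.fromhex("c2a0").decode("utf-8")

hex_dump = text.encode("utf-8").hex()
print(hex_dump)                   # 3132333435363738c2a0
print(len(text))                  # 9  (characters)
print(len(text.encode("utf-8")))  # 10 (bytes)
print(text.endswith("\u00a0"))    # True
```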

Remove control character

Control character - Wikipedia. Using PHP to remove control characters:

$input = 'some string may contains control characters';
$replacement = '';
$result = preg_replace('/[\x00-\x1F]/', $replacement, $input);
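
A Python counterpart of the PHP snippet, as a sketch. The range \x00-\x1F also covers tab, line feed and carriage return, so the second pattern is offered as an optional variant that keeps those three; both pattern names and the keep_layout flag are assumptions of this example.

```python
import re

ALL_CONTROLS = re.compile(r"[\x00-\x1f]")
# Same range, but leaving out \t (0x09), \n (0x0a) and \r (0x0d).
CONTROLS_EXCEPT_LAYOUT = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def remove_control_characters(text, keep_layout=False):
    """Strip ASCII control characters; optionally keep tab, LF and CR."""
    pattern = CONTROLS_EXCEPT_LAYOUT if keep_layout else ALL_CONTROLS
    return pattern.sub("", text)


sample = "line1\x00\n\tline2\x07"
print(repr(remove_control_characters(sample)))                    # 'line1line2'
print(repr(remove_control_characters(sample, keep_layout=True)))  # 'line1\n\tline2'
```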

Fix garbled message text

Fix garbled message text

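Tools

  • IvanMathy/Boop: A scriptable scratchpad for developers. In slow yet steady progress. (Boop on the Mac App Store) " ... to paste some plain text and run some basic text operations on it. "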

Further reading

references

  1. MySQL COALESCE() function - w3resource
  2. How to check if field is null or empty mysql? - Stack Overflow
  3. MySQL :: MySQL 5.0 Reference Manual :: 3.3.4.6 Working with NULL Values
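  4. 如何判斷 Excel 儲存格的欄位值是 NULL (How to determine whether an Excel cell value is NULL)
  5. SQL TRIM 函數 - 1Keydata SQL 語法教學 (SQL TRIM function - 1Keydata SQL syntax tutorial)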
  6. Excel COUNTIFS and COUNTIF with multiple criteria – examples of usage
  7. Check if number is an Integer
  8. Excel specifications and limits
  9. A2
  10. regex - Mysql REGEXP with . and numbers only - Stack Overflow
  11. How do I check to see if a value is an integer in MySQL? - Stack Overflow
  12. How to identify if values in a column is numeric ( Function similar to Isnumeric is SQL)
  13. GOOGLE 試算表: 數字轉成文字 (Google Sheets: converting numbers to text)
  14. DATEVALUE 函數 - Office 支援 (DATEVALUE function - Office Support)
  15. Check if a valid date?
  16. Maximum life span - Wikipedia
  17. sql - Calculate Age in MySQL (InnoDb) - Stack Overflow
  18. shell - How to print only the duplicate values from a text file? - Unix & Linux Stack Exchange
  19. Finding duplicate values in MySQL - Stack Overflow
  20. sql - MySQL SELECT DISTINCT multiple columns - Stack Overflow
  21. linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange
  22. sorting - Is there a way to ignore header lines in a UNIX sort? - Stack Overflow
  23. 命令執行的判斷依據: ; , &&, || (Conditions for command execution: ; , &&, ||)
  24. Linux tail command help and examples
  25. RIGHT、RIGHTB 函數 - Excel - Office.com
  26. LEN、LENB 函數 - Excel - Office.com
  27. How to remove first, last or certain characters from text in Excel?
  28. REPLACE、REPLACEB 函數 - Excel - Office.com
  29. Re: 請益 mysql空百=? - 看板 PHP - 批踢踢實業坊
  30. php - How to replace decoded Non-breakable space (nbsp) - Stack Overflow