Split text by symbol: Difference between revisions

From LemonWiki共筆
Jump to navigation Jump to search
No edit summary
 
(26 intermediate revisions by the same user not shown)
Line 1: Line 1:
Split text by symbol (delimiter / separator)
Split text by symbol (another string, also called delimiter or separator)


== Approachs ==
== Example data ==
* Excel: (1) LEFT, RIGHT, FIND, LEN (2) MOD, FIND, LEN
Task: Split the position e.g. 25.040215, 121.512532 by the comma ( , ) to obtain the latitude (1st string) and longitude (2nd string).
* PHP: explode
* MySQL: [http://www.w3resource.com/mysql/string-functions/mysql-substring_index-function.php MySQL SUBSTRING_INDEX() function - w3resource]
 
== Example ==
Task: Split 25.040215, 121.512532 by the comma ( , )


Expected output:
Expected output:
Line 13: Line 8:
* 121.512532
* 121.512532


MySQL Approach
== Approaches ==
# How to get the first string ( 25.040215 ): {{code | code=SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', <span style="color:red">1</span>))}}
list of equivalent approaches to split text by symbol
# How to get the second string ( 121.512532 ): {{code | code=SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', <span style="color:red">-1</span>))}}
* MySQL: [http://www.w3resource.com/mysql/string-functions/mysql-substring_index-function.php MySQL SUBSTRING_INDEX() function - w3resource]
* Excel: (1) [https://support.microsoft.com/en-us/kb/214261 parse data] ([https://support.microsoft.com/zh-tw/kb/214261 資料剖析]) {{exclaim}} only allowed one character used as separator! (2) LEFT, RIGHT, FIND, LEN (3) MOD, FIND, LEN
* PHP: [http://php.net/explode explode] or [http://php.net/manual/en/function.preg-split.php preg_split] function
 


=== Split text by symbol in MySQL ===
{{exclaim}} Only one symbol was allowed. The following query will get the last element if there are more than two symbols.
# to get the first string ( 25.040215 ): {{code | code=SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', <span style="color:red">1</span>))}}
# to get the second string ( 121.512532 ): {{code | code=SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', <span style="color:red">-1</span>))}}
[http://sqlfiddle.com/#!9/d90dc/2/0 Live demo on SQL Fiddle]
<pre>
CREATE TABLE tbl(str varchar(50));
INSERT INTO tbl VALUES('25.040215, 121.512532');
INSERT INTO tbl VALUES('25.040215, 121.512532, 999.999'); /*bad data: 2 or more symbols*/
INSERT INTO tbl VALUES('123.456'); /*bad data: no symbol exists*/
SELECT TRIM(SUBSTRING_INDEX(str ,',', 1)), TRIM(SUBSTRING_INDEX(str ,',', -1)) FROM tbl WHERE str LIKE '%,%';
</pre>
=== Split text by symbol in Excel ===
Comparison of approaches
<table border="1" class="wikitable sortable">
  <tr>
    <th>Approach</th>
    <th>Output</th>   
  </tr>
  <tr>
    <td>Approach 1: Parse data</td>
    <td>'''Multiple''' parts will be returned if the content contains two or more separator characters.</td>   
  </tr>
  <tr>
    <td>Approach 2: SUBSTRING_INDEX</td>
    <td>Only '''two''' parts will be returned if the content contains two or more separator characters.</td>   
  </tr>
</table>
==== Approach 1: Parse data ====
[https://support.microsoft.com/en-us/kb/214261 parse data] ([https://support.microsoft.com/zh-tw/kb/214261 資料剖析])
* instruction with screenshots: [https://support.office.com/en-za/Article/split-text-into-different-cells-30b14928-5550-41f5-97ca-7a3e9c363ed7 Split text into different cells - Excel]
==== Approach 2: SUBSTRING_INDEX ====
<table border="1" style="">
<tr style="background-color: #555555; color: #ffffff;">
  <th></th>
  <th>A</th>
  <th>B</th>
  <th>C</th>
</tr>
<tr>
  <td>1</td>
  <td>25.040215, 121.512532</td>
  <td><div style="background-color: yellow">25.040215</div></td>
  <td><div style="background-color: yellow">121.512532</div></td>
</tr>
</table>
If the data set is perfect
# separator character: ,
# to get the first string ( 25.040215 ): B1 = {{code | code=<span style="background-color: #F6CEF5">MID( A1, 1, FIND(",", A1)-1)</span> }}
# to get the second string ( 121.512532 ): C1 = {{code | code=<span style="background-color: #ccc">MID( A1, FIND(",", A1)+1, LEN(A1))</span> }}
if the data set was mixed with empty values.  Return empty if the value is empty.
# to get the first string ( 25.040215 ): B1 = {{code | code=IF(ISERROR(<span style="background-color: #F6CEF5">MID(A1, 1, FIND(",", A1)-1)</span>), "", <span style="background-color: #F6CEF5">MID(A1, 1, FIND(",", A1)-1)</span>) }}
# to get the second string ( 121.512532 ): C1 = {{code | code=IF(ISERROR(TRIM(<span style="background-color: #ccc">MID(A1, FIND(",", A1)+1, LEN(A1))</span>)), "", TRIM(MID(<span style="background-color: #ccc">A1, FIND(",", A1)+1, LEN(A1))</span>) }}
[https://docs.google.com/spreadsheets/d/11VeZIxGR7orL4smf_0G6oi7VxxNi3FFsCL9b6QrLhBE/edit?usp=sharingc  online demo] (allow to edit)
=== Split text by symbol in PHP ===
PHP script
<pre>
$input_string = "25.040215, 121.512532";
$list1 = explode(",", $input_string);
$trimmed_list1 = array_map('trim', $list1);
$list2 = preg_split("/[\s,]+/", $input_string);
print_r($trimmed_list1);
print_r($list2);
</pre>
output
<pre>
Array (
  [0] => 25.040215
  [1] => 121.512532
)
</pre>
[[Category:Data transformation]]
[[Category:Data Science]]
[[Category:MySQL]]
[[Category:Excel]]
[[Category:Programming]]
[[Category:Programming]]
[[Category:Data Science]]

Latest revision as of 14:29, 28 September 2023

Split text by symbol (another string, also called delimiter or separator)

Example data[edit]

Task: Split the position e.g. 25.040215, 121.512532 by the comma ( , ) to obtain the latitude (1st string) and longitude (2nd string).

Expected output:

  • 25.040215
  • 121.512532

Approaches[edit]

list of equivalent approaches to split text by symbol


Split text by symbol in MySQL[edit]

Icon_exclaim.gif Only one symbol was allowed. The following query will get the last element if there are more than two symbols.

  1. to get the first string ( 25.040215 ): SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', 1))
  2. to get the second string ( 121.512532 ): SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', -1))

Live demo on SQL Fiddle

CREATE TABLE tbl(str varchar(50));
INSERT INTO tbl VALUES('25.040215, 121.512532');
INSERT INTO tbl VALUES('25.040215, 121.512532, 999.999'); /*bad data: 2 or more symbols*/
INSERT INTO tbl VALUES('123.456'); /*bad data: no symbol exists*/
SELECT TRIM(SUBSTRING_INDEX(str ,',', 1)), TRIM(SUBSTRING_INDEX(str ,',', -1)) FROM tbl WHERE str LIKE '%,%';

Split text by symbol in Excel[edit]

Comparison of approaches

Approach Output
Approach 1: Parse data Multiple parts will be returned if the content contains two or more separator characters.
Approach 2: SUBSTRING_INDEX Only two parts will be returned if the content contains two or more separator characters.

Approach 1: Parse data[edit]

parse data (資料剖析)

Approach 2: SUBSTRING_INDEX[edit]

A B C
1 25.040215, 121.512532
25.040215
121.512532

If the data set is perfect

  1. separator character: ,
  2. to get the first string ( 25.040215 ): B1 = MID( A1, 1, FIND(",", A1)-1)
  3. to get the second string ( 121.512532 ): C1 = MID( A1, FIND(",", A1)+1, LEN(A1))

if the data set was mixed with empty values. Return empty if the value is empty.

  1. to get the first string ( 25.040215 ): B1 = IF(ISERROR(MID(A1, 1, FIND(",", A1)-1)), "", MID(A1, 1, FIND(",", A1)-1))
  2. to get the second string ( 121.512532 ): C1 = IF(ISERROR(TRIM(MID(A1, FIND(",", A1)+1, LEN(A1)))), "", TRIM(MID(A1, FIND(",", A1)+1, LEN(A1)))


online demo (allow to edit)

Split text by symbol in PHP[edit]

PHP script

$input_string = "25.040215, 121.512532";

$list1 = explode(",", $input_string);
$trimmed_list1 = array_map('trim', $list1);


$list2 = preg_split("/[\s,]+/", $input_string);


print_r($trimmed_list1);
print_r($list2);

output

Array ( 
  [0] => 25.040215 
  [1] => 121.512532 
)