Split text by symbol: Difference between revisions
Jump to navigation
Jump to search
(Created page with "Split text by symbol (delimiter / separator) Approachs * Excel: (1) LEFT, RIGHT, FIND, LEN (2) MOD, FIND, LEN * PHP: explode * MySQL: [http://www.w3resource.com/mysql/string-...") |
|||
| (28 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
Split text by symbol (delimiter | Split text by symbol (another string, also called delimiter or separator) | ||
== Example data == | |||
Task: Split the position e.g. 25.040215, 121.512532 by the comma ( , ) to obtain the latitude (1st string) and longitude (2nd string). | |||
* | |||
Expected output: | |||
* 25.040215 | |||
* 121.512532 | |||
== Approaches == | |||
list of equivalent approaches to split text by symbol | |||
* MySQL: [http://www.w3resource.com/mysql/string-functions/mysql-substring_index-function.php MySQL SUBSTRING_INDEX() function - w3resource] | * MySQL: [http://www.w3resource.com/mysql/string-functions/mysql-substring_index-function.php MySQL SUBSTRING_INDEX() function - w3resource] | ||
* Excel: (1) [https://support.microsoft.com/en-us/kb/214261 parse data] ([https://support.microsoft.com/zh-tw/kb/214261 資料剖析]) {{exclaim}} only allowed one character used as separator! (2) LEFT, RIGHT, FIND, LEN (3) MOD, FIND, LEN | |||
* PHP: [http://php.net/explode explode] or [http://php.net/manual/en/function.preg-split.php preg_split] function | |||
=== Split text by symbol in MySQL === | |||
{{exclaim}} Only one symbol was allowed. The following query will get the last element if there are more than two symbols. | |||
# to get the first string ( 25.040215 ): {{code | code=SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', <span style="color:red">1</span>))}} | |||
# to get the second string ( 121.512532 ): {{code | code=SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', <span style="color:red">-1</span>))}} | |||
[http://sqlfiddle.com/#!9/d90dc/2/0 Live demo on SQL Fiddle] | |||
<pre> | |||
CREATE TABLE tbl(str varchar(50)); | |||
INSERT INTO tbl VALUES('25.040215, 121.512532'); | |||
INSERT INTO tbl VALUES('25.040215, 121.512532, 999.999'); /*bad data: 2 or more symbols*/ | |||
INSERT INTO tbl VALUES('123.456'); /*bad data: no symbol exists*/ | |||
SELECT TRIM(SUBSTRING_INDEX(str ,',', 1)), TRIM(SUBSTRING_INDEX(str ,',', -1)) FROM tbl WHERE str LIKE '%,%'; | |||
</pre> | |||
=== Split text by symbol in Excel === | |||
Comparison of approaches | |||
<table border="1" class="wikitable sortable"> | |||
<tr> | |||
<th>Approach</th> | |||
<th>Output</th> | |||
</tr> | |||
<tr> | |||
<td>Approach 1: Parse data</td> | |||
<td>'''Multiple''' parts will be returned if the content contains two or more separator characters.</td> | |||
</tr> | |||
<tr> | |||
<td>Approach 2: SUBSTRING_INDEX</td> | |||
<td>Only '''two''' parts will be returned if the content contains two or more separator characters.</td> | |||
</tr> | |||
</table> | |||
==== Approach 1: Parse data ==== | |||
[https://support.microsoft.com/en-us/kb/214261 parse data] ([https://support.microsoft.com/zh-tw/kb/214261 資料剖析]) | |||
* instruction with screenshots: [https://support.office.com/en-za/Article/split-text-into-different-cells-30b14928-5550-41f5-97ca-7a3e9c363ed7 Split text into different cells - Excel] | |||
==== Approach 2: SUBSTRING_INDEX ==== | |||
<table border="1" style=""> | |||
<tr style="background-color: #555555; color: #ffffff;"> | |||
<th></th> | |||
<th>A</th> | |||
<th>B</th> | |||
<th>C</th> | |||
</tr> | |||
<tr> | |||
<td>1</td> | |||
<td>25.040215, 121.512532</td> | |||
<td><div style="background-color: yellow">25.040215</div></td> | |||
<td><div style="background-color: yellow">121.512532</div></td> | |||
</tr> | |||
</table> | |||
If the data set is perfect | |||
# separator character: , | |||
# to get the first string ( 25.040215 ): B1 = {{code | code=<span style="background-color: #F6CEF5">MID( A1, 1, FIND(",", A1)-1)</span> }} | |||
# to get the second string ( 121.512532 ): C1 = {{code | code=<span style="background-color: #ccc">MID( A1, FIND(",", A1)+1, LEN(A1))</span> }} | |||
if the data set was mixed with empty values. Return empty if the value is empty. | |||
# to get the first string ( 25.040215 ): B1 = {{code | code=IF(ISERROR(<span style="background-color: #F6CEF5">MID(A1, 1, FIND(",", A1)-1)</span>), "", <span style="background-color: #F6CEF5">MID(A1, 1, FIND(",", A1)-1)</span>) }} | |||
# to get the second string ( 121.512532 ): C1 = {{code | code=IF(ISERROR(TRIM(<span style="background-color: #ccc">MID(A1, FIND(",", A1)+1, LEN(A1))</span>)), "", TRIM(MID(<span style="background-color: #ccc">A1, FIND(",", A1)+1, LEN(A1))</span>) }} | |||
[https://docs.google.com/spreadsheets/d/11VeZIxGR7orL4smf_0G6oi7VxxNi3FFsCL9b6QrLhBE/edit?usp=sharingc online demo] (allow to edit) | |||
=== Split text by symbol in PHP === | |||
PHP script | |||
<pre> | |||
$input_string = "25.040215, 121.512532"; | |||
$list1 = explode(",", $input_string); | |||
$trimmed_list1 = array_map('trim', $list1); | |||
$list2 = preg_split("/[\s,]+/", $input_string); | |||
print_r($trimmed_list1); | |||
print_r($list2); | |||
</pre> | |||
output | |||
<pre> | |||
Array ( | |||
[0] => 25.040215 | |||
[1] => 121.512532 | |||
) | |||
</pre> | |||
[[Category:Data transformation]] | |||
[[Category:Data Science]] | |||
[[Category:MySQL]] | |||
[[Category:Excel]] | |||
[[Category:Programming]] | [[Category:Programming]] | ||
Latest revision as of 14:29, 28 September 2023
Split text by symbol (another string, also called delimiter or separator)
Example data[edit]
Task: Split the position e.g. 25.040215, 121.512532 by the comma ( , ) to obtain the latitude (1st string) and longitude (2nd string).
Expected output:
- 25.040215
- 121.512532
Approaches[edit]
list of equivalent approaches to split text by symbol
- MySQL: MySQL SUBSTRING_INDEX() function - w3resource
- Excel: (1) parse data (資料剖析)
only allowed one character used as separator! (2) LEFT, RIGHT, FIND, LEN (3) MOD, FIND, LEN - PHP: explode or preg_split function
Split text by symbol in MySQL[edit]
Only one symbol was allowed. The following query will get the last element if there are more than two symbols.
- to get the first string ( 25.040215 ):
SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', 1)) - to get the second string ( 121.512532 ):
SELECT TRIM(SUBSTRING_INDEX('25.040215, 121.512532' ,',', -1))
CREATE TABLE tbl(str varchar(50));
INSERT INTO tbl VALUES('25.040215, 121.512532');
INSERT INTO tbl VALUES('25.040215, 121.512532, 999.999'); /*bad data: 2 or more symbols*/
INSERT INTO tbl VALUES('123.456'); /*bad data: no symbol exists*/
SELECT TRIM(SUBSTRING_INDEX(str ,',', 1)), TRIM(SUBSTRING_INDEX(str ,',', -1)) FROM tbl WHERE str LIKE '%,%';
Split text by symbol in Excel[edit]
Comparison of approaches
| Approach | Output |
|---|---|
| Approach 1: Parse data | Multiple parts will be returned if the content contains two or more separator characters. |
| Approach 2: SUBSTRING_INDEX | Only two parts will be returned if the content contains two or more separator characters. |
Approach 1: Parse data[edit]
- instruction with screenshots: Split text into different cells - Excel
Approach 2: SUBSTRING_INDEX[edit]
| A | B | C | |
|---|---|---|---|
| 1 | 25.040215, 121.512532 | 25.040215 |
121.512532 |
If the data set is perfect
- separator character: ,
- to get the first string ( 25.040215 ): B1 =
MID( A1, 1, FIND(",", A1)-1) - to get the second string ( 121.512532 ): C1 =
MID( A1, FIND(",", A1)+1, LEN(A1))
if the data set was mixed with empty values. Return empty if the value is empty.
- to get the first string ( 25.040215 ): B1 =
IF(ISERROR(MID(A1, 1, FIND(",", A1)-1)), "", MID(A1, 1, FIND(",", A1)-1)) - to get the second string ( 121.512532 ): C1 =
IF(ISERROR(TRIM(MID(A1, FIND(",", A1)+1, LEN(A1)))), "", TRIM(MID(A1, FIND(",", A1)+1, LEN(A1)))
online demo (allow to edit)
Split text by symbol in PHP[edit]
PHP script
$input_string = "25.040215, 121.512532";
$list1 = explode(",", $input_string);
$trimmed_list1 = array_map('trim', $list1);
$list2 = preg_split("/[\s,]+/", $input_string);
print_r($trimmed_list1);
print_r($list2);
output
Array ( [0] => 25.040215 [1] => 121.512532 )