Data cleaning: Difference between revisions

Jump to navigation Jump to search
314 bytes added ,  18 March 2019
Line 339: Line 339:


==== MySQL ====
==== MySQL ====
 
===== Finding duplicate rows that differ in one column =====
Find the duplicated data for one column<ref>[http://stackoverflow.com/questions/688549/finding-duplicate-values-in-mysql?rq=1 Finding duplicate values in MySQL - Stack Overflow]</ref>
Find the duplicated data for one column<ref>[http://stackoverflow.com/questions/688549/finding-duplicate-values-in-mysql?rq=1 Finding duplicate values in MySQL - Stack Overflow]</ref>
<pre>
<pre>
Line 369: Line 369:
</pre>
</pre>


===== Finding duplicate rows that differ in multiple columns =====
Using {{kbd | key =CONCAT}} for multiple columns ex: column_1, column_2  
Using {{kbd | key =CONCAT}} for multiple columns ex: column_1, column_2  
<pre>
SELECT count(*) count, CONCAT(  `column_1`, `column_2`  ) 'key'
FROM `table_name`
GROUP BY CONCAT(  `column_1`, `column_2`  )
HAVING count > 1;
</pre>
or
<pre>
<pre>
SELECT tmp.key FROM
SELECT tmp.key FROM
Line 380: Line 389:
</pre>
</pre>


===== other cases =====
For counting purpose: find the count of repeated id (type: int) between table_a and table_b
For counting purpose: find the count of repeated id (type: int) between table_a and table_b
<pre>
<pre>

Navigation menu