Data cleaning
(revision of 8 August 2017)
* PHP: [http://stackoverflow.com/questions/19271381/correctly-determine-if-date-string-is-a-valid-date-in-that-format php - Correctly determine if date string is a valid date in that format - Stack Overflow]
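The linked thread covers PHP. A comparable check in Python, offered as a sketch rather than the thread's own code, parses the string with `datetime.strptime` and then requires that formatting the result back with the same format reproduces the input exactly. The helper name `is_valid_date` and the default format `%Y-%m-%d` are assumptions for illustration.

```python
from datetime import datetime


def is_valid_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """Hypothetical helper: True if `text` is a real date written exactly in `fmt`."""
    try:
        parsed = datetime.strptime(text, fmt)
    except ValueError:
        # Not parseable in this format, or an impossible date such as 30 February.
        return False
    # Round trip: reject inputs that parse but are not written exactly in `fmt`,
    # e.g. "2017-8-8" for "%Y-%m-%d" (strptime accepts it, strftime gives "2017-08-08").
    return parsed.strftime(fmt) == text


if __name__ == "__main__":
    for sample in ["2017-08-08", "2017-02-30", "2017-8-8", "08/08/2017"]:
        print(sample, is_valid_date(sample))
```

The round-trip comparison is what makes the check strict: `strptime` on its own accepts non-zero-padded days and months, so comparing against the re-formatted string enforces the exact layout.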


== Duplicate data ==
=== Find duplicate data ===
* EXCEL:
** One column of data: [http://www.extendoffice.com/documents/excel/1499-count-duplicate-values-in-column.html How to count duplicate values in a column in Excel?] Use {{kbd | key = COUNTIF(range, criteria)}} {{access | date = 2015-08-25}} or a '''Pivot Table''' to find values that occur two or more times (count >= 2).
** Menu: Data -> Remove duplicates
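Outside a spreadsheet, the same COUNTIF-style test (count >= 2) can be sketched in Python with `collections.Counter`. The function name `find_duplicates` and the sample column are hypothetical.

```python
from collections import Counter


def find_duplicates(values):
    """Hypothetical helper: map each value occurring two or more times to its count.

    Mirrors the spreadsheet rule COUNTIF(range, value) >= 2.
    """
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= 2}


if __name__ == "__main__":
    column = ["apple", "pear", "apple", "plum", "pear", "apple"]
    print(find_duplicates(column))  # {'apple': 3, 'pear': 2}
```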


=== Deduplicate ===
* EXCEL: Data Tools -> Remove Duplicates: [https://support.office.com/en-us/article/Filter-for-unique-values-or-remove-duplicate-values-d6549cf0-357a-4acf-9df5-ca507915b704 Filter for unique values or remove duplicate values] {{access | date = 2015-10-20}}
* Google spreadsheet add-on: [https://www.ablebits.com/google-sheets-add-ons/remove-duplicates/howto.php Remove Duplicates for Google Sheets help]
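A minimal Python sketch of deduplication that keeps the first occurrence of each row, assuming rows are dicts with hashable values. The `key_columns` parameter is a hypothetical stand-in for choosing which columns define a duplicate; when it is omitted, whole rows are compared.

```python
def deduplicate(rows, key_columns=None):
    """Hypothetical helper: drop duplicate rows, keeping the first occurrence.

    rows: list of dicts, one per row.
    key_columns: columns that define a duplicate; None compares all columns.
    """
    seen = set()
    result = []
    for row in rows:
        cols = key_columns if key_columns is not None else sorted(row)
        # Pair column names with values so rows with different columns never collide.
        key = tuple((c, row.get(c)) for c in cols)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


if __name__ == "__main__":
    rows = [
        {"id": 1, "name": "Ann"},
        {"id": 2, "name": "Bob"},
        {"id": 1, "name": "Ann"},  # exact duplicate of the first row
        {"id": 3, "name": "Ann"},  # duplicate only when comparing "name"
    ]
    print(len(deduplicate(rows)))            # 3
    print(len(deduplicate(rows, ["name"])))  # 2
```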


=== Counting the number of duplicate occurrences ===
* MySQL: count the distinct IDs that occur in both list_a and list_b, where both tables use the same primary key column {{kbd | key = id}}
** {{kbd | key = SELECT COUNT(DISTINCT `id`) FROM `list_a` WHERE `id` IN (SELECT `id` FROM `list_b`); }}
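To try the query without a MySQL server, the sketch below runs the same logic against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite standing in for MySQL is an assumption; backticks are dropped, and the helper name `count_shared_ids` and the sample ids are hypothetical. A plain set intersection gives the same answer and is shown for comparison.

```python
import sqlite3

# Same logic as the MySQL query above, without MySQL-style backticks.
QUERY = """
SELECT COUNT(DISTINCT id) FROM list_a
WHERE id IN (SELECT id FROM list_b);
"""


def count_shared_ids(ids_a, ids_b):
    """Hypothetical helper: count distinct ids present in both lists via SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE list_a (id INTEGER PRIMARY KEY)")
        conn.execute("CREATE TABLE list_b (id INTEGER PRIMARY KEY)")
        # id is a primary key, so insert each id only once.
        conn.executemany("INSERT INTO list_a (id) VALUES (?)", [(i,) for i in set(ids_a)])
        conn.executemany("INSERT INTO list_b (id) VALUES (?)", [(i,) for i in set(ids_b)])
        (count,) = conn.execute(QUERY).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [3, 4, 5]
    print(count_shared_ids(a, b))  # 2
    print(len(set(a) & set(b)))    # 2, same answer without a database
```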

=== Other ===
* Symbols: the same term written with different separators, e.g. data-mining vs. data_mining
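One way to catch such variants, sketched in Python under the assumption that `-`, `_` and whitespace are interchangeable separators, is to normalize each term with a regular expression and group the spellings that collapse to the same key. The helpers `normalize_term` and `group_variants` are hypothetical.

```python
import re
from collections import defaultdict


def normalize_term(term):
    """Hypothetical helper: collapse runs of '-', '_' and whitespace into one space."""
    return re.sub(r"[-_\s]+", " ", term).strip()


def group_variants(terms):
    """Hypothetical helper: group spellings sharing a normalized key; keep groups of 2+."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize_term(term)].append(term)
    return {key: spellings for key, spellings in groups.items() if len(spellings) >= 2}


if __name__ == "__main__":
    terms = ["data-mining", "data_mining", "data mining", "data cleaning"]
    print(group_variants(terms))
    # {'data mining': ['data-mining', 'data_mining', 'data mining']}
```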

