Simple data anonymization

From LemonWiki共筆

Revision as of 10:26, 29 July 2016

Simple data anonymization: basic de-identification of personal data using Excel formulas or MySQL database queries.


case1: 王小明 --> 王OO; 孤獨求敗 --> 孤OO

ex:

  • 楊過 --> 楊OO
  • 王小明 --> 王OO
  • 孤獨求敗 --> 孤OO
  • Guo da-xia --> GOO


methods:

  • Excel:
    • =REPLACE(A2, 2, LEN(A2)-1, "OO") (also works for names of 3 or 4 characters)
    • =REPLACE(A2, 2, 2, "OO") (caution: works for 3-character names, NOT for 4-character names)
  • MySQL:
-- SET @name := "楊過";
SET @name := "王小明";
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";

SELECT CONCAT(LEFT(@name, 1), 'OO');
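
For comparison, the same case 1 rule (keep the first character, replace everything after it with a fixed "OO") can be sketched in Python. This is only an illustrative sketch: the function name `mask_keep_first` and its `mask` parameter are hypothetical and do not come from the original page, and returning an empty string unchanged is an assumption.

```python
def mask_keep_first(name: str, mask: str = "OO") -> str:
    """Keep the first character of `name` and replace the rest with `mask`."""
    if not name:
        # Assumption: an empty value has nothing to mask and is returned as-is.
        return name
    # Python str indexing works on Unicode code points, so name[0] is the
    # first character for both CJK and Latin names.
    return name[0] + mask


if __name__ == "__main__":
    for sample in ["楊過", "王小明", "孤獨求敗", "Guo da-xia"]:
        print(sample, "-->", mask_keep_first(sample))
```

Because the mask has a fixed length, the output does not reveal how long the original name was, which is the main difference from case 2.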

case2: 王小明 --> 王O明; 孤獨求敗 --> 孤OO敗

ex:

  • 楊過 --> 楊O
  • 王小明 --> 王O明
  • 孤獨求敗 --> 孤OO敗
  • Guo da-xia --> GOOOOOOOOa

methods:

  • Excel:
    • =IF(LEN(A1)=2, LEFT(A1, 1)&"O", LEFT(A1, 1)&REPT("O", LEN(A1)-2)&RIGHT(A1, 1))
    • =REPLACE(A1, 2, 1, "O") [1] (caution: works for 3-character names, NOT for 4-character names)
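  • PHP: using a regular expression (preg_match)
$string = "王小明";
if (mb_strlen($string, "UTF-8") == 2) {
    // Two characters: keep the first one and mask the second
    echo mb_substr($string, 0, 1, "UTF-8") . "O";
} elseif (preg_match('/^(\X)(\X+)(\X)$/u', $string, $matches)) {
    // Three or more characters: keep the first and last, mask everything in between
    echo $matches[1] . str_repeat("O", mb_strlen($string, "UTF-8") - 2) . $matches[3];
}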
  • MySQL:
SET @name := "楊過";
-- SET @name := "王小明";
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";

SELECT CASE  
    WHEN CHAR_LENGTH(@name) =2 THEN CONCAT(LEFT(@name, 1), 'O')
    ELSE CONCAT(LEFT(@name, 1), REPEAT('O', CHAR_LENGTH(@name)-2), RIGHT(@name, 1))
    END;
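
The case 2 rule (keep the first and last characters, mask the middle with one "O" per character; for 2-character names mask only the second character) can also be sketched in Python. This is a minimal sketch, not the page's own method: the function name `mask_keep_ends` and the `mask_char` parameter are hypothetical, and the handling of names shorter than two characters is an assumption, since the original formulas do not address that case.

```python
def mask_keep_ends(name: str, mask_char: str = "O") -> str:
    """Keep the first and last characters of `name`; mask everything between."""
    n = len(name)  # counts Unicode code points, like CHAR_LENGTH in MySQL
    if n == 0:
        # Assumption: an empty value is returned unchanged.
        return name
    if n == 1:
        # Assumption: a single-character name is fully masked rather than exposed.
        return mask_char
    if n == 2:
        # Two characters: keep the first, mask the second.
        return name[0] + mask_char
    # Three or more characters: one mask character per hidden character.
    return name[0] + mask_char * (n - 2) + name[-1]


if __name__ == "__main__":
    for sample in ["楊過", "王小明", "孤獨求敗", "Guo da-xia"]:
        print(sample, "-->", mask_keep_ends(sample))
```

Note that this variant preserves the length of the original name, so it hides less information than case 1.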



reference

further reading