Simple data anonymization: Difference between revisions

From LemonWiki共筆
Jump to navigation Jump to search
m (Text replacement - "== references ==" to "== References ==")
 
(12 intermediate revisions by one other user not shown)
Line 1: Line 1:
使用 Excel 或 MySQL 資料庫查詢方式,簡易地將個人資料去識別化 (將姓名馬賽克) 。


Simple data anonymization 使用 Excel 或 MySQL 資料庫查詢方式,做簡易個資去識別化
== 姓名的個資去識別化 ==
=== case1: 王小明 --> 王O明; 孤獨求敗 --> 孤OO敗 ===
Reserve the first and last characters of name, and replace other characters with the symbol 'OO'.


ex:
* 楊過 --> 楊O
* 王小明 --> 王O明
* 孤獨求敗 --> 孤OO敗
* Guo da-xia --> GOOOOOOOOa
methods:
* Excel:
** {{kbd | key=<nowiki>=IF(LEN(A1)=2, LEFT(A1, 1)&"O", LEFT(A1, 1)&REPT("O", LEN(A1)-2)&RIGHT(A1, 1))</nowiki>}}
** {{kbd | key=<nowiki>=REPLACE(A1, 2, 1, "O")</nowiki>}}<ref>[http://blog.xuite.net/yh96301/blog/80724141-Excel+2010%E5%A7%93%E5%90%8D%E7%9A%84%E7%AC%AC%E4%BA%8C%E5%80%8B%E5%AD%97%E5%8F%96%E4%BB%A3%E7%82%BAO Excel 2010姓名的第二個字取代為O @ 軟體使用教學 :: 隨意窩 Xuite日誌]</ref>  {{exclaim}} only applied for 3 characters, NOT for 4 characters
* PHP: using regular_replace
<pre>
if(mb_strlen($string, "UTF-8") == 2){
  echo mb_substr($string, 0, 1, "UTF-8") . "O";
}else{
  $pattern = '/^(\X)(\X+)(\X)/u';
  preg_match($pattern, $string, $matches);
  echo $matches[1]. str_repeat("O", mb_strlen($string, "UTF-8") - 2) . $matches[3];
}
</pre>
* MySQL:
<pre>
SET @name := "楊過";
-- SET @name := "王小明";
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";
SELECT CASE 
    WHEN CHAR_LENGTH(@name) =2 THEN CONCAT(LEFT(@name, 1), 'O')
    ELSE CONCAT(LEFT(@name, 1), REPEAT('O', CHAR_LENGTH(@name)-2), RIGHT(@name, 1))
    END;
</pre>
=== case2: 王小明 --> 王O明; 孤獨求敗 --> 孤O敗 ===
ex:
* 楊過 -> 楊O
* 王小明 --> 王O明
* 孤獨求敗 --> 孤O敗
* Guo da-xia --> GOa
methods
* Excel:
** {{kbd | key=<nowiki>=IF(LEN(A2)=2, LEFT(A2, 1)&"O", LEFT(A2, 1)&REPT("O", 1)&RIGHT(A2, 1))</nowiki>}} also applied for 3 or 4 characters
=== case3: 王小明 --> 王OO; 孤獨求敗 --> 孤OO ===
Reserve the first character of name, and replace other characters with the symbol 'OO'.


== case1: 王小明 --> 王OO ==
ex:  
ex:  
* 楊過 -> 楊OO
* 楊過 -> 楊OO
Line 13: Line 67:
methods
methods
* Excel:  
* Excel:  
** {{kbd | key=<nowiki>=REPLACE(A2, 2, LEN(A2)-1, "OO")</nowiki>}} also applied for 3 or 4 words
** {{kbd | key=<nowiki>=REPLACE(A2, 2, LEN(A2)-1, "OO")</nowiki>}} also applied for 3 or 4 characters
** {{kbd | key=<nowiki>=REPLACE(A2, 2, 2, "O")</nowiki>}} {{exclaim}} only applied for 3 words, NOT for 4 words
** {{kbd | key=<nowiki>=REPLACE(A2, 2, 2, "O")</nowiki>}} {{exclaim}} only applied for 3 characters, NOT for 4 characters
* MySQL:  
* MySQL:  
<pre>
<pre>
Line 25: Line 79:
</pre>
</pre>


== case2: 王小明 --> 王O明 ==
=== case4: 王小明 --> 王OO; 孤獨求敗 --> 孤OO敗 ===
ex: 王小明 --> 王O明
Replace the second and third characters with the symbol 'OO', and reserve other characters.
* Excel:
 
** {{kbd | key=<nowiki>=REPLACE(A2, 2, LEN(A2)-2, "O")</nowiki>}} also applied for 3 or 4 words ex: 王王小明 -> 王O明
* 楊過 --> 楊OO
** {{kbd | key=<nowiki>=REPLACE(A2, 2, 1, "O")</nowiki>}}<ref>[http://blog.xuite.net/yh96301/blog/80724141-Excel+2010%E5%A7%93%E5%90%8D%E7%9A%84%E7%AC%AC%E4%BA%8C%E5%80%8B%E5%AD%97%E5%8F%96%E4%BB%A3%E7%82%BAO Excel 2010姓名的第二個字取代為O @ 軟體使用教學 :: 隨意窩 Xuite日誌]</ref>  {{exclaim}} only applied for 3 words, NOT for 4 words
* 王小明 --> 王OO
* PHP: using regular_replace
* 孤獨求敗 --> 孤OO敗
* Guo da-xia --> GOO da-xia
 
methods:
* Excel
** {{kbd | key=<nowiki>=REPLACE(A1, 2, 2, "OO")</nowiki>}}
 
* PHP
<pre>
<pre>
//$string = '楊過';
$string = '王小明';
$string = '王小明';
$pattern = '/^(\X)(\X)(\X+)/u';
//$string = '孤獨求敗';
$replacement = '$1O$3';
//$string = 'Guo da-xia';
 
$pattern = '/^(\X)(\X\X?)(\X?)/u';
$replacement = '$1OO$3';
echo preg_replace($pattern, $replacement, $string);
echo preg_replace($pattern, $replacement, $string);
</pre>
</pre>
* MySQL: using SUBSTRING
 
* MySQL
<pre>
<pre>
SELECT CONCAT(SUBSTRING_INDEX('王小明', SUBSTRING('王小明', 2, 1), 1),  
-- SET @name := "楊過";
SUBSTRING('王小明', 2, 1),
SET @name := "王小明";
SUBSTRING_INDEX('王小明', SUBSTRING('王小明', 2, 1), -1));
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";
 
SELECT CONCAT(LEFT(@name, 1), 'OO', RIGHT(@name, CHAR_LENGTH(@name)-3));
</pre>
</pre>






== 手機號碼的個資去識別化 ==
=== case1: 0912345678 --> 09123XXXXX8 ===
methods:
* Excel
** {{kbd | key=<nowiki>=LEFT(A2, 5)&REPT("X", 5)&RIGHT(A2, 1)</nowiki>}}


reference
== References ==
<references />
<references />


further reading
further reading
* [https://vtaiwan.tw/personal-data-protection/init/ 個資利用與去識別化 - vTaiwan.tw]
* [https://vtaiwan.tw/personal-data-protection/init/ 個資利用與去識別化 - vTaiwan.tw]
* [http://www.cc.ntu.edu.tw/chinese/epaper/0040/20170320_4008.html 淺談個資遮罩]
related terms
* [https://en.wikipedia.org/wiki/Data_masking Data masking] ([http://www.ithome.com.tw/tech/87443 資料遮罩])


[[Category:Data transformation]] [[Category:Data Science]] [[Category:Excel]]
[[Category:Data transformation]] [[Category:Data Science]] [[Category:Excel]]

Latest revision as of 17:13, 10 July 2022

使用 Excel 或 MySQL 資料庫查詢方式,簡易地將個人資料去識別化 (將姓名馬賽克) 。

姓名的個資去識別化[edit]

case1: 王小明 --> 王O明; 孤獨求敗 --> 孤OO敗[edit]

Reserve the first and last characters of name, and replace other characters with the symbol 'OO'.

ex:

  • 楊過 --> 楊O
  • 王小明 --> 王O明
  • 孤獨求敗 --> 孤OO敗
  • Guo da-xia --> GOOOOOOOOa

methods:

  • Excel:
    • =IF(LEN(A1)=2, LEFT(A1, 1)&"O", LEFT(A1, 1)&REPT("O", LEN(A1)-2)&RIGHT(A1, 1))
    • =REPLACE(A1, 2, 1, "O")[1] Icon_exclaim.gif only applied for 3 characters, NOT for 4 characters
  • PHP: using regular_replace
if(mb_strlen($string, "UTF-8") == 2){
   echo mb_substr($string, 0, 1, "UTF-8") . "O";

}else{
  $pattern = '/^(\X)(\X+)(\X)/u';
  preg_match($pattern, $string, $matches);
  echo $matches[1]. str_repeat("O", mb_strlen($string, "UTF-8") - 2) . $matches[3];

}
  • MySQL:
SET @name := "楊過";
-- SET @name := "王小明";
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";

SELECT CASE  
    WHEN CHAR_LENGTH(@name) =2 THEN CONCAT(LEFT(@name, 1), 'O')
    ELSE CONCAT(LEFT(@name, 1), REPEAT('O', CHAR_LENGTH(@name)-2), RIGHT(@name, 1))
    END;

case2: 王小明 --> 王O明; 孤獨求敗 --> 孤O敗[edit]

ex:

  • 楊過 -> 楊O
  • 王小明 --> 王O明
  • 孤獨求敗 --> 孤O敗
  • Guo da-xia --> GOa


methods

  • Excel:
    • =IF(LEN(A2)=2, LEFT(A2, 1)&"O", LEFT(A2, 1)&REPT("O", 1)&RIGHT(A2, 1)) also applied for 3 or 4 characters


case3: 王小明 --> 王OO; 孤獨求敗 --> 孤OO[edit]

Reserve the first character of name, and replace other characters with the symbol 'OO'.

ex:

  • 楊過 -> 楊OO
  • 王小明 --> 王OO
  • 孤獨求敗 --> 孤OO
  • Guo da-xia --> GOO


methods

  • Excel:
    • =REPLACE(A2, 2, LEN(A2)-1, "OO") also applied for 3 or 4 characters
    • =REPLACE(A2, 2, 2, "O") Icon_exclaim.gif only applied for 3 characters, NOT for 4 characters
  • MySQL:
-- SET @name := "楊過";
SET @name := "王小明";
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";

SELECT CONCAT(LEFT(@name, 1), 'OO');

case4: 王小明 --> 王OO; 孤獨求敗 --> 孤OO敗[edit]

Replace the second and third characters with the symbol 'OO', and reserve other characters.

  • 楊過 --> 楊OO
  • 王小明 --> 王OO
  • 孤獨求敗 --> 孤OO敗
  • Guo da-xia --> GOO da-xia

methods:

  • Excel
    • =REPLACE(A1, 2, 2, "OO")
  • PHP
//$string = '楊過';
$string = '王小明';
//$string = '孤獨求敗';
//$string = 'Guo da-xia';

$pattern = '/^(\X)(\X\X?)(\X?)/u';
$replacement = '$1OO$3';
echo preg_replace($pattern, $replacement, $string);
  • MySQL
-- SET @name := "楊過";
SET @name := "王小明";
-- SET @name := "孤獨求敗";
-- SET @name := "Guo da-xia";

SELECT CONCAT(LEFT(@name, 1), 'OO', RIGHT(@name, CHAR_LENGTH(@name)-3));


手機號碼的個資去識別化[edit]

case1: 0912345678 --> 09123XXXXX8[edit]

methods:

  • Excel
    • =LEFT(A2, 5)&REPT("X", 5)&RIGHT(A2, 1)

References[edit]

further reading

related terms