Data cleaning: Difference between revisions

688 bytes removed, 19 November 2019
== Counting ==


=== Counting the number of occurrences (or frequency) of a string ===
[[Count occurrences of a word in string]]
* (1) Put each string on its own line, separated by a [[Return symbol | return_symbol]]. (2) Run the [https://www.computerhope.com/unix/uuniq.htm uniq command] on Cygwin of {{Win}} or on {{Linux}}: {{kbd | key=<nowiki>sort <file.txt> | uniq -c</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref> Here <nowiki><file.txt></nowiki> is a placeholder for the input file name. The input is sorted first because uniq only merges adjacent identical lines.
 
Example input file: test.txt
<pre>
#apple
#追劇
#電影
#綜藝
#Apple
#藍芽
</pre>
 
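Result of running the command {{kbd | key=<nowiki>sort test.txt | uniq -ic | sort -nr</nowiki>}}, where the uniq option -i ignores case, the option -c prefixes each line with its count, and sort -nr orders the lines by count in descending order: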
<pre>
  2 #Apple
  1 #電影
  1 #追劇
  1 #藍芽
  1 #綜藝
</pre>
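
Where the uniq command is not available, the same case-insensitive count can be approximated in Python. The following is a minimal sketch, not part of the original recipe: the function name <code>count_lines</code> and the <code>ignore_case</code> parameter are hypothetical, and the sample list simply repeats the contents of test.txt above. Unlike the shell pipeline, it keeps the first spelling it encounters for each group and lists ties in input order, so the representative spelling and the order of the lines with count 1 may differ from the output shown above.

```python
from collections import Counter


def count_lines(lines, ignore_case=True):
    """Count occurrences of each non-empty line, most frequent first.

    Hypothetical helper: roughly mirrors `sort | uniq -ic | sort -nr`.
    """
    counts = Counter()
    first_seen = {}  # normalized key -> first spelling encountered
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key = line.casefold() if ignore_case else line
        first_seen.setdefault(key, line)
        counts[key] += 1
    # most_common() sorts by count, descending; ties keep insertion order
    return [(first_seen[key], n) for key, n in counts.most_common()]


# Same contents as test.txt above
sample = ["#apple", "#追劇", "#電影", "#綜藝", "#Apple", "#藍芽"]
result = count_lines(sample)
for text, n in result:
    print(f"{n:7d} {text}")
```

To count lines from a file instead of a list, pass an open file object, for example <code>count_lines(open("test.txt", encoding="utf-8"))</code>. Setting <code>ignore_case=False</code> corresponds to plain <code>uniq -c</code>.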


== Outlier / Anomaly detection ==
