Data cleaning: Difference between revisions

254 bytes added, 19 November 2019

@@ Line 448: / Line 448: @@
 Cygwin
 * (1) Put each string on its own line, separated by the [[Return symbol | return_symbol]]; (2) run the [https://www.computerhope.com/unix/uuniq.htm uniq command] in Cygwin on {{Win}} or on {{Linux}} to count the occurrences of each string: {{kbd | key=<nowiki>sort <file.txt> | uniq -c</nowiki>}}<ref>[https://unix.stackexchange.com/questions/134446/counting-the-occurrences-of-the-string text processing - Counting the occurrences of the string - Unix & Linux Stack Exchange]</ref>
+Example input file: test.txt
+<pre>
+#apple
+#追劇
+#電影
+#綜藝
+#Apple
+#藍芽
+</pre>
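+Result of running the command {{kbd | key=<nowiki>sort test.txt | uniq -ic | sort -nr</nowiki>}}, which counts lines case-insensitively and lists the most frequent first (the count column that the -c option prints is omitted here):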
+<pre>
+#Apple
+#電影
+#追劇
+#藍芽
+#綜藝
+</pre>
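Where Cygwin or a Unix shell is not available, the same case-insensitive counting can be approximated in Python. The following is only a minimal sketch, not part of the original instructions: the function name count_case_insensitive and the inline sample list are hypothetical, and the sample simply repeats the contents of test.txt shown above.

```python
from collections import Counter


def count_case_insensitive(lines):
    """Count lines while ignoring case; return (text, count), most frequent first.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    first_seen = {}  # case-folded key -> first spelling encountered
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        key = text.casefold()
        counts[key] += 1
        first_seen.setdefault(key, text)
    # most_common() orders by descending count; equal counts keep
    # the order in which the keys were first encountered.
    return [(first_seen[key], n) for key, n in counts.most_common()]


# Same contents as test.txt above (assumed to be read one string per line).
sample = ["#apple", "#追劇", "#電影", "#綜藝", "#Apple", "#藍芽"]
for text, n in count_case_insensitive(sample):
    print(n, text)
```

Unlike the shell pipeline, this sketch reports the first spelling it encounters for each group and keeps ties in input order, so the representative spelling and the order of equally frequent strings may differ from the {{kbd | key=sort}} output, which depends on the locale's collation rules.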
 == Outlier / Anomaly detection ==