Stop Word

From LemonWiki共筆
Latest revision as of 14:32, 2 October 2023

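Stop Word (單一高頻字 "single high-frequency word", 停用字 "stop word", 停止字串 "stop string")

Stop words are very frequent words that carry little meaning on their own, so search engines and text-analysis tools often ignore them or remove them before indexing or analysis.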

English: a, of, the, in, is, she, he, to be, as, because, if, when
Chinese (Traditional): 的 一 是 不 人 在 有 我 了 中 ... 這 個 來 為 國 們 著 時 會 說
Chinese (Simplified): 的 一 是 不 人 在 有 我 了 中 ... 这 个 来 为 国 们 着 时 会 说

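As a rough illustration of how these lists are used, the Python sketch below filters stop words out of text. It assumes English is tokenized by lowercasing and splitting on whitespace, so "to be" is treated as the two tokens "to" and "be". It also assumes Chinese is filtered one character at a time, because every Chinese stop word listed above is a single character. The sets contain only the examples above, not complete lists, and the function names are hypothetical.

```python
# Minimal stop word filtering sketch. The sets hold only the example
# stop words listed on this page; they are not complete lists.

# "to be" is split into "to" and "be" because tokens come from str.split().
ENGLISH_STOP_WORDS = {
    "a", "of", "the", "in", "is", "she", "he", "to", "be",
    "as", "because", "if", "when",
}
TRADITIONAL_CHINESE_STOP_CHARS = set("的一是不人在有我了中這個來為國們著時會說")
SIMPLIFIED_CHINESE_STOP_CHARS = set("的一是不人在有我了中这个来为国们着时会说")


def remove_english_stop_words(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop tokens found in the stop list."""
    return [tok for tok in text.lower().split() if tok not in ENGLISH_STOP_WORDS]


def remove_chinese_stop_chars(text: str, stop_chars: set[str]) -> str:
    """Drop every character that appears in the single-character stop list."""
    return "".join(ch for ch in text if ch not in stop_chars)


if __name__ == "__main__":
    print(remove_english_stop_words("The cat is in the garden because it is warm"))
    print(remove_chinese_stop_chars("这是我的书", SIMPLIFIED_CHINESE_STOP_CHARS))
    print(remove_chinese_stop_chars("我的國家", TRADITIONAL_CHINESE_STOP_CHARS))
```

Character-level filtering is crude. The third call turns 國家 ("country") into 家 because 國 is on the stop list. Running a Chinese word segmenter before filtering can avoid this kind of damage.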

use case

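  • Search Gmail for "这 OR 个 OR 来 OR 为 OR 国 OR 们 OR 时 OR 说" to find and delete spam written in Simplified Chinese. Each character in this query is the Simplified form of a stop word that is written differently in Traditional Chinese (for example 这 vs. 這). As a result, the search tends to match Simplified Chinese mail and skip Traditional Chinese mail.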

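The query above is the eight characters joined with OR. The sketch below builds that string in Python. It also approximates the "any of these terms" behavior locally, for example to pre-check a message before searching. The function names are hypothetical. Treating a match as a plain substring test is an assumption, and Gmail's own handling of Chinese text may differ.

```python
# Sketch: build the Gmail OR query from the use case above and approximate
# its matching locally. Substring matching is an assumption; Gmail's own
# tokenization of Chinese text may behave differently.

# Simplified forms that differ from their Traditional counterparts (e.g. 这 vs 這).
SIMPLIFIED_QUERY_CHARS = ["这", "个", "来", "为", "国", "们", "时", "说"]


def build_or_query(terms: list[str]) -> str:
    """Join search terms with Gmail's OR operator."""
    return " OR ".join(terms)


def matches_any(text: str, terms: list[str]) -> bool:
    """Local approximation of an OR query: True if any term occurs in text."""
    return any(term in text for term in terms)


if __name__ == "__main__":
    print(build_or_query(SIMPLIFIED_QUERY_CHARS))
    print(matches_any("这个优惠只限今天", SIMPLIFIED_QUERY_CHARS))  # Simplified text
    print(matches_any("這個優惠只限今天", SIMPLIFIED_QUERY_CHARS))  # Traditional text
```

The printed query can be pasted into the Gmail search box. The first check returns True and the second returns False, because the Traditional sentence uses 這 and 個 instead of 这 and 个.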
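download file

  • stopwords-iso/stopwords-zh: Chinese stopwords collection (https://github.com/stopwords-iso/stopwords-zh/tree/master)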
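After downloading a list such as the stopwords-iso collection, you can load it into a set for fast lookups. This Python sketch assumes the file is UTF-8 text with one stop word per line. That layout and any file name such as `stopwords-zh.txt` are assumptions, so check them against the repository. The function names are hypothetical.

```python
# Sketch: load a one-word-per-line stop word file into a set.
# Assumption: UTF-8 text, one stop word per line (verify against the download).
import os
import tempfile
from collections.abc import Iterable


def parse_stop_words(lines: Iterable[str]) -> set[str]:
    """Strip surrounding whitespace from each line and skip blank lines."""
    return {line.strip() for line in lines if line.strip()}


def load_stop_words(path: str) -> set[str]:
    """Read a stop word file; 'utf-8-sig' also strips a leading BOM if present."""
    with open(path, encoding="utf-8-sig") as f:
        return parse_stop_words(f)


if __name__ == "__main__":
    # Write a tiny stand-in file so the example runs without a download.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write("的\n了\n\n这\n")
    try:
        stop_words = load_stop_words(tmp.name)
        print(sorted(stop_words))
        print("这" in stop_words)
    finally:
        os.remove(tmp.name)
```

A set gives average constant-time membership checks, which suits filtering large amounts of text token by token.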