Stop Word: Difference between revisions

1,186 bytes added ,  2 October 2023
* Search Gmail for messages that include "这 OR 个 OR 来 OR 为 OR 国 OR 们 OR 时 OR 说" to find and delete spam written in Simplified Chinese.
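The OR query above can also be assembled from a list of characters. This is only a minimal sketch: the helper name <code>build_or_query</code> is hypothetical, and the resulting string is meant to be pasted into the Gmail search box (no Gmail API call is made).

```python
def build_or_query(terms):
    """Join search terms with the OR operator, skipping blanks and duplicates."""
    seen = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.append(term)
    return " OR ".join(seen)


# Common Simplified Chinese characters taken from the query above.
COMMON_ZH_HANS = ["这", "个", "来", "为", "国", "们", "时", "说"]

query = build_or_query(COMMON_ZH_HANS)
print(query)
```

Keeping the characters in a list makes it easy to add or drop a character without retyping the whole query.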


download file
* [https://github.com/stopwords-iso/stopwords-zh/tree/master stopwords-iso/stopwords-zh: Chinese stopwords collection]
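Once a stop word list has been downloaded, it can be used to filter tokens. The sketch below assumes a UTF-8 plain-text file with one stop word per line; the function names are hypothetical, and the inline sample only stands in for the real file. Tokenizing Chinese text is outside the scope of this example, so the tokens are given as a ready-made list.

```python
from pathlib import Path


def parse_stopwords(text):
    """Return the set of non-empty, stripped lines in `text`."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_stopwords(path):
    """Read a UTF-8 stop word file (assumed: one entry per line) into a set."""
    return parse_stopwords(Path(path).read_text(encoding="utf-8"))


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stop word set."""
    return [tok for tok in tokens if tok not in stopwords]


# Inline sample standing in for the downloaded file.
sample = "的\n了\n是\n"
stopwords = parse_stopwords(sample)
print(remove_stopwords(["我", "是", "学生"], stopwords))
```

A set is used so that each membership check stays fast even when the list holds thousands of entries.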


references
* [http://www.google.com.tw/support/bin/answer.py?answer=981&topic=352&hl=zh_TW Google Help Center: Why won't Google let me search for numbers or words such as "how" and "the"?][http://scholar.google.com/intl/zh-TW/help/basics.html#stopwords]([http://www.google.com.tw/support/bin/answer.py?answer=981&topic=352&hl=en EN])
* [http://www.google.com.tw/intl/zh-TW/insidesearch/tipstricks/all.html#characters Search tips and tricks – Search home – Google] "Include or ignore specific words and characters in a search: if certain common words and characters (such as 'the' and '&') are essential to your search, put double quotation marks around them; for example, the 'the' in a movie or book title can be written as "the". ... ... " {{access | date = 2015-02-15}}
* [http://humanum.arts.cuhk.edu.hk/Lexis/chifreq/ Chinese Character Frequency Statistics for Hong Kong, Mainland China and Taiwan - A Trans-Regional, Diachronic Survey]: 香港、大陸、台灣 - 跨地區、跨年代漢語常用字頻統計 {{access | date = 2015-11-24}}
* [http://www.ranks.nl/stopwords Stopwords] "Collection of stopword lists in many languages." {{access | date = 2015-11-24}}
* [https://en.wikipedia.org/wiki/Stop_words Stop words - Wikipedia] {{access | date = 2016-11-14}}
* Adobe (n.d.). [https://helpx.adobe.com/experience-manager/kb/Stopwordlist.html Optimize search by adding stop words] {{access | date = 2017-06-22}} Provides stop words for German (de), English (en), Spanish (es), French (fr), Dutch (nl), and Swedish (se).
* [http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html sklearn.feature_extraction.text.CountVectorizer — scikit-learn 0.19.1 documentation] {{access | date = 2018-04-14}}
* Chinese stop words: [https://github.com/zake7749/word2vec-tutorial/blob/master/jieba_dict/stopwords.txt word2vec-tutorial/stopwords.txt at master · zake7749/word2vec-tutorial · GitHub]


[[Category:NLP]]
[[Category:Search]]