Named entity recognition tools
Named entity recognition (NER) is also known in Chinese as [https://zh.wikipedia.org/wiki/%E5%91%BD%E5%90%8D%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB 命名實體識別] (named entity recognition), 實體識別 (entity recognition), or 專有名詞辨識 (proper noun recognition).
== Amazon Comprehend ==
</tr>
</table>
== Apache OpenNLP ==
[https://opennlp.apache.org/ Apache OpenNLP]
* license: Apache License, Version 2.0
* language support: English, French, German, Italian and Dutch. Chinese is not supported. See [https://opennlp.apache.org/models.html Models Download - Apache OpenNLP].
* programming language: Java
* Score:
* classes of entity:
== Baidu AI Open Platform (百度AI开放平台) ==
[https://ai.baidu.com/tech/nlp 语言处理基础技术-百度AI开放平台] (basic language processing technology), "专名识别" (proper noun recognition)<ref>[https://ai.baidu.com/docs#/NLP-Basic-API/63eec4cf 词法分析接口] (lexical analysis API)</ref> / [https://github.com/baidu/lac baidu/lac: 百度NLP:分词,词性标注,命名实体识别] (Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition)
* license:
* language support: Simplified Chinese
* programming language: multiple
* Score:
* classes of entity:
<table border="1" class="wikitable sortable">
<tr><th>Class name in English (abbreviation)</th><th>Class name in Simplified Chinese</th><th>Class name in Traditional Chinese</th></tr>
<tr><td>PER</td><td>人名</td><td>人名</td></tr>
<tr><td>LOC</td><td>地名</td><td>地名</td></tr>
<tr><td>ORG</td><td>机构名</td><td>機構名</td></tr>
<tr><td>TIME</td><td>时间</td><td>時間</td></tr>
</table>
== CKIP Neural Chinese Word Segmentation, POS Tagging, and NER ==
[https://github.com/ckiplab/ckiptagger ckiplab/ckiptagger: CKIP Neural Chinese Word Segmentation, POS Tagging, and NER]
* license: [https://github.com/ckiplab/ckiptagger/blob/master/LICENSE GNU General Public License v3.0] {{Gd}}
* language support: Traditional Chinese
* programming language: Python
* Score:
* classes of entity:<ref>[https://iptt.sinica.edu.tw/uploads/datas/2019/4/a251a61991139dc023d3559e93cd8d65.pdf 中文專有名詞辨識系統 簡報] (Chinese proper noun recognition system, presentation slides)</ref>
<table border="1" class="wikitable sortable">
<tr><th>Class name in English</th><th>Class name in Traditional Chinese</th></tr>
<tr><td>person</td><td>人名</td></tr>
<tr><td>norp</td><td>團體</td></tr>
<tr><td>FAC</td><td>設施</td></tr>
<tr><td>facility</td><td>設施*</td></tr>
<tr><td>ORG</td><td>組織</td></tr>
<tr><td>organization</td><td>組織*</td></tr>
<tr><td>gpe</td><td>地理</td></tr>
<tr><td>LOC</td><td>地點</td></tr>
<tr><td>location</td><td>地點*</td></tr>
<tr><td>product</td><td>商品</td></tr>
<tr><td>event</td><td>事件</td></tr>
<tr><td>WORK</td><td>藝術品</td></tr>
<tr><td>work of art</td><td>藝術品*</td></tr>
<tr><td>law</td><td>法律</td></tr>
<tr><td>language</td><td>語言</td></tr>
<tr><td>date</td><td>日期</td></tr>
<tr><td>time</td><td>時間</td></tr>
<tr><td>percent</td><td>比例</td></tr>
<tr><td>money</td><td>錢</td></tr>
<tr><td>quantity</td><td>數量</td></tr>
<tr><td>ordinal</td><td>序數</td></tr>
<tr><td>cardinal</td><td>數詞</td></tr>
</table>
: [[Image:Owl icon.jpg]] Note: An asterisk marks a class whose English name differs from the row above it but whose Chinese name is the same.
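The asterisked rows in the CKIP table can be treated as aliases when results are post-processed. The following is a minimal sketch, not part of ckiptagger: the names <code>CKIP_ALIASES</code>, <code>canonical_class</code> and <code>chinese_name</code> are hypothetical, and choosing the short code as the canonical form is an assumption made only for this example.

```python
from typing import Optional

# Alias pairs copied from the asterisked rows of the table above.
# Treating the short code as canonical is an assumption of this sketch.
CKIP_ALIASES = {
    "facility": "FAC",
    "organization": "ORG",
    "location": "LOC",
    "work of art": "WORK",
}

# Traditional Chinese names for the four aliased classes, as listed in the table.
CKIP_CHINESE_NAMES = {
    "FAC": "設施",
    "ORG": "組織",
    "LOC": "地點",
    "WORK": "藝術品",
}


def canonical_class(name: str) -> str:
    """Map a long English class name to its short code; other names pass through."""
    key = name.strip()
    return CKIP_ALIASES.get(key.lower(), key)


def chinese_name(name: str) -> Optional[str]:
    """Look up the Chinese name of an aliased class, or None if it is not aliased."""
    return CKIP_CHINESE_NAMES.get(canonical_class(name))
```

With this lookup, <code>chinese_name("facility")</code> and <code>chinese_name("FAC")</code> resolve to the same Chinese class name, which is what the asterisk in the table expresses.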
== Google Cloud Natural Language ==
[https://cloud.google.com/natural-language/ Cloud Natural Language | Cloud Natural Language API | Google Cloud]
* license:
* language support: Includes Traditional Chinese. See [https://cloud.google.com/natural-language/docs/languages Language Support | Cloud Natural Language API | Google Cloud].
* programming language: multiple
* Score: Available. '''salience score''' in the [0, 1.0] range. "The salience score for an entity provides information about the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient."<ref>[https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity Entity | Cloud Natural Language API | Google Cloud]</ref>
* classes of entity: See the entity type listed in [https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity Entity | Cloud Natural Language API | Google Cloud], e.g. "UNKNOWN, PERSON, LOCATION, ORGANIZATION, EVENT, WORK_OF_ART, CONSUMER_GOOD, OTHER, PHONE_NUMBER, ADDRESS, DATE, NUMBER and PRICE"
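Because the salience score lies in the [0, 1.0] range, it can be used to rank or filter entities. The sketch below works on plain dictionaries and does not call the API. The field names <code>name</code>, <code>type</code> and <code>salience</code> are assumed to follow the Entity reference linked above, the sample values are invented for illustration, and <code>most_salient</code> is a hypothetical helper.

```python
from typing import Any, Dict, List

# Hand-written sample for illustration only; these are NOT real API results.
SAMPLE_ENTITIES: List[Dict[str, Any]] = [
    {"name": "2019", "type": "DATE", "salience": 0.08},
    {"name": "Taipei", "type": "LOCATION", "salience": 0.62},
    {"name": "conference", "type": "EVENT", "salience": 0.30},
]


def most_salient(entities: List[Dict[str, Any]],
                 min_salience: float = 0.0) -> List[Dict[str, Any]]:
    """Keep entities at or above min_salience, most salient first."""
    if not 0.0 <= min_salience <= 1.0:
        raise ValueError("min_salience must be within [0, 1.0]")
    kept = [e for e in entities if e["salience"] >= min_salience]
    return sorted(kept, key=lambda e: e["salience"], reverse=True)


if __name__ == "__main__":
    for entity in most_salient(SAMPLE_ENTITIES, min_salience=0.25):
        print(entity["name"], entity["type"], entity["salience"])
```

The threshold of 0.25 in the usage lines is arbitrary; a suitable cut-off depends on the documents being analysed.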
== IBM Watson ==
* Score:
* classes of entity: "Date, Duration, EmailAddress, Facility, GeographicFeature, Hashtag, IPAddress, JobTitle, Location and more ..."<ref>[https://cloud.ibm.com/docs/services/natural-language-understanding?topic=natural-language-understanding-entity-types-version-2&locale=en Entity types (Version 2)]</ref>
== Microsoft Azure Cognitive Services ==
[https://docs.microsoft.com/zh-tw/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking 搭配文字分析 API 使用實體辨識 - Azure Cognitive Services | Microsoft Docs] (Traditional Chinese) / [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking?tabs=version-3 Use entity recognition with the Text Analytics API - Azure Cognitive Services | Microsoft Docs]
* license:
* language support: English and Chinese, among others. See details on [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/language-support?tabs=named-entity-recognition Language support - Text Analytics API - Azure Cognitive Services | Microsoft Docs].
* programming language: Any language that can send a REST API request. See details on [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking?tabs=version-3#sending-a-rest-api-request Use entity recognition with the Text Analytics API - Azure Cognitive Services | Microsoft Docs].
* Score: Available.
* classes of entity: Person, PersonType, Location, Organization, Event, Product and more. See details on [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=general Supported Categories for Named Entity Recognition - Azure Cognitive Services | Microsoft Docs].
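Since the service is reached through a REST request, any language that can produce JSON can prepare the request body. The sketch below only builds the body and sends nothing. The <code>documents</code> / <code>id</code> / <code>language</code> / <code>text</code> layout is assumed from the Text Analytics documentation linked above and should be checked against the current API version; <code>build_documents_payload</code> is a hypothetical helper name.

```python
import json
from typing import List


def build_documents_payload(texts: List[str], language: str = "en") -> str:
    """Serialize texts into a JSON request body, one numbered document per text.

    The field layout is an assumption based on the linked documentation;
    no endpoint or subscription key is involved in this sketch.
    """
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    # ensure_ascii=False keeps Chinese text readable in the serialized body.
    return json.dumps({"documents": documents}, ensure_ascii=False)


if __name__ == "__main__":
    print(build_documents_payload(["Microsoft was founded by Bill Gates."]))
```

Each document carries its own <code>id</code> so that entities in the response can be matched back to the input text.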
== spaCy ==
[https://spacy.io/ spaCy · Industrial-strength Natural Language Processing in Python]
* license: [https://github.com/explosion/spaCy/blob/master/LICENSE MIT License] {{Gd}}
* language support:
* programming language: Python
* Score:
* classes of entity: "PERSON, NORP, FAC, ORG, GPE, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL"<ref>[https://spacy.io/api/annotation#named-entities Annotation Specifications · spaCy API Documentation]</ref>
== Stanford CoreNLP ==
[https://stanfordnlp.github.io/CoreNLP/index.html Stanford CoreNLP – Natural language software | Stanford CoreNLP]
* license: GNU General Public License v3 {{Gd}}
* language support: English, Chinese and others
* programming language: Java
* Score: Available
* classes of entity: "For English, by default, this annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities (12 classes)."<ref>[https://stanfordnlp.github.io/CoreNLP/ner.html#description Named Entity Recognition – NERClassifierCombiner | Stanford CoreNLP]</ref>
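The three groups in the quotation above can be written down as a lookup table, which is handy when CoreNLP labels need to be bucketed in downstream code. This is a sketch based only on the quoted sentence; <code>CORENLP_ENGLISH_CLASSES</code> and <code>group_of</code> are hypothetical names, and the code does not call CoreNLP.

```python
from typing import Optional

# Default English classes, grouped exactly as in the quoted description.
CORENLP_ENGLISH_CLASSES = {
    "named": ("PERSON", "LOCATION", "ORGANIZATION", "MISC"),
    "numerical": ("MONEY", "NUMBER", "ORDINAL", "PERCENT"),
    "temporal": ("DATE", "TIME", "DURATION", "SET"),
}


def group_of(label: str) -> Optional[str]:
    """Return the group a label belongs to, or None for an unknown label."""
    for group, labels in CORENLP_ENGLISH_CLASSES.items():
        if label in labels:
            return group
    return None


if __name__ == "__main__":
    total = sum(len(labels) for labels in CORENLP_ENGLISH_CLASSES.values())
    print(total, group_of("DURATION"))
```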
== Droidtown (卓騰語言科技) Chinese word segmentation ==
* classes of entity: "person, location, time, measurement and more ..."<ref>[https://api.droidtown.co/document/ 卓騰語言科技中文斷詞 API]</ref>
== BosonNLP (out of service) ==
</table>
== Other similar NER tools ==