Named entity recognition tools: Difference between revisions

Line 1: Line 1:
Named entity recognition (NER), also called [https://zh.wikipedia.org/wiki/%E5%91%BD%E5%90%8D%E5%AE%9E%E4%BD%93%E8%AF%86%E5%88%AB 命名實體識別] (named entity recognition), 實體識別 (entity recognition), or 專有名詞辨識 (proper noun recognition) in Chinese.


== Amazon Comprehend ==
Line 121: Line 61:
   </tr>
   </tr>
</table>
</table>
== Apache OpenNLP ==
[https://opennlp.apache.org/ Apache OpenNLP]
* license: Apache License, Version 2.0
* language support: English, French, German, Italian and Dutch; Chinese is not supported. See [https://opennlp.apache.org/models.html Models Download - Apache OpenNLP]
* programming language: Java
* Score:
* classes of entity:
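== Baidu AI Open Platform (百度AI开放平台) ==
[https://ai.baidu.com/tech/nlp 语言处理基础技术-百度AI开放平台] (basic language processing technology, Baidu AI Open Platform), "专名识别" (proper-name recognition)<ref>[https://ai.baidu.com/docs#/NLP-Basic-API/63eec4cf 词法分析接口] (lexical analysis API)</ref> / [https://github.com/baidu/lac baidu/lac: 百度NLP:分词,词性标注,命名实体识别] (Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition)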
* license:
* language support: Simplified Chinese
* programming language: multiple
* Score:
* classes of entity:
<table border="1" class="wikitable sortable">
<tr><th>Class name in English (abbreviation)</th><th>Class name in Simplified Chinese</th><th>Class name in Traditional Chinese</th></tr>
<tr><td>PER</td><td>人名</td><td>人名</td></tr>
<tr><td>LOC</td><td>地名</td><td>地名</td></tr>
<tr><td>ORG</td><td>机构名</td><td>機構名</td></tr>
<tr><td>TIME</td><td>时间</td><td>時間</td></tr>
</table>
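The table above can be used as a lookup when displaying results. The following is a minimal sketch, assuming the tagger returns the abbreviations listed in the table; the names <code>BAIDU_LABELS</code> and <code>label_name</code> are hypothetical helpers for illustration and are not part of any Baidu SDK.

```python
# Label table copied from the wikitable above:
# abbreviation -> (Simplified Chinese name, Traditional Chinese name)
BAIDU_LABELS = {
    "PER": ("人名", "人名"),
    "LOC": ("地名", "地名"),
    "ORG": ("机构名", "機構名"),
    "TIME": ("时间", "時間"),
}


def label_name(label, script="traditional"):
    """Return the Chinese class name for a label, or None if it is not in the table.

    Hypothetical helper: `script` selects "simplified" or "traditional" (default).
    """
    names = BAIDU_LABELS.get(label.upper())
    if names is None:
        return None
    simplified, traditional = names
    return simplified if script == "simplified" else traditional


if __name__ == "__main__":
    print(label_name("ORG"))                # 機構名
    print(label_name("org", "simplified"))  # 机构名
    print(label_name("GPE"))                # None: not a class in the table above
```

Returning <code>None</code> for unknown labels keeps the lookup safe if the service reports a class that the table does not list.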
== CKIP Neural Chinese Word Segmentation, POS Tagging, and NER ==
[https://github.com/ckiplab/ckiptagger ckiplab/ckiptagger: CKIP Neural Chinese Word Segmentation, POS Tagging, and NER]
* license: [https://github.com/ckiplab/ckiptagger/blob/master/LICENSE GNU General Public License v3.0] {{Gd}}
* language support: Traditional Chinese
* programming language: Python
* Score:
* classes of entity:<ref>[https://iptt.sinica.edu.tw/uploads/datas/2019/4/a251a61991139dc023d3559e93cd8d65.pdf 中文專有名詞辨識系統  簡報] (Chinese proper noun recognition system, presentation slides)</ref>
<table border="1" class="wikitable sortable">
<tr><th>Class name in English</th><th>Class name in Traditional Chinese</th></tr>
<tr><td>person</td><td>人名</td></tr>
<tr><td>norp</td><td>團體</td></tr>
<tr><td>FAC</td><td>設施</td></tr>
<tr><td>facility</td><td>設施*</td></tr>
<tr><td>ORG</td><td>組織</td></tr>
<tr><td>organization</td><td>組織*</td></tr>
<tr><td>gpe</td><td>地理</td></tr>
<tr><td>LOC</td><td>地點</td></tr>
<tr><td>location</td><td>地點*</td></tr>
<tr><td>product</td><td>商品</td></tr>
<tr><td>event</td><td>事件</td></tr>
<tr><td>WORK</td><td>藝術品</td></tr>
<tr><td>work of art</td><td>藝術品*</td></tr>
<tr><td>law</td><td>法律</td></tr>
<tr><td>language</td><td>語言</td></tr>
<tr><td>date</td><td>日期</td></tr>
<tr><td>time</td><td>時間</td></tr>
<tr><td>percent</td><td>比例</td></tr>
<tr><td>money</td><td>錢</td></tr>
<tr><td>quantity</td><td>數量</td></tr>
<tr><td>ordinal</td><td>序數</td></tr>
<tr><td>cardinal</td><td>數詞</td></tr>
</table>
: [[Image:Owl icon.jpg]] Note: An asterisk marks a row whose English class name differs from the row above it but whose Chinese class name is the same.
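Because several English class names in the table share one Chinese class name, it can help to group them before comparing results across tools. The following is a minimal sketch that uses only the rows of the table above; <code>CKIP_CLASSES</code> and <code>group_aliases</code> are hypothetical names chosen for illustration and are not part of ckiptagger.

```python
# Rows copied from the table above: (English class name, Traditional Chinese class name).
CKIP_CLASSES = [
    ("person", "人名"), ("norp", "團體"),
    ("FAC", "設施"), ("facility", "設施"),
    ("ORG", "組織"), ("organization", "組織"),
    ("gpe", "地理"),
    ("LOC", "地點"), ("location", "地點"),
    ("product", "商品"), ("event", "事件"),
    ("WORK", "藝術品"), ("work of art", "藝術品"),
    ("law", "法律"), ("language", "語言"),
    ("date", "日期"), ("time", "時間"),
    ("percent", "比例"), ("money", "錢"),
    ("quantity", "數量"), ("ordinal", "序數"), ("cardinal", "數詞"),
]


def group_aliases(pairs):
    """Group English class names by their shared Chinese name, keeping table order."""
    groups = {}
    for english, chinese in pairs:
        groups.setdefault(chinese, []).append(english)
    return groups


groups = group_aliases(CKIP_CLASSES)
# Chinese names that have more than one English name (the asterisk rows).
aliased = {zh: names for zh, names in groups.items() if len(names) > 1}

if __name__ == "__main__":
    print(len(CKIP_CLASSES), "rows,", len(groups), "distinct classes")
    for zh, names in aliased.items():
        print(zh, "=", " / ".join(names))
```

Grouping the 22 rows this way leaves 18 distinct classes, with four of them carrying two English names each.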
== Google Cloud Natural Language ==
[https://cloud.google.com/natural-language/ Cloud Natural Language  |  Cloud Natural Language API  |  Google Cloud]
* license:
* language support: includes Traditional Chinese; see [https://cloud.google.com/natural-language/docs/languages Language Support | Cloud Natural Language API | Google Cloud]
* programming language: multiple
* Score: Available. '''salience score''' in the [0, 1.0] range. "The salience score for an entity provides information about the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient.<ref>[https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity Entity  |  Cloud Natural Language API  |  Google Cloud]</ref>"
* classes of entity: see the type of the entity in [https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity Entity | Cloud Natural Language API | Google Cloud], e.g. "UNKNOWN, PERSON, LOCATION, ORGANIZATION, EVENT, WORK_OF_ART, CONSUMER_GOOD, OTHER, PHONE_NUMBER, ADDRESS, DATE, NUMBER and PRICE"
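The salience score described above can be used to keep only the entities that are central to a document. The following is a minimal sketch that works on an already parsed list of entities; the sample values are invented for illustration and are not real API output, and <code>salient_entities</code> and its <code>threshold</code> parameter are hypothetical. It does not call the service.

```python
# Hypothetical sample shaped like parsed Entity objects (name, type, salience).
# The values are made up for illustration; they are not real API output.
sample_entities = [
    {"name": "Academia Sinica", "type": "ORGANIZATION", "salience": 0.30},
    {"name": "Taipei", "type": "LOCATION", "salience": 0.62},
    {"name": "2019", "type": "DATE", "salience": 0.08},
]


def salient_entities(entities, threshold=0.25):
    """Return entities whose salience is at least `threshold`, most salient first."""
    for entity in entities:
        # Salience is documented as lying in the [0, 1.0] range.
        if not 0.0 <= entity["salience"] <= 1.0:
            raise ValueError(f"salience out of range for {entity['name']!r}")
    kept = [e for e in entities if e["salience"] >= threshold]
    return sorted(kept, key=lambda e: e["salience"], reverse=True)


if __name__ == "__main__":
    for entity in salient_entities(sample_entities):
        print(entity["name"], entity["type"], entity["salience"])
```

The threshold of 0.25 is an arbitrary choice for the sketch; a suitable cut-off depends on document length and on how many entities the text contains.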


== IBM Watson ==
Line 129: Line 140:
* Score:
* classes of entity: "Date, Duration, EmailAddress, Facility, GeographicFeature, Hashtag, IPAddress, JobTitle, Location and more ..."<ref>[https://cloud.ibm.com/docs/services/natural-language-understanding?topic=natural-language-understanding-entity-types-version-2&locale=en Entity types (Version 2)]</ref>
== Microsoft Azure Cognitive Services ==
[https://docs.microsoft.com/zh-tw/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking 搭配文字分析 API 使用實體辨識 - Azure Cognitive Services | Microsoft Docs] / [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking?tabs=version-3 Use entity recognition with the Text Analytics API - Azure Cognitive Services | Microsoft Docs]
* license:
* language support: English & Chinese. See details on [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/language-support?tabs=named-entity-recognition Language support - Text Analytics API - Azure Cognitive Services | Microsoft Docs].
* programming language: any language that can send a REST API request. See details on [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking?tabs=version-3#sending-a-rest-api-request Use entity recognition with the Text Analytics API - Azure Cognitive Services | Microsoft Docs]
* Score: Available.
* classes of entity: Person, PersonType, Location, Organization, Event, Product and more. See details on [https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=general Supported Categories for Named Entity Recognition - Azure Cognitive Services | Microsoft Docs].
== spaCy ==
[https://spacy.io/ spaCy · Industrial-strength Natural Language Processing in Python]
* license: [https://github.com/explosion/spaCy/blob/master/LICENSE MIT License] {{Gd}}
* language support:
* programming language: Python
* Score:
* classes of entity: "PERSON, NORP, FAC, ORG, GPE, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL <ref>[https://spacy.io/api/annotation#named-entities Annotation Specifications · spaCy API Documentation]</ref>"
== Stanford CoreNLP ==
[https://stanfordnlp.github.io/CoreNLP/index.html Stanford CoreNLP – Natural language software | Stanford CoreNLP]
* license: GNU General Public License v3 {{Gd}}
* language support: English, Chinese and others
* programming language: Java
* Score: Available
* classes of entity: "For English, by default, this annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities (12 classes). <ref>[https://stanfordnlp.github.io/CoreNLP/ner.html#description Named Entity Recognition – NERClassifierCombiner | Stanford CoreNLP]</ref>"
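The three groups of English classes quoted above can be written down as a small lookup. The following is a minimal sketch for illustration; <code>CORENLP_ENGLISH_CLASSES</code> and <code>category_of</code> are hypothetical names and are not part of the CoreNLP API.

```python
# Default English classes as quoted above, grouped the way the quotation groups them.
CORENLP_ENGLISH_CLASSES = {
    "named": ["PERSON", "LOCATION", "ORGANIZATION", "MISC"],
    "numerical": ["MONEY", "NUMBER", "ORDINAL", "PERCENT"],
    "temporal": ["DATE", "TIME", "DURATION", "SET"],
}


def category_of(label):
    """Return "named", "numerical" or "temporal" for a class label, else None."""
    for category, labels in CORENLP_ENGLISH_CLASSES.items():
        if label in labels:
            return category
    return None


# Total number of classes across the three groups.
total_classes = sum(len(labels) for labels in CORENLP_ENGLISH_CLASSES.values())

if __name__ == "__main__":
    print(total_classes)            # 12
    print(category_of("DURATION"))  # temporal
    print(category_of("GPE"))       # None: not one of the default English classes
```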


== 卓騰語言科技中文斷詞 ==
Line 138: Line 177:
* classes of entity: "person, location, time, measurement and more ... <ref>[https://api.droidtown.co/document/ 卓騰語言科技中文斷詞 API]</ref>"




== BosonNLP (out of service) ==
Line 172: Line 197:
</table>
</table>




== Other similar NER tools ==