Named entity recognition tools: Difference between revisions

From LemonWiki共筆

Revision as of 17:15, 31 August 2020

Named entity recognition (NER), also known in Chinese as 命名實體識別, 實體識別 (entity recognition), or 專有名詞辨識 (proper noun recognition)

CKIP Neural Chinese Word Segmentation, POS Tagging, and NER

ckiplab/ckiptagger: CKIP Neural Chinese Word Segmentation, POS Tagging, and NER

  • language support: Traditional Chinese
  • programming language: Python
  • Score:
  • classes of entity [1]:

Class name in English | Class name in Traditional Chinese
person | 人名
norp | 團體
FAC | 設施
facility | 設施*
ORG | 組織
organization | 組織*
gpe | 地理
LOC | 地點
location | 地點*
product | 商品
event | 事件
WORK | 藝術品
work of art | 藝術品*
law | 法律
language | 語言
date | 日期
time | 時間
percent | 比例
money |
quantity | 數量
ordinal | 序數
cardinal | 數詞
Note: An asterisk (*) marks an entry whose English class name differs from another entry's but whose Chinese class name is the same.
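
The asterisked rows in the table are alternative English names for a class that shares one Chinese name. A minimal sketch of how such aliases might be folded into a single label is shown here. The `ALIASES` dictionary and the `canonical_class` helper are hypothetical names introduced for illustration, and treating the short form as the canonical one is an assumption, not something the CKIP documentation prescribes.

```python
# Alias class names taken from the asterisked rows of the table.
# Picking the short form as canonical is an assumption for illustration.
ALIASES = {
    "facility": "FAC",
    "organization": "ORG",
    "location": "LOC",
    "work of art": "WORK",
}


def canonical_class(name: str) -> str:
    """Return one upper-case label per class, folding aliases together."""
    key = name.strip().lower()
    # Names without an alias entry are simply upper-cased.
    return ALIASES.get(key, key.upper())


if __name__ == "__main__":
    for raw in ["facility", "FAC", "gpe", "work of art"]:
        print(raw, "->", canonical_class(raw))
```

With this helper, both spellings of a class compare equal after normalization, which can simplify counting entities per class.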

Stanford CoreNLP

Stanford CoreNLP – Natural language software | Stanford CoreNLP

  • license: GNU General Public License v3
  • language support: English, Chinese, and others
  • programming language: Java
  • Score: Available
  • classes of entity: "For English, by default, this annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities (12 classes)." [2]
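
The quoted description groups the 12 default English classes into three families. The sketch below records that grouping as plain Python data so a label can be looked up by family. It does not call CoreNLP itself; `CORENLP_DEFAULT_CLASSES` and `group_of` are hypothetical names used only for this illustration.

```python
# The three families and their labels, copied from the quoted description.
CORENLP_DEFAULT_CLASSES = {
    "named": ["PERSON", "LOCATION", "ORGANIZATION", "MISC"],
    "numerical": ["MONEY", "NUMBER", "ORDINAL", "PERCENT"],
    "temporal": ["DATE", "TIME", "DURATION", "SET"],
}


def group_of(label):
    """Return the family a label belongs to, or None if it is not listed."""
    for group, labels in CORENLP_DEFAULT_CLASSES.items():
        if label in labels:
            return group
    return None


if __name__ == "__main__":
    total = sum(len(labels) for labels in CORENLP_DEFAULT_CLASSES.values())
    print("number of classes:", total)
    print("DURATION belongs to:", group_of("DURATION"))
```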

spaCy

spaCy · Industrial-strength Natural Language Processing in Python

  • license: MIT License
  • language support:
  • programming language: Python
  • Score:
  • classes of entity: "PERSON, NORP, FAC, ORG, GPE, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL" [3]

Google Cloud Natural Language

Cloud Natural Language  |  Cloud Natural Language API  |  Google Cloud

  • license:
  • language support: includes Traditional Chinese; see Language Support  |  Cloud Natural Language API  |  Google Cloud
  • programming language: multiple
  • Score: Available, as a salience score in the [0, 1.0] range. "The salience score for an entity provides information about the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient." [4]
  • classes of entity: see "Type of the entity" under Details on Entity  |  Cloud Natural Language API  |  Google Cloud, e.g. "UNKNOWN, PERSON, LOCATION, ORGANIZATION, EVENT, WORK_OF_ART, CONSUMER_GOOD, OTHER, PHONE_NUMBER, ADDRESS, DATE, NUMBER and PRICE"
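
Because salience lies in the [0, 1.0] range and higher values mean more central entities, a common use is to keep only entities above a cutoff and rank them. The following is a minimal sketch on hand-written sample data. It does not call the Cloud Natural Language API, and the `Entity` class, the `salient_entities` helper, the sample values, and the 0.1 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for one entity in an analysis result."""
    name: str
    entity_type: str
    salience: float


def salient_entities(entities, threshold=0.1):
    """Keep entities at or above the threshold, most salient first."""
    for ent in entities:
        if not 0.0 <= ent.salience <= 1.0:
            raise ValueError(f"salience out of range for {ent.name!r}")
    kept = [ent for ent in entities if ent.salience >= threshold]
    return sorted(kept, key=lambda ent: ent.salience, reverse=True)


# Invented sample values, not real API output.
SAMPLE = [
    Entity("2020", "DATE", 0.03),
    Entity("Academia Sinica", "ORGANIZATION", 0.35),
    Entity("Taipei", "LOCATION", 0.62),
]

if __name__ == "__main__":
    for ent in salient_entities(SAMPLE):
        print(ent.name, ent.entity_type, ent.salience)
```

The threshold is a tuning choice; a suitable value depends on document length and on how many entities the text mentions.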

Amazon Comprehend

Amazon Comprehend – Natural Language Processing (NLP) and Machine Learning (ML)

  • license:
  • language support:
  • programming language:
  • Score: Available
  • classes of entity: "COMMERCIAL_ITEM, DATE, EVENT, LOCATION, ORGANIZATION, OTHER, PERSON, QUANTITY and TITLE" [5], as follows:
Type | Description | Type (Chinese)
COMMERCIAL_ITEM | A branded product | 商品
DATE | A full date (for example, 11/25/2017), day (Tuesday), month (May), or time (8:30 a.m.) | 日期
EVENT | An event, such as a festival, concert, election, etc. | 事件
LOCATION | A specific location, such as a country, city, lake, building, etc. | 地點
ORGANIZATION | Large organizations, such as a government, company, religion, sports team, etc. | 機構
OTHER | Entities that don't fit into any of the other entity categories | 其他
PERSON | Individuals, groups of people, nicknames, fictional characters | 人名
QUANTITY | A quantified amount, such as currency, percentages, numbers, bytes, etc. | 量詞
TITLE | An official name given to any creation or creative work, such as movies, books, songs, etc. | 抬頭
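
The Chinese labels in the table can be used to present detected entities to Chinese-speaking readers. The sketch below groups entities by type and swaps in the Chinese label. The sample records are invented, and their `Text`, `Type`, and `Score` keys are modeled on the shape of a Comprehend entity-detection response as an assumption; no AWS call is made, and `TYPE_ZH` and `group_by_type` are hypothetical names.

```python
from collections import defaultdict

# Chinese labels copied from the table.
TYPE_ZH = {
    "COMMERCIAL_ITEM": "商品",
    "DATE": "日期",
    "EVENT": "事件",
    "LOCATION": "地點",
    "ORGANIZATION": "機構",
    "OTHER": "其他",
    "PERSON": "人名",
    "QUANTITY": "量詞",
    "TITLE": "抬頭",
}


def group_by_type(entities):
    """Group entity texts under the Chinese label of their type."""
    grouped = defaultdict(list)
    for ent in entities:
        # Fall back to the raw type name if it is not in the table.
        label = TYPE_ZH.get(ent["Type"], ent["Type"])
        grouped[label].append(ent["Text"])
    return dict(grouped)


# Invented sample records; the key names are an assumption.
SAMPLE = [
    {"Text": "Seattle", "Type": "LOCATION", "Score": 0.98},
    {"Text": "May", "Type": "DATE", "Score": 0.95},
    {"Text": "Portland", "Type": "LOCATION", "Score": 0.91},
]

if __name__ == "__main__":
    print(group_by_type(SAMPLE))
```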

IBM Watson

Watson Natural Language Understanding

  • license:
  • language support:
  • programming language:
  • Score:
  • classes of entity: "Date, Duration, EmailAddress, Facility, GeographicFeature, Hashtag, IPAddress, JobTitle, Location and more ..."[6]

卓騰語言科技 (Droidtown) Chinese Word Segmentation

卓騰語言科技 (Droidtown) Chinese Word Segmentation API

  • license:
  • language support: Traditional Chinese
  • programming language:
  • Score:
  • classes of entity: "person, location, time, measurement and more ..." [7]

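Baidu AI Open Platform (百度AI开放平台)

Basic Language Processing Technology - Baidu AI Open Platform (语言处理基础技术-百度AI开放平台), "proper noun recognition" (专名识别) [8]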

  • license:
  • language support: Simplified Chinese
  • programming language: multiple
  • Score:
  • classes of entity:
Class name in English (abbreviation) | Class name in Simplified Chinese | Class name in Traditional Chinese
PER | 人名 | 人名
LOC | 地名 | 地名
ORG | 机构名 | 機構名
TIME | 时间 | 時間

BosonNLP (out of service)

BosonNLP

  • license:
  • language support: Simplified Chinese
  • programming language: multiple
  • Score:
  • classes of entity: "time, location, person_name, org_name, company_name, product_name and job_title" [9]
Class name in English | Class name in Simplified Chinese | Class name in Traditional Chinese
time | 时间 | 時間
location | 地点 | 地點
person_name | 人名 | 人名
org_name | 组织名 | 組織名
company_name | 公司名 | 公司名
product_name | 产品名 | 產品名
job_title | 职位 | 職位
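
When output from tools with different label sets has to be compared, one option is to map the finer-grained scheme onto the coarser one. The sketch below maps the BosonNLP class names onto the four Baidu abbreviations listed earlier on this page (PER, LOC, ORG, TIME). The mapping itself is an assumption made for illustration, in particular folding `company_name` into ORG, and `BOSON_TO_BAIDU` and `to_baidu_label` are hypothetical names.

```python
# Assumed correspondence between the two label sets; neither vendor
# documents this mapping.
BOSON_TO_BAIDU = {
    "time": "TIME",
    "location": "LOC",
    "person_name": "PER",
    "org_name": "ORG",
    "company_name": "ORG",
}


def to_baidu_label(boson_label):
    """Return the matching Baidu abbreviation, or None if there is none."""
    # product_name and job_title have no counterpart among the four classes.
    return BOSON_TO_BAIDU.get(boson_label)


if __name__ == "__main__":
    for label in ["person_name", "company_name", "job_title"]:
        print(label, "->", to_baidu_label(label))
```

Labels that return `None` would have to be dropped or kept under a separate "other" bucket, depending on what the comparison is for.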

other NER tools

References

[1] Chinese Named Entity Recognition System (中文專有名詞辨識系統), presentation slides. https://iptt.sinica.edu.tw/uploads/datas/2019/4/a251a61991139dc023d3559e93cd8d65.pdf
[2] Named Entity Recognition – NERClassifierCombiner | Stanford CoreNLP. https://stanfordnlp.github.io/CoreNLP/ner.html#description
[3] Annotation Specifications · spaCy API Documentation. https://spacy.io/api/annotation#named-entities
[5] Detect Entities - Amazon Comprehend. https://docs.aws.amazon.com/comprehend/latest/dg/how-entities.html
[6] Entity types (Version 2). https://cloud.ibm.com/docs/services/natural-language-understanding?topic=natural-language-understanding-entity-types-version-2&locale=en
[7] 卓騰語言科技 (Droidtown) Chinese Word Segmentation API. https://api.droidtown.co/document/
[9] Named Entity Recognition (命名实体识别), BosonNLP HTTP API 1.0 documentation. http://docs.bosonnlp.com/ner.html