Named entity recognition tools
Named entity recognition (NER) is also known in Chinese as 命名實體識別, 實體識別, or 專有名詞辨識.
CKIP Neural Chinese Word Segmentation, POS Tagging, and NER
ckiplab/ckiptagger: CKIP Neural Chinese Word Segmentation, POS Tagging, and NER
- license: GNU General Public License v3.0
- language support: Traditional Chinese
- programming language: Python
- Score:
- classes of entity[1]
Class name in English | Class name in Traditional Chinese |
---|---|
person | 人名 |
norp | 團體 |
FAC | 設施 |
facility | 設施* |
ORG | 組織 |
organization | 組織* |
gpe | 地理 |
LOC | 地點 |
location | 地點* |
product | 商品 |
event | 事件 |
WORK | 藝術品 |
work of art | 藝術品* |
law | 法律 |
language | 語言 |
date | 日期 |
time | 時間 |
percent | 比例 |
money | 錢 |
quantity | 數量 |
ordinal | 序數 |
cardinal | 數詞 |
- Notes: An asterisk (*) marks an alternative English class name that shares its Chinese class name with another row.
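The class table above can be captured as a small lookup. This is only an illustrative sketch: `CLASS_ZH`, `ALIASES`, and `zh_label` are hypothetical names and are not part of ckiptagger. It maps an English class name, in any of the spellings listed in the table, to its Traditional Chinese name.

```python
# Traditional Chinese names for the 18 entity classes in the table above.
CLASS_ZH = {
    "person": "人名", "norp": "團體", "fac": "設施", "org": "組織",
    "gpe": "地理", "loc": "地點", "product": "商品", "event": "事件",
    "work": "藝術品", "law": "法律", "language": "語言", "date": "日期",
    "time": "時間", "percent": "比例", "money": "錢", "quantity": "數量",
    "ordinal": "序數", "cardinal": "數詞",
}

# Rows marked with an asterisk: alternative English names for the same class.
ALIASES = {
    "facility": "fac",
    "organization": "org",
    "location": "loc",
    "work of art": "work",
}


def zh_label(name):
    """Return the Traditional Chinese class name, or None if unknown."""
    # Normalize case and underscores so "WORK_OF_ART" matches "work of art".
    key = name.strip().lower().replace("_", " ")
    key = ALIASES.get(key, key)
    return CLASS_ZH.get(key)


print(zh_label("FAC"), zh_label("facility"), zh_label("WORK_OF_ART"))
```

Keeping the aliases in a separate dictionary keeps the one-to-one class list intact while still accepting both spellings.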
Stanford CoreNLP
Stanford CoreNLP – Natural language software | Stanford CoreNLP
- license: GNU General Public License v3
- language support: English, Chinese, and others
- programming language: Java
- Score: Available
- classes of entity: "For English, by default, this annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities (12 classes). [2]"
spaCy
spaCy · Industrial-strength Natural Language Processing in Python
- license: MIT License
- language support:
- programming language: Python
- Score:
- classes of entity: "PERSON, NORP, FAC, ORG, GPE, LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL [3]"
Google Cloud Natural Language
Cloud Natural Language | Cloud Natural Language API | Google Cloud
- license:
- language support: see Language Support (語言支援) | Cloud Natural Language API | Google Cloud; includes Traditional Chinese
- programming language: multiple
- Score: Available, as a salience score in the [0, 1.0] range. "The salience score for an entity provides information about the importance or centrality of that entity to the entire document text. Scores closer to 0 are less salient, while scores closer to 1.0 are highly salient.[4]"
- classes of entity: Details on Entity | Cloud Natural Language API | Google Cloud -> Type of the entity e.g. "UNKNOWN, PERSON, LOCATION, ORGANIZATION, EVENT, WORK_OF_ART, CONSUMER_GOOD, OTHER, PHONE_NUMBER, ADDRESS, DATE, NUMBER and PRICE"
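As a rough illustration of how the salience score might be used, the sketch below ranks entities by salience. It assumes the response has already been parsed into a dictionary with an `entities` list whose items carry `name`, `type`, and `salience` fields; `top_entities` and the sample data are hypothetical and no API call is made.

```python
def top_entities(response, n=3):
    """Return (name, type, salience) for the n most salient entities."""
    entities = response.get("entities", [])
    # Scores closer to 1.0 are more salient, so sort in descending order.
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e["type"], e.get("salience", 0.0)) for e in ranked[:n]]


# Hypothetical, hand-written response fragment (not real API output).
sample_response = {
    "entities": [
        {"name": "Taipei", "type": "LOCATION", "salience": 0.31},
        {"name": "Academia Sinica", "type": "ORGANIZATION", "salience": 0.62},
        {"name": "2020", "type": "DATE", "salience": 0.07},
    ]
}

for name, entity_type, salience in top_entities(sample_response, n=2):
    print(f"{name}\t{entity_type}\t{salience:.2f}")
```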
Amazon Comprehend
Amazon Comprehend – Natural Language Processing (NLP) and Machine Learning (ML)
- license:
- language support:
- programming language:
- Score: Available. "Each entity also has a score that indicates the level of confidence that Amazon Comprehend has that it correctly detected the entity type. You can filter out the entities with lower scores to reduce the risk of using incorrect detections.[5]"
- classes of entity: "COMMERCIAL_ITEM, DATE, EVENT, LOCATION, ORGANIZATION, OTHER, PERSON, QUANTITY and TITLE"[6], as follows:
Type | Description | Type in Traditional Chinese |
---|---|---|
COMMERCIAL_ITEM | A branded product | 商品 |
DATE | A full date (for example, 11/25/2017), day (Tuesday), month (May), or time (8:30 a.m.) | 日期 |
EVENT | An event, such as a festival, concert, election, etc. | 事件 |
LOCATION | A specific location, such as a country, city, lake, building, etc. | 地點 |
ORGANIZATION | Large organizations, such as a government, company, religion, sports team, etc. | 機構 |
OTHER | Entities that don't fit into any of the other entity categories | 其他 |
PERSON | Individuals, groups of people, nicknames, fictional characters | 人名 |
QUANTITY | A quantified amount, such as currency, percentages, numbers, bytes, etc. | 量詞 |
TITLE | An official name given to any creation or creative work, such as movies, books, songs, etc. | 抬頭 |
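The confidence score described above lends itself to simple threshold filtering. The following is a minimal sketch, assuming the detection result has been parsed into a dictionary with an `Entities` list whose items have `Text`, `Type`, and `Score` fields; `confident_entities`, the 0.8 threshold, and the sample data are hypothetical choices for illustration.

```python
def confident_entities(response, min_score=0.8):
    """Keep only entities whose confidence score is at least min_score."""
    return [
        entity
        for entity in response.get("Entities", [])
        if entity.get("Score", 0.0) >= min_score
    ]


# Hypothetical, hand-written response fragment (not real service output).
sample_response = {
    "Entities": [
        {"Text": "Amazon", "Type": "ORGANIZATION", "Score": 0.97},
        {"Text": "May", "Type": "DATE", "Score": 0.55},
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.91},
    ]
}

for entity in confident_entities(sample_response):
    print(entity["Text"], entity["Type"], entity["Score"])
```

A higher threshold trades recall for precision; the right value depends on how costly an incorrect detection is for the application.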
IBM Watson
Watson Natural Language Understanding
- license:
- language support:
- programming language:
- Score:
- classes of entity: "Date, Duration, EmailAddress, Facility, GeographicFeature, Hashtag, IPAddress, JobTitle, Location and more ..."[7]
卓騰語言科技中文斷詞
- license:
- language support: Traditional Chinese
- programming language:
- Score:
- classes of entity: "person, location, time, measurement and more ... [8]"
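百度AI开放平台 (Baidu AI Open Platform)
语言处理基础技术-百度AI开放平台 (basic language processing technology), "专名识别" (proper name recognition)[9] / baidu/lac: 百度NLP:分词,词性标注,命名实体识别 (Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition)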
- license:
- language support: Simplified Chinese
- programming language: multiple
- Score:
- classes of entity:
Class name in English (abbreviation) | Class name in Simplified Chinese | Class name in Traditional Chinese |
---|---|---|
PER | 人名 | 人名 |
LOC | 地名 | 地名 |
ORG | 机构名 | 機構名 |
TIME | 时间 | 時間 |
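The four entity tags in the table can be used to pick entities out of tagged output. The sketch below assumes the tagger returns two parallel lists, one of words and one of tags, in which entity tags are mixed with ordinary part-of-speech tags; `extract_entities` and the sample words and tags are hypothetical, and the tagger itself is not called.

```python
# Entity tags from the table above, with their Simplified Chinese names.
ENTITY_TAGS = {"PER": "人名", "LOC": "地名", "ORG": "机构名", "TIME": "时间"}


def extract_entities(words, tags):
    """Return (word, tag) pairs whose tag is one of the four entity tags."""
    return [(word, tag) for word, tag in zip(words, tags) if tag in ENTITY_TAGS]


# Hypothetical tagged sentence; the non-entity tags are illustrative only.
words = ["李明", "昨天", "去", "了", "北京"]
tags = ["PER", "TIME", "v", "xc", "LOC"]

for word, tag in extract_entities(words, tags):
    print(word, tag, ENTITY_TAGS[tag])
```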
BosonNLP (out of service)
- license:
- language support: Simplified Chinese
- programming language: multiple
- Score:
- classes of entity: "time, location, person_name, org_name, company_name, product_name and job_title [10]"
Class name in English | Class name in Simplified Chinese | Class name in Traditional Chinese |
---|---|---|
time | 时间 | 時間 |
location | 地点 | 地點 |
person_name | 人名 | 人名 |
org_name | 组织名 | 組織名 |
company_name | 公司名 | 公司名 |
product_name | 产品名 | 產品名 |
job_title | 职位 | 職位 |
Microsoft Azure Cognitive Services
搭配文字分析 API 使用實體辨識 - Azure Cognitive Services | Microsoft Docs / Use entity recognition with the Text Analytics API - Azure Cognitive Services | Microsoft Docs
- license:
- language support: English and Chinese. See details on Language support - Text Analytics API - Azure Cognitive Services | Microsoft Docs.
- programming language: Any language that can send a REST API request. See details on Use entity recognition with the Text Analytics API - Azure Cognitive Services | Microsoft Docs
- Score: Available.
- classes of entity: Person, PersonType, Location, Organization, Event, Product and more. See details on Supported Categories for Named Entity Recognition - Azure Cognitive Services | Microsoft Docs.
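Because the service is reached through a REST API, a request can be prepared with nothing but the standard library. The sketch below only builds the request and does not send it. The endpoint host, the `v3.0` path, the `Ocp-Apim-Subscription-Key` header, and the `documents` body shape are assumptions based on the Text Analytics documentation at the time of writing and should be checked against the current docs; `build_ner_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed path of the general NER operation; verify against the current docs.
NER_PATH = "/text/analytics/v3.0/entities/recognition/general"


def build_ner_request(endpoint, key, texts, language="en"):
    """Build (but do not send) a POST request for entity recognition."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,  # assumed authentication header
    }
    url = endpoint.rstrip("/") + NER_PATH
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder endpoint and key; replace with the values of a real resource.
request = build_ner_request(
    "https://example.cognitiveservices.azure.com",
    "YOUR_KEY",
    ["Microsoft was founded by Bill Gates."],
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON response, which is left out here because it needs a real resource and key.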
Other similar NER tools
- Article Extraction API Documentation - Diffbot "Array of tags/entities, generated from analysis of the extracted text and cross-referenced with DBpedia and other data sources. Language-specific tags will be returned if the source text is in English, Chinese, French, German, Spanish or Russian."
- Wolfram|Alpha APIs: Computational Knowledge Integration "Wolfram|Alpha makes numerous assumptions when analyzing a query and deciding how to present its results. A simple example is a word that can refer to multiple things, like "pi", which is a well-known mathematical constant but is also the name of a movie." [1]
References
- [1] 中文專有名詞辨識系統 簡報
- [2] Named Entity Recognition – NERClassifierCombiner | Stanford CoreNLP
- [3] Annotation Specifications · spaCy API Documentation
- [4] Entity | Cloud Natural Language API | Google Cloud
- [5] Detect Entities - Amazon Comprehend
- [6] Detect Entities - Amazon Comprehend
- [7] Entity types (Version 2)
- [8] 卓騰語言科技中文斷詞 API
- [9] 词法分析接口
- [10] 命名实体识别 — BosonNLP HTTP API 1.0 documentation