OCR: Difference between revisions

Revision as of 20:15, 8 January 2025

1,050 bytes added , 8 January 2025

m

Text replacement - ": Image:Owl icon.jpg " to "{{Tips}} "

Planetoid (Bureaucrats, Administrators; 15,049 edits)

@@ Line 3: / Line 3: @@
 == OCR tools ==
+=== Convert images to text ===
 * {{Gd}} [https://docs.google.com/ Google Docs]: After uploading the file, right-click the file name and choose "Open with" --> "Google Docs".<ref>[http://docs.google.com/support/bin/answer.py?answer=176692&hl=en Uploading and exporting: Uploading image files with text to Google Docs]; [https://support.google.com/drive/answer/176692?hl=zh-Hant&visit_id=1-636534874969716350-2978233269&rd=1 Convert PDF and photo files to text - Computer - Google Drive Help]</ref> English is recognized without trouble; Simplified Chinese runs into problems.
 ** Tutorial: [https://buzzorange.com/techorange/2019/12/09/convert-picture-into-word/ Stop wasting time keying in data! Snap a photo, upload it to Google Drive, and a right-click converts it to text automatically | TechOrange]
@@ Line 28: / Line 29: @@
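 * [https://ocr.space/ Best Free OCR API, Online OCR, Searchable PDF - Fresh 2022 On-Premise OCR Software]: the recognition language can be specified.
+* [https://chatgpt.com/ ChatGPT]: shows an error message saying that Tesseract does not support Traditional Chinese ("Traditional Chinese language data for Tesseract is not available in this environment"). {{access | date=2024-08-23}}
+* [https://claude.ai/new Claude]: can convert Chinese text in images to text, but recognition errors still occur, so manual proofreading is needed.
 * ''$'' [https://cloud.google.com/vision/?hl=zh-tw Vision AI | Derive insights from images with machine learning  |  Cloud Vision API  |  Google Cloud]
-: [[Image:Owl icon.jpg]] Tip: because the free tiers of online services limit the number of pages per PDF file, split the file first with a tool from [[PDF split and merge tools]].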
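+=== Convert PDF to text ===
+{{Tips}} Tip: because the free tiers of online services limit the number of pages per PDF file, split the file first with a tool from [[PDF split and merge tools]].
+* [[Document_converter#PDF.E8.BD.89.E6.8F.9B.E6.88.90.E7.B4.94.E6.96.87.E5.AD.97 | Convert PDF to text]]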
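The page-limit tip can be planned before opening any splitting tool. The following is a minimal sketch that only computes which page ranges to cut. The function name `page_chunks` and the example limit of 3 pages are illustrative assumptions, not values taken from any particular service.

```python
def page_chunks(total_pages: int, max_pages: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges.

    Each range holds at most max_pages pages, so every piece stays
    under a (hypothetical) per-file page limit of an online OCR service.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


if __name__ == "__main__":
    # A 7-page PDF against an assumed limit of 3 pages per upload.
    for first, last in page_chunks(7, 3):
        print(f"pages {first}-{last}")
```

The resulting ranges can then be entered into whichever tool from [[PDF split and merge tools]] is used for the actual split.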
+== OCR scripts & API ==
+[https://github.com/ocropus/ocropy ocropus/ocropy: Python-based tools for document analysis and OCR]
+* Script Language: Python
+* Supported languages: < 10. {{exclaim}} No Chinese model files are provided. {{access | date=2022-04-20}} More on [https://github.com/ocropus-archive/DUP-ocropy/wiki/Models Models · ocropus-archive/DUP-ocropy Wiki]
+* License: [https://github.com/ocropus/ocropy/blob/master/LICENSE Apache License 2.0]
+[https://github.com/tesseract-ocr/tesseract tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)] {{access | date=2022-06-19}}
+* Script Language: C++; PHP wrapper: [https://github.com/thiagoalessio/tesseract-ocr-for-php thiagoalessio/tesseract-ocr-for-php: A wrapper to work with Tesseract OCR inside PHP.] <ref>[https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc#languages-and-scripts LANGUAGES AND SCRIPTS]</ref>
+* Supported languages: 100+, including Traditional Chinese, although recognition results for Traditional Chinese are poor. {{access | date=2022-04-20}} More on [https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html Languages/Scripts supported in different versions of Tesseract | tessdoc]
+* License: [https://github.com/tesseract-ocr/tesseract/blob/main/LICENSE Apache License 2.0]. PHP wrapper: [https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE MIT License]
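As a rough illustration of how the Tesseract command-line tool might be driven from a script, the sketch below builds the command for the Traditional Chinese language data (`chi_tra`) and runs it through `subprocess`. It assumes that the `tesseract` executable and the `chi_tra` data file are installed locally. The helper names `build_tesseract_command` and `ocr_image` are hypothetical and not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, lang: str = "chi_tra") -> list[str]:
    """Build the CLI call: tesseract <image> stdout -l <lang>.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "chi_tra") -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        # Assumption: the executable must be on PATH for this sketch to work.
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call ocr_image("scan.png") to actually run OCR.
    print(" ".join(build_tesseract_command("scan.png")))
```

Since recognition quality for Traditional Chinese is reported above as poor, the returned text should still be proofread by hand.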
-== OCR scripts ==
+{{Acronym| acronym=API| def=application programming interface}} of OCR services
-Scripts
-* [https://github.com/thiagoalessio/tesseract-ocr-for-php thiagoalessio/tesseract-ocr-for-php: A wrapper to work with Tesseract OCR inside PHP.] Provides a Traditional Chinese model file ({{kbd | key=chi_tra (Chinese traditional)}}) <ref>[https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc#languages-and-scripts LANGUAGES AND SCRIPTS]</ref>, but recognition results for Traditional Chinese are poor. {{access | date=2022-04-20}}
-** Language: PHP
-** License: [https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE MIT License]
-* [https://github.com/ocropus/ocropy ocropus/ocropy: Python-based tools for document analysis and OCR] No Chinese model files are provided. {{access | date=2022-04-20}}
-** Language: Python
-** License: [https://github.com/ocropus/ocropy/blob/master/LICENSE Apache License 2.0]
-*[https://github.com/tesseract-ocr/tesseract tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)] {{access | date=2022-06-19}}
-** Language: C++:
-** License: [https://github.com/tesseract-ocr/tesseract/blob/main/LICENSE Apache License 2.0]
-== OCR API ==
+Azure AI Vision/[https://azure.microsoft.com/zh-tw/services/cognitive-services/computer-vision/ Computer Vision | Microsoft Azure]: [https://docs.microsoft.com/zh-tw/azure/cognitive-services/Computer-vision/quickstarts-sdk/client-library?pivots=programming-language-rest-api&tabs=visual-studio Quickstart: Optical character recognition (OCR) client library or REST API - Azure Cognitive Services | Microsoft Docs]
-OCR API
+* Supported languages: Chinese is supported<ref>[https://docs.microsoft.com/zh-tw/azure/cognitive-services/computer-vision/language-support#optical-character-recognition-ocr Language support - Computer Vision - Azure Cognitive Services | Microsoft Docs]</ref>
-* [https://azure.microsoft.com/zh-tw/services/cognitive-services/computer-vision/ Computer Vision | Microsoft Azure]: [https://docs.microsoft.com/zh-tw/azure/cognitive-services/Computer-vision/quickstarts-sdk/client-library?pivots=programming-language-rest-api&tabs=visual-studio Quickstart: Optical character recognition (OCR) client library or REST API - Azure Cognitive Services | Microsoft Docs] Chinese is supported<ref>[https://docs.microsoft.com/zh-tw/azure/cognitive-services/computer-vision/language-support#optical-character-recognition-ocr Language support - Computer Vision - Azure Cognitive Services | Microsoft Docs]</ref>
-* [https://cloud.google.com/vision Vision AI | Derive insights from images with machine learning  |  Cloud Vision API  |  Google Cloud]: Traditional Chinese is supported ({{kbd | key=zh-Hant}})<ref>[https://cloud.google.com/vision/docs/languages OCR Language Support  |  Cloud Vision API  |  Google Cloud]</ref>
+[https://cloud.google.com/vision Vision AI | Derive insights from images with machine learning  |  Cloud Vision API  |  Google Cloud]
-* [https://ocr.space/OCRAPI Free OCR API] Traditional Chinese is supported ({{kbd | key=cht}})
+* Supported languages: Traditional Chinese is supported ({{kbd | key=zh-Hant}})<ref>[https://cloud.google.com/vision/docs/languages OCR Language Support  |  Cloud Vision API  |  Google Cloud]</ref>
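To show where the `zh-Hant` language key fits, here is a minimal sketch of the JSON body for a Cloud Vision `images:annotate` REST request. It only builds the payload and sends nothing. The helper name `build_vision_request` is hypothetical, and the choice of the `DOCUMENT_TEXT_DETECTION` feature is an assumption made for dense text; authentication and the HTTP call itself are left out.

```python
import base64
import json


def build_vision_request(image_bytes: bytes, language_hint: str = "zh-Hant") -> dict:
    """Build the request body for one image with a language hint."""
    return {
        "requests": [
            {
                # Image content is sent as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # The hint nudges recognition toward Traditional Chinese.
                "imageContext": {"languageHints": [language_hint]},
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read with open(..., "rb").
    body = build_vision_request(b"placeholder image bytes")
    print(json.dumps(body, indent=2))
```

The body would be sent with POST to the `images:annotate` endpoint together with valid credentials; since this is a paid service, it is worth checking the current pricing first.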
-* [https://aws.amazon.com/tw/rekognition/?blog-cards.sort-by=item.additionalFields.createdDate&blog-cards.sort-order=desc Amazon Rekognition – Video and Image – AWS]: [https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html?pg=ln&sec=ft Detecting text - Amazon Rekognition]: {{exclaim}} Chinese is not supported<ref>[https://aws.amazon.com/tw/about-aws/whats-new/2021/11/amazon-rekognition-text-detection-7-new-languages-accuracy/ Amazon Rekognition text detection supports 7 new languages and improves accuracy] " Amazon Rekognition is designed to detect words in English, Arabic, Russian, German, French, Italian, Portuguese and Spanish."</ref><ref>[https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html Detecting text - Amazon Rekognition]</ref> {{access | date=2022-04-20}}
+[https://ocr.space/OCRAPI Free OCR API]
+* Supported languages: Traditional Chinese is supported ({{kbd | key=cht}})
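The `cht` key is passed as the `language` form field of a request. The sketch below prepares such a request with the standard library only and does not send it. The endpoint URL and the form field names `apikey` and `url` reflect my understanding of the OCR.space API and should be checked against its documentation; the helper name `build_ocr_space_request` and the key `YOUR_API_KEY` are placeholders.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the OCR.space API documentation.
OCR_SPACE_ENDPOINT = "https://api.ocr.space/parse/image"


def build_ocr_space_request(
    image_url: str, api_key: str, language: str = "cht"
) -> urllib.request.Request:
    """Prepare (but do not send) a form-encoded POST request."""
    form = urllib.parse.urlencode(
        {"apikey": api_key, "url": image_url, "language": language}
    ).encode("ascii")
    return urllib.request.Request(OCR_SPACE_ENDPOINT, data=form, method="POST")


if __name__ == "__main__":
    request = build_ocr_space_request("https://example.com/scan.png", "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    # To actually call the service, pass the request to urllib.request.urlopen()
    # and parse the JSON response.
```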
+[https://aws.amazon.com/tw/rekognition/?blog-cards.sort-by=item.additionalFields.createdDate&blog-cards.sort-order=desc Amazon Rekognition – Video and Image – AWS]: [https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html?pg=ln&sec=ft Detecting text - Amazon Rekognition]:
+* Supported languages: {{exclaim}} Chinese is not supported<ref>[https://aws.amazon.com/tw/about-aws/whats-new/2021/11/amazon-rekognition-text-detection-7-new-languages-accuracy/ Amazon Rekognition text detection supports 7 new languages and improves accuracy] " Amazon Rekognition is designed to detect words in English, Arabic, Russian, German, French, Italian, Portuguese and Spanish."</ref><ref>[https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html Detecting text - Amazon Rekognition]</ref> {{access | date=2022-04-20}}
@@ Line 66: / Line 81: @@
 Source: PCHome 2005/8
+== Related Pages ==
+* [[Document_converter#PDF_%E8%BD%89%E6%8F%9B%E6%88%90%E7%B4%94%E6%96%87%E5%AD%97_(TXT)|Convert PDF to TXT]]
 == References ==
