OCR
OCR (optical character recognition; Chinese: [https://zh.wikipedia.org/wiki/%E5%85%89%E5%AD%A6%E5%AD%97%E7%AC%A6%E8%AF%86%E5%88%AB 光學字元辨識]), also known as image-to-text conversion (圖片轉文字).

== OCR tools ==

=== Image to text ===
* {{Gd}} [https://docs.google.com/ Google Docs]: after uploading the file, right-click the file name and choose "Open with" --> "Google Docs".<ref>[http://docs.google.com/support/bin/answer.py?answer=176692&hl=en Uploading and exporting: Uploading image files with text to Google Docs]、[https://support.google.com/drive/answer/176692?hl=zh-Hant&visit_id=1-636534874969716350-2978233269&rd=1 將 PDF 和相片檔案轉換為文字 - 電腦 - Google 雲端硬碟說明]</ref> English is recognized without trouble; Simplified Chinese ran into problems.
** Tutorial: [https://buzzorange.com/techorange/2019/12/09/convert-picture-into-word/ 不要浪費時間 key 資料啦!拍照上傳 Google 雲端,按個右鍵就自動幫你轉文字 | TechOrange]
* [https://www.google.com/photos/about/ Google Photos]: upload the image to Google Photos, then choose "Copy text from image". {{access | date=2022-09-30}}
* {{Gd}} [https://line.me/zh-hant/ 免費通話、免費傳訊的應用程式「LINE」]
** [https://mrmad.com.tw/line-ocr 【教學】LINE 透過 OCR 文字辨識功能,直接讓圖片轉成文字技巧 - 瘋先生]
* [https://www.onlineocr.net/dashboard Free Online OCR - convert scanned PDF and images to Word, JPEG to Word] Unregistered users get a quota of 5 pages; registered members get a free quota of 50 pages in total. {{access | date=2018-02-06}}
* [https://udn.com/news/story/7088/3326897 Google Keep內建辨識功能 將圖片內容轉文字輸出 | 社群網路 | 數位 | 聯合新聞網] {{access | date=2018-08-27}}
* [https://zhtw.109876543210.com/ 免費在線OCR - 在線圖片識別 - 免費OCR軟件 - 免費OCR轉換成Word - 在線文字識別轉換 - 圖片文字識別軟件 - 圖片轉文字] The free version is limited to batch uploads of 3 pages at a time and 10 converted PDF pages per day. {{access | date=2018-10-24}}
* MS Office 2003 with an additionally installed Office tool, Microsoft Office Document Imaging ([http://fun.idv.tw/fun/2008/12/ms_office_ocr.html 你也可以輕鬆做文字辨識(OCR)])
*# (.pdf to .mdi) Print the PDF to MS Office 2003 Document Imaging.
*# (.mdi to Word) In MS Office 2003 Document Imaging (.mdi), run OCR recognition and send the text to Word.
* [http://www.microsoft.com/downloads/details.aspx?FamilyID=dd172063-9517-41d8-82af-29c38f7437b6&DisplayLang=zh-tw Microsoft Office Document Imaging 中文簡體OCR辨識引擎]
* [http://imageconvert.nsspot.net/?m=img2ocr Optical Character Recognition tool that extracts text from major image format - Online Image, PDF, Latex, OCR Converter] Recognition results for Traditional Chinese are poor. {{access | date=2020-10-29}}
* ''$'' [http://www.newocr.com/ Free Online OCR - Convert JPEG, PNG, GIF, BMP, TIFF, PDF, DjVu to Text] The language can be specified. Conversion runs in the web page, and the converted files must be downloaded page by page. The API has a free quota of 20 pages.<ref>[http://www.newocr.com/api/ Free Online OCR - OCR API]</ref>
* [https://ocr.space/ Best Free OCR API, Online OCR, Searchable PDF - Fresh 2022 On-Premise OCR Software] The language can be specified.
* [https://chatgpt.com/ ChatGPT] Showed the error message "Tesseract does not support Traditional Chinese" (Traditional Chinese language data for Tesseract is not available in this environment). {{access | date=2024-08-23}}
* [https://claude.ai/new Claude] Can convert Chinese images to text, but some characters come out wrong, so manual proofreading is still required.
* ''$'' [https://cloud.google.com/vision/?hl=zh-tw Vision AI | 透過機器學習技術取得圖片的深入分析結果 | Cloud Vision API | Google Cloud]

=== PDF to text ===
{{Tips}} Tip: because the free tiers of online services limit the number of pages per PDF file, you can split the file first with one of the [[PDF split and merge tools]].
* [[Document_converter#PDF.E8.BD.89.E6.8F.9B.E6.88.90.E7.B4.94.E6.96.87.E5.AD.97 | Convert PDF to text]]

== OCR scripts & API ==

[https://github.com/ocropus/ocropy ocropus/ocropy: Python-based tools for document analysis and OCR]
* Script Language: Python
* Support Language: < 10. {{exclaim}} No Chinese model files are provided. {{access | date=2022-04-20}} More on [https://github.com/ocropus-archive/DUP-ocropy/wiki/Models Models · ocropus-archive/DUP-ocropy Wiki]
* License: [https://github.com/ocropus/ocropy/blob/master/LICENSE Apache License 2.0]

[https://github.com/tesseract-ocr/tesseract tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)] {{access | date=2022-06-19}}
* Script Language: C++; PHP wrapper: [https://github.com/thiagoalessio/tesseract-ocr-for-php thiagoalessio/tesseract-ocr-for-php: A wrapper to work with Tesseract OCR inside PHP.]
* Support Language: 100+, including Traditional Chinese,<ref>[https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc#languages-and-scripts LANGUAGES AND SCRIPTS]</ref> but recognition results for Traditional Chinese are poor. {{access | date=2022-04-20}} More on [https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html Languages/Scripts supported in different versions of Tesseract | tessdoc]
* License: [https://github.com/tesseract-ocr/tesseract/blob/main/LICENSE Apache License 2.0]. PHP wrapper: [https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE MIT License]

{{Acronym| acronym=API| def=application programming interface}} of OCR services

Azure AI Vision/[https://azure.microsoft.com/zh-tw/services/cognitive-services/computer-vision/ 電腦視覺 | Microsoft Azure]: [https://docs.microsoft.com/zh-tw/azure/cognitive-services/Computer-vision/quickstarts-sdk/client-library?pivots=programming-language-rest-api&tabs=visual-studio 快速入門:光學字元辨識 (OCR) 用戶端程式庫或 REST API - Azure Cognitive Services | Microsoft Docs]
* Support Language: Chinese is supported<ref>[https://docs.microsoft.com/zh-tw/azure/cognitive-services/computer-vision/language-support#optical-character-recognition-ocr 語言支援 - 電腦視覺 - Azure Cognitive Services | Microsoft Docs]</ref>

[https://cloud.google.com/vision Vision AI | 透過機器學習技術取得圖片的深入分析結果 | Cloud Vision API | Google Cloud]
* Support Language: Traditional Chinese is supported ({{kbd | key=zh-Hant}})<ref>[https://cloud.google.com/vision/docs/languages OCR Language Support | Cloud Vision API | Google Cloud]</ref>

[https://ocr.space/OCRAPI Free OCR API]
* Support Language: Traditional Chinese is supported ({{kbd | key=cht}})

[https://aws.amazon.com/tw/rekognition/?blog-cards.sort-by=item.additionalFields.createdDate&blog-cards.sort-order=desc Amazon Rekognition – 影片與影像 – AWS]: [https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html?pg=ln&sec=ft Detecting text - Amazon Rekognition]
* Support Language: {{exclaim}} Chinese is not supported<ref>[https://aws.amazon.com/tw/about-aws/whats-new/2021/11/amazon-rekognition-text-detection-7-new-languages-accuracy/ Amazon Rekognition text detection supports 7 new languages and improves accuracy] "Amazon Rekognition is designed to detect words in English, Arabic, Russian, German, French, Italian, Portuguese and Spanish."</ref><ref>[https://docs.aws.amazon.com/rekognition/latest/dg/text-detection.html Detecting text - Amazon Rekognition]</ref> {{access | date=2022-04-20}}

Related pages
* [[Troubleshooting of Amazon Rekognition]]
* [[Troubleshooting of Azure Cognitive Services API]]

== Recommended resolution settings for common documents ==
Resolution settings by intended use:
* Text recognition: 75~150 dpi
* Mixed text and images: 100~150 dpi
* Images (viewed on screen): 150~250 dpi {{exclaim}} Personal experience: for scanned presentation slides, small text can be recognized at 300 dpi, but raising the setting to 600 dpi is recommended.
* Images (to be printed): 300 dpi or higher
* Business cards: 150~200 dpi

Source: PCHome 2005/8

== Related Pages ==
* [[Document_converter#PDF_%E8%BD%89%E6%8F%9B%E6%88%90%E7%B4%94%E6%96%87%E5%AD%97_(TXT)|Convert PDF to TXT]]

== References ==
<references />

Related articles
* [https://www.ptt.cc/bbs/EZsoft/M.1516336833.A.EAF.html (請問) 有沒有可以大量OCR(圖文轉換)的軟體 - 看板 EZsoft - 批踢踢實業坊] {{access | date=2018-02-15}}

[[Category:Tool]]
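
Note on the resolution settings listed on this page: the dpi values can be turned into pixel dimensions to estimate how large a scanned image will be. The following is only a minimal sketch; the function name <code>scan_pixels</code> and the A4 paper size (210 x 297 mm) used in the example are assumptions for illustration and do not come from the cited source.

```python
from typing import Tuple

MM_PER_INCH = 25.4  # one inch is defined as exactly 25.4 mm


def scan_pixels(width_mm: float, height_mm: float, dpi: int) -> Tuple[int, int]:
    """Return (width_px, height_px) for a page scanned at the given dpi.

    Hypothetical helper: pixels = inches * dots per inch, rounded to
    the nearest whole pixel.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    # Assumed example: an A4 page at three of the dpi values mentioned on this page.
    for dpi in (150, 300, 600):
        print(dpi, scan_pixels(210, 297, dpi))
```

Doubling the dpi doubles each side and roughly quadruples the pixel count, which is worth keeping in mind when an online service limits upload size.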