m (Text replacement - ": Image:Owl icon.jpg " to "{{Tips}} ") |
m (→= See also) |
||
| (7 intermediate revisions by the same user not shown) | |||
| Line 4: | Line 4: | ||
==== Convert PDF to plain text (TXT) ==== | ==== Convert PDF to plain text (TXT) ==== | ||
* {{Gd}} [http://pdftextonline.com/ PDF Text Extraction In Your Browser - PDFTextOnline]: online service that extracts text from PDF files. (File size must not exceed 10 MB.) ([http://www.box.net/shared/pyve05ss4c Chinese text tested OK], test date: 2008-12-07.) | * {{Gd}} [http://pdftextonline.com/ PDF Text Extraction In Your Browser - PDFTextOnline]: online service that extracts text from PDF files. (File size must not exceed 10 MB.) ([http://www.box.net/shared/pyve05ss4c Chinese text tested OK], test date: 2008-12-07.) | ||
| Line 17: | Line 14: | ||
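* ([[OCR]]) [https://pdfcandy.com/tw/pdf-ocr.html PDF to Text: Free Online OCR Conversion Tool] Free-version limits: one file conversion per hour, and file size is also limited{{access | date=2022-10-07}} | * ([[OCR]]) [https://pdfcandy.com/tw/pdf-ocr.html PDF to Text: Free Online OCR Conversion Tool] Free-version limits: one file conversion per hour, and file size is also limited{{access | date=2022-10-07}} | ||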
[https://www.xpdfreader.com/ | [https://www.xpdfreader.com/pdftotext-man.html pdftotext] ([[Convert pdf to txt|Quick Start]]) | ||
* On {{Win}}: Part of [https://www.xpdfreader.com/index.html XpdfReader], dual licensed under GPL v2 and GPL v3<ref>[https://www.xpdfreader.com/opensource.html Xpdf Open Source]</ref> | |||
* Usage: {{kbd | key=<nowiki>pdftotext | * On {{Mac}} & {{Linux}}: Part of [https://poppler.freedesktop.org/ Poppler]<ref>[https://brewinstall.org/install-pdftotext-mac-osx/ Install pdftotext on Mac OSX - Brew Cask | BrewInstall]</ref> (GPL v2 or later), historically derived from Xpdf ([https://en.wikipedia.org/wiki/Pdftotext Wikipedia]) {{access | date=2023-06-02}} {{Gd}} | ||
* Usage: {{kbd | key=<nowiki>pdftotext [options] [PDF-file [text-file]]</nowiki>}} e.g. {{kbd | key=<nowiki>pdftotext -enc UTF-8 example.pdf example.txt</nowiki>}} | |||
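The usage line above can also be scripted. The following is a minimal sketch in Python, assuming <code>pdftotext</code> (from Xpdf or Poppler) is on the PATH; the helper names <code>build_pdftotext_cmd</code> and <code>pdf_to_text</code> and the file names are illustrative, not part of either package.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path, encoding="UTF-8"):
    """Return the argument list for: pdftotext -enc <encoding> <pdf> <txt>."""
    return ["pdftotext", "-enc", encoding, pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Run pdftotext; raise if the tool is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path), check=True)


# Show the command that would be run for the example in the usage line.
print(" ".join(build_pdftotext_cmd("example.pdf", "example.txt")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.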
[https://github.com/py-pdf/pdfly py-pdf/pdfly: CLI tool to extract (meta)data from PDF and manipulate PDF files] "A {{Acronym| acronym=CLI| def=command-line interface}} application that uses [https://github.com/py-pdf/pdfly pypdf] to interact with PDFs." | [https://github.com/py-pdf/pdfly py-pdf/pdfly: CLI tool to extract (meta)data from PDF and manipulate PDF files] "A {{Acronym| acronym=CLI| def=command-line interface}} application that uses [https://github.com/py-pdf/pdfly pypdf] to interact with PDFs." | ||
| Line 26: | Line 24: | ||
* Known issues: On Windows systems with Chinese locales (cp950), PDF text extraction may fail with Unicode encoding errors when encountering certain special characters like '\u25aa' (BLACK SMALL SQUARE). This is a character encoding limitation of the default Windows codepage<ref>[https://stackoverflow.com/questions/50933194/how-do-i-set-the-pythonutf8-environment-variable-to-enable-utf-8-encoding-by-def How do I set the PYTHONUTF8 environment variable to enable UTF-8 encoding by default in Python? - Stack Overflow]</ref>. | * Known issues: On Windows systems with Chinese locales (cp950), PDF text extraction may fail with Unicode encoding errors when encountering certain special characters like '\u25aa' (BLACK SMALL SQUARE). This is a character encoding limitation of the default Windows codepage<ref>[https://stackoverflow.com/questions/50933194/how-do-i-set-the-pythonutf8-environment-variable-to-enable-utf-8-encoding-by-def How do I set the PYTHONUTF8 environment variable to enable UTF-8 encoding by default in Python? - Stack Overflow]</ref>. | ||
* Requirement: [https://www.python.org/downloads/ Python] | * Requirement: [https://www.python.org/downloads/ Python] | ||
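The encoding issue described above can be reproduced without any PDF. The following is a hedged sketch in plain Python: the helper <code>encode_lossy</code> and the sample string are illustrative assumptions, and the script does not call pdfly itself. It tries a strict cp950 encode of a string containing '\u25aa', then shows two ways around a failure: encoding as UTF-8, or replacing unencodable characters. Setting the environment variable <code>PYTHONUTF8=1</code>, as discussed in the cited Stack Overflow question, switches Python to UTF-8 mode so that the first of these becomes the default.

```python
def encode_lossy(text, encoding="cp950"):
    """Encode text, substituting '?' for characters the codepage lacks."""
    return text.encode(encoding, errors="replace")


sample = "\u25aa item"  # BLACK SMALL SQUARE followed by ASCII text

try:
    sample.encode("cp950")  # strict encoding, like output to a cp950 console
    print("cp950 can encode the sample")
except UnicodeEncodeError as exc:
    print("cp950 cannot encode the sample:", exc.reason)

print(sample.encode("utf-8"))  # UTF-8 encodes the character without error
print(encode_lossy(sample))    # lossy fallback keeps the program running
```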
[https://github.com/ArtifexSoftware/mupdf mutool] (part of [https://mupdf.com/core MuPDF])
* License: [https://github.com/ArtifexSoftware/mupdf?tab=AGPL-3.0-1-ov-file GNU Affero General Public License (AGPL-3.0)]
* Usage: {{kbd | key=<nowiki>mutool draw -F txt -o text-file PDF-file</nowiki>}}
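For converting several files, the <code>mutool draw</code> call can be wrapped in a small script. This is a sketch only, assuming <code>mutool</code> is on the PATH; the helper name <code>mutool_txt_cmd</code> and the file name are hypothetical. Note that the output file follows <code>-o</code> and the input PDF comes last.

```python
from pathlib import Path


def mutool_txt_cmd(pdf_path):
    """Build: mutool draw -F txt -o <same name>.txt <pdf>."""
    txt_path = str(Path(pdf_path).with_suffix(".txt"))
    return ["mutool", "draw", "-F", "txt", "-o", txt_path, pdf_path]


# Print the command for one file; pass the list to subprocess.run(...) to
# execute it on a machine where mutool is installed.
print(" ".join(mutool_txt_cmd("report.pdf")))
```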
==== Convert PDF to Word or RTF ==== | ==== Convert PDF to Word or RTF ==== | ||
| Line 108: | Line 110: | ||
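* {{Mac}} [https://support.apple.com/zh-tw/HT201740 Preview]: converts a PDF to an image file. Drawback: only one page can be converted. If several pages are selected, only the first page of the selection is converted. How to: menu File --> Export. | * {{Mac}} [https://support.apple.com/zh-tw/HT201740 Preview]: converts a PDF to an image file. Drawback: only one page can be converted. If several pages are selected, only the first page of the selection is converted. How to: menu File --> Export. | ||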
==== See also ==== | |||
* [[Mutool Glyph Index Error]] | |||
[[Category: PDF Processing]] | |||
[[Category: Document Conversion]] | |||
==== References ==== | |||
<references /> | |||