PDF conversion: Difference between revisions

From LemonWiki共筆
(9 intermediate revisions by the same user not shown)


==== PDF to plain text (TXT) ====
* {{Gd}} utility: [https://www.xpdfreader.com/pdftotext-man.html pdftotext] on {{Win}}, {{Mac}}<ref>[https://brewinstall.org/install-pdftotext-mac-osx/ Install pdftotext on Mac OSX - Brew Cask | BrewInstall]</ref> & {{Linux}} ([https://en.wikipedia.org/wiki/Pdftotext pdftotext - Wikipedia | Wikipedia]) {{access | date=2023-06-02}}
** Usage: {{kbd | key=<nowiki>pdftotext [options] [PDF-file [text-file]]</nowiki>}}
* {{Gd}} [http://pdftextonline.com/ PDF Text Extraction In Your Browser - PDFTextOnline]: online service that extracts the text from a PDF file (maximum file size: 10 MB). ([http://www.box.net/shared/pyve05ss4c Chinese text test OK], tested on 2008-12-07.)


* ([[OCR]]) [https://pdfcandy.com/tw/pdf-ocr.html PDF 轉文字–免費線上OCR轉換工具] (PDF to text: free online OCR converter). Free-tier limits: one file per hour, and the file size is also limited. {{access | date=2022-10-07}}


[https://www.xpdfreader.com/index.html XpdfReader] [https://www.xpdfreader.com/pdftotext-man.html pdftotext]
[https://www.xpdfreader.com/pdftotext-man.html pdftotext] ([[Convert pdf to txt|Quick Start]])
* License: "The xpdf package is open source, dual licensed under GPL v2 and GPL v3."<ref>[https://www.xpdfreader.com/opensource.html Xpdf Open Source]</ref> {{Gd}}
* On {{Win}}: Part of [https://www.xpdfreader.com/index.html XpdfReader], dual licensed under GPL v2 and GPL v3<ref>[https://www.xpdfreader.com/opensource.html Xpdf Open Source]</ref>
* Usage: {{kbd | key=<nowiki>pdftotext -enc UTF-8 PDF-file text-file</nowiki>}}
* On {{Mac}} & {{Linux}}: Part of [https://poppler.freedesktop.org/ Poppler]<ref>[https://brewinstall.org/install-pdftotext-mac-osx/ Install pdftotext on Mac OSX - Brew Cask | BrewInstall]</ref> (GPL v2 or later), historically derived from Xpdf ([https://en.wikipedia.org/wiki/Pdftotext Wikipedia]) {{access | date=2023-06-02}} {{Gd}}
* Usage: {{kbd | key=<nowiki>pdftotext [options] [PDF-file [text-file]]</nowiki>}} e.g. {{kbd | key=<nowiki>pdftotext -enc UTF-8 example.pdf example.txt</nowiki>}}
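For batch conversion, the usage line above can also be called from a script. A minimal sketch, assuming `pdftotext` (from Xpdf or Poppler) is on the `PATH`; the helper names `build_pdftotext_cmd` and `convert_folder` are hypothetical, and only the `-enc UTF-8` option shown above is used.

```python
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path, txt_path, encoding="UTF-8"):
    """Return the argument list for: pdftotext -enc <encoding> <pdf> <txt>."""
    return ["pdftotext", "-enc", encoding, str(pdf_path), str(txt_path)]


def convert_folder(folder):
    """Convert every *.pdf in `folder` to a .txt file with the same name stem."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        # check=True raises CalledProcessError if pdftotext exits with an error
        subprocess.run(build_pdftotext_cmd(pdf, txt), check=True)


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(build_pdftotext_cmd("example.pdf", "example.txt"))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.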


[https://github.com/py-pdf/pdfly py-pdf/pdfly: CLI tool to extract (meta)data from PDF and manipulate PDF files] "A {{Acronym| acronym=CLI| def=Command-line interface}} application that uses [https://github.com/py-pdf/pypdf pypdf] to interact with PDFs."
* License: [https://github.com/py-pdf/pdfly/blob/main/LICENSE BSD 3-Clause License] {{Gd}}
* Usage: {{kbd | key=<nowiki>pdfly extract-text PDF-file > text-file</nowiki>}}
* Known issues: On Windows systems with Traditional Chinese locales (cp950), PDF text extraction may fail with a UnicodeEncodeError when the extracted text contains characters that cp950 cannot encode, such as '\u25aa' (BLACK SMALL SQUARE). This is a limitation of the default Windows code page; one workaround is to enable Python's UTF-8 mode by setting the environment variable PYTHONUTF8=1<ref>[https://stackoverflow.com/questions/50933194/how-do-i-set-the-pythonutf8-environment-variable-to-enable-utf-8-encoding-by-def How do I set the PYTHONUTF8 environment variable to enable UTF-8 encoding by default in Python? - Stack Overflow]</ref>.
* Requirement: [https://www.python.org/downloads/ Python]
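The cp950 problem in the Known issues entry above can be sidestepped by running pdfly with Python's UTF-8 mode switched on through `PYTHONUTF8=1`, so that its redirected output is written as UTF-8. A minimal sketch, assuming `pdfly` is installed and on the `PATH`; `utf8_env` and `pdfly_extract_text` are hypothetical helper names. From cmd.exe the same effect comes from running `set PYTHONUTF8=1` before the usage command.

```python
import os
import subprocess


def utf8_env(base=None):
    """Copy of the environment with Python's UTF-8 mode enabled (PYTHONUTF8=1)."""
    env = dict(os.environ if base is None else base)
    env["PYTHONUTF8"] = "1"
    return env


def pdfly_extract_text(pdf_path, txt_path):
    """Run `pdfly extract-text <pdf>` and save its stdout to `txt_path`."""
    result = subprocess.run(
        ["pdfly", "extract-text", str(pdf_path)],
        env=utf8_env(),
        capture_output=True,
        check=True,
    )
    # stdout arrives as raw bytes, already UTF-8 because of PYTHONUTF8=1,
    # so write them unchanged instead of re-encoding with the locale codec.
    with open(txt_path, "wb") as f:
        f.write(result.stdout)


if __name__ == "__main__":
    print(utf8_env({})["PYTHONUTF8"])
```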
[https://github.com/ArtifexSoftware/mupdf mupdf] (part of [https://mupdf.com/core MuPDF])
* License: [https://github.com/ArtifexSoftware/mupdf?tab=AGPL-3.0-1-ov-file GNU Affero General Public License v3.0]
* Usage: {{kbd | key=<nowiki>mutool draw -F txt -o text-file PDF-file</nowiki>}}
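The `mutool draw` command can likewise be scripted. A minimal sketch, assuming `mutool` is on the `PATH`; the helper names are hypothetical. Note that `-o` takes the output file name and the input PDF is the last argument.

```python
import subprocess


def build_mutool_txt_cmd(pdf_path, txt_path):
    """Argument list for: mutool draw -F txt -o <txt> <pdf>.

    -F selects the output format and -o names the output file;
    the input PDF comes last.
    """
    return ["mutool", "draw", "-F", "txt", "-o", str(txt_path), str(pdf_path)]


def mutool_to_text(pdf_path, txt_path):
    """Extract the text of `pdf_path` into `txt_path` with mutool."""
    subprocess.run(build_mutool_txt_cmd(pdf_path, txt_path), check=True)


if __name__ == "__main__":
    # Only prints the command line; nothing is executed here.
    print(" ".join(build_mutool_txt_cmd("example.pdf", "example.txt")))
```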


==== PDF to Word or RTF ====
* [https://smallpdf.com/zh-TW/pdf-to-excel PDF轉Excel轉換器 - 免費服務] (PDF to Excel converter, free service): the free tier can be used only twice per hour. {{exclaim}} Chinese text is not displayed correctly. {{access | date = 2018-02-01}}


{{Tips}} When a table with the same columns spans several PDF pages, the converted Excel file may spread the table across different worksheets. A tool for merging worksheets: ''$'' [https://www.ablebits.com/excel-lookup-tables/index.php Merge Excel worksheets by matching data in seconds]
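If the split worksheets are handled in a script rather than with the tool above, the merge step amounts to stacking rows under one header. A minimal pure-Python sketch, assuming each worksheet has already been read into a list of rows (reading the .xlsx file, e.g. with a library such as openpyxl, is not shown) and that a worksheet which repeats the header row should have it dropped; `merge_sheets` is a hypothetical helper.

```python
def merge_sheets(sheets):
    """Stack worksheets that share the same columns into one table.

    `sheets` is a list of worksheets, each a list of rows (lists of cell
    values). The first row of the first non-empty worksheet is taken as
    the header; any later worksheet that repeats it has that row dropped.
    """
    non_empty = [rows for rows in sheets if rows]
    if not non_empty:
        return []
    header = list(non_empty[0][0])
    merged = [header]
    for rows in non_empty:
        # Skip a repeated header row; otherwise keep every row as data.
        body = rows[1:] if list(rows[0]) == header else rows
        merged.extend(list(row) for row in body)
    return merged


if __name__ == "__main__":
    page1 = [["Name", "Qty"], ["apple", 3], ["pear", 5]]
    page2 = [["Name", "Qty"], ["plum", 2]]
    for row in merge_sheets([page1, page2]):
        print(row)
```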


==== PDF to presentation (PPTX) ====


* {{Mac}} [https://support.apple.com/zh-tw/HT201740 Preview]: converts a PDF to an image file, but only one page at a time. If several pages are selected, only the first page of the selection is converted. Steps: menu File --> Export.
==== See also ====
* [[Mutool Glyph Index Error]]
[[Category: PDF Processing]]
[[Category: Document Conversion]]
==== References ====
<references />

Latest revision as of 14:09, 3 February 2026

PDF conversion

PDF to plain text (TXT)

pdftotext (Quick Start)

  • On Windows: Part of XpdfReader, dual licensed under GPL v2 and GPL v3[2]
  • On macOS & Linux: Part of Poppler[3] (GPL v2 or later), historically derived from Xpdf (Wikipedia) [Last visited: 2023-06-02] (recommended)
  • Usage: pdftotext [options] [PDF-file [text-file]] e.g. pdftotext -enc UTF-8 example.pdf example.txt

py-pdf/pdfly: CLI tool to extract (meta)data from PDF and manipulate PDF files "A CLI application that uses pypdf to interact with PDFs."

  • License: BSD 3-Clause License (recommended)
  • Usage: pdfly extract-text PDF-file > text-file
  • Known issues: On Windows systems with Traditional Chinese locales (cp950), PDF text extraction may fail with a UnicodeEncodeError when the extracted text contains characters that cp950 cannot encode, such as '\u25aa' (BLACK SMALL SQUARE). This is a limitation of the default Windows code page; one workaround is to enable Python's UTF-8 mode by setting the environment variable PYTHONUTF8=1[4].
  • Requirement: Python

mupdf (part of MuPDF)

PDF to Word or RTF

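  • (recommended) Microsoft OneDrive PDF to Word. Steps: open the PDF file for editing in Word Online, which converts it to a Word document, then download a copy to your computer. (Chinese text test OK, tested on 2013-09-27.)
  • (recommended) Google Drive PDF to Word. Steps: (1) upload the PDF file to Google Drive; (2) instead of double-clicking to open the PDF file, right-click it, choose "Open with", and then select "Google Docs". (Tested on 2023-01-31.)
  • $ Adobe PDF Services: converts PDF to Word or Excel and performs OCR, but the service is not available in Taiwan. [Last visited: 2018-02-06]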

Related pages: OCR; Comparison of online services for converting Japanese PDFs to Word or Excel

PDF to web page (HTM)

PDF to Excel

Tip: When a table with the same columns spans several PDF pages, the converted Excel file may spread the table across different worksheets. A tool for merging worksheets: $ Merge Excel worksheets by matching data in seconds

PDF to presentation (PPTX)

PDF to image

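  • Adobe Reader X: (1) in the Edit menu, check "Take a Snapshot"; (2) go back to the PDF body and select the image to copy; (3) paste it into an image editor.
  • macOS Preview: converts a PDF to an image file, but only one page at a time. If several pages are selected, only the first page of the selection is converted. Steps: menu File --> Export.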

See also

References