Install pdftotext on Windows

From LemonWiki共筆

pdftotext Installation Troubleshooting Guide

Installation

For Windows users:

  1. Download the "Xpdf command line tools" from https://www.xpdfreader.com/download.html
  2. Extract the "Xpdf command line tools" archive to X:\xpdf (X: stands for whichever drive you choose)
  3. Add X:\xpdf\bin64 to your Windows PATH

Verify the installation

  1. Open a new CMD window
  2. Enter the command "where pdftotext"
  3. Expected result: "X:\xpdf\bin64\pdftotext.exe"
  4. Enter the command "pdftotext -v"
  5. Expected result: "pdftotext version 4.05 [www.xpdfreader.com] ..."
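
The same check can be scripted. The following is a minimal Python sketch, not part of the original guide: the helper names `parse_version` and `check_pdftotext` are hypothetical, and it assumes that `pdftotext -v` prints a banner containing `pdftotext version <number>`, as in the expected result above.

```python
import re
import shutil
import subprocess


def parse_version(banner):
    """Return the version from a 'pdftotext version X.YY' banner, or None."""
    match = re.search(r"pdftotext version (\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def check_pdftotext():
    """Locate pdftotext on PATH and return (path, version).

    Returns (None, None) when nothing named pdftotext is on PATH, and
    (path, None) when the executable cannot be run or prints no banner.
    """
    exe = shutil.which("pdftotext")
    if exe is None:
        return None, None
    try:
        # The banner may go to stdout or stderr, so capture both streams.
        result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    except OSError:
        return exe, None
    return exe, parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    path, version = check_pdftotext()
    print("pdftotext path:", path)
    print("pdftotext version:", version)
```

If the reported path is not under `X:\xpdf\bin64`, or the version is `None`, a different `pdftotext` is probably shadowing the Xpdf tool on PATH.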


Problem Description

When installing the Python package with `pip install pdftotext`, dependency installation fails, resulting in a broken `pdftotext` script being installed.

Symptoms

Running `pdftotext` shows the error:

'C:\Program' is not recognized as an internal or external command, operable program or batch file.
Tesseract Not Found! Please Install it ...
Imagemagick Not Found!
Please install it...

Root Cause Analysis

  1. `pip install pdftotext` installs a Python wrapper package, not the actual Xpdf tool.
  2. This package depends on ImageMagick, Ghostscript, and Tesseract, but the dependency installation failed.
  3. The generated `C:\Python310\Scripts\pdftotext.exe` is a broken launcher, and because it is on PATH it runs whenever `pdftotext` is typed.
  4. The recommended fix is to remove this launcher and use the "Xpdf command line tools" at `X:\xpdf\bin64\pdftotext.exe`, installed as described in the installation steps above.
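
To tell the two executables apart programmatically, one option is to look at the directory the resolved `pdftotext` lives in. The sketch below is a heuristic under stated assumptions, not a definitive test: it relies on the fact that pip places launchers in a `Scripts` directory on Windows, and the function name `looks_like_pip_launcher` is hypothetical.

```python
import shutil
from pathlib import PureWindowsPath


def looks_like_pip_launcher(exe_path):
    """Heuristic: True if the executable sits directly in a 'Scripts' directory."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    parent = PureWindowsPath(exe_path).parent
    return parent.name.lower() == "scripts"


if __name__ == "__main__":
    found = shutil.which("pdftotext")
    if found is None:
        print("pdftotext is not on PATH")
    elif looks_like_pip_launcher(found):
        print("Likely the pip-installed launcher:", found)
    else:
        print("Likely a standalone tool:", found)
```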

Resolution Steps

Step 1: Check the current pdftotext location

where pdftotext

Output:

C:\Python310\Scripts\pdftotext.exe

Step 2: Try uninstalling the Python package

pip uninstall pdftotext

Output:

WARNING: Skipping pdftotext as it is not installed.

This indicates that pip has no record of the package (it was never installed correctly), yet the executable it left behind is still in the Scripts folder.

Step 3: Manually delete the incorrect executable

del C:\Python310\Scripts\pdftotext.exe

Step 4: Confirm it was removed

where pdftotext

Output:

INFO: Could not find files for the given pattern(s).

Step 5: Add the real Xpdf tool to PATH

Method A: GUI (Recommended)

  1. Press **Win + R**, type `sysdm.cpl`, and press Enter
  2. Click the **Advanced** tab
  3. Click **Environment Variables**
  4. Under “System variables,” find **Path**
  5. Click **Edit**
  6. Click **New**
  7. Enter: `X:\xpdf\bin64`
  8. Click **OK** to close all dialogs

Method B: PowerShell (Run as Administrator)

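# Read the machine-level Path (not $env:Path, which also contains the user entries) and append to it
[Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path", "Machine") + ";X:\xpdf\bin64", "Machine")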
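
Whichever method is used, the entry should end up in Path only once. The sketch below illustrates the append logic as pure string handling; it does not touch the registry or the live environment, and the helper name `append_to_path` is hypothetical. It assumes entries are separated by `;` and compares them case-insensitively, ignoring a trailing backslash.

```python
import ntpath


def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended, unless it is already listed."""

    def key(entry):
        # Windows paths are case-insensitive; also ignore a trailing backslash.
        return ntpath.normcase(entry.strip().rstrip("\\"))

    # Drop empty entries left behind by stray semicolons.
    entries = [e for e in path_value.split(";") if e.strip()]
    if key(new_dir) in {key(e) for e in entries}:
        return ";".join(entries)
    return ";".join(entries + [new_dir])


if __name__ == "__main__":
    current = r"C:\Windows;C:\Windows\System32"
    print(append_to_path(current, r"X:\xpdf\bin64"))
```

Calling the function a second time with the same directory returns the value unchanged, which avoids the duplicate entries that repeated appends would otherwise create.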

Step 6: Reload the PATH environment variable

Option 1: Use refreshenv (fastest; available only if Chocolatey is installed)

refreshenv

Output:

Refreshing environment variables from registry for cmd.exe. Please wait...Finished..

Option 2: Reopen Command Prompt

Close the current CMD window and open a new one.

Step 7: Verify the configuration

where pdftotext

Expected output:

X:\xpdf\bin64\pdftotext.exe

Step 8: Test the tool

pdftotext

Expected output:

pdftotext version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
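
Once the Xpdf tool is the one found on PATH, it can be called from Python with `subprocess` instead of the pip package. This is a minimal sketch following the usage line above; the helper names `build_command` and `pdf_to_text` and the file name `sample.pdf` are hypothetical, and `-layout` is the Xpdf option that tries to preserve the physical layout of the page.

```python
import subprocess


def build_command(pdf_file, text_file=None, layout=False):
    """Build the argument list for: pdftotext [options] <PDF-file> [<text-file>]."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the original physical layout where possible
    cmd.append(pdf_file)
    if text_file is not None:
        cmd.append(text_file)
    return cmd


def pdf_to_text(pdf_file, text_file=None, layout=False):
    """Run pdftotext; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(pdf_file, text_file, layout), check=True)


if __name__ == "__main__":
    # Print the command that would be run for a hypothetical input file.
    print(build_command("sample.pdf", "sample.txt", layout=True))
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces, such as those under `C:\Program Files`.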