Install pdftotext on Windows
pdftotext Installation Troubleshooting Guide
Installation
For Windows users:
- Download the "Xpdf command line tools" from https://www.xpdfreader.com/download.html
- Extract the "Xpdf command line tools" to X:\xpdf
- Add X:\xpdf\bin64 to your Windows PATH
Verify the installation
- Open a new CMD window
- Enter the command "where pdftotext"
- Expected result: "X:\xpdf\bin64\pdftotext.exe"
- Enter the command "pdftotext -v"
- Expected result: "pdftotext version 4.05 [www.xpdfreader.com] ..."
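The same check can be scripted. The following is a minimal Python sketch, not part of the original guide: the helper names `locate_pdftotext`, `first_nonempty_line`, and `version_banner` are hypothetical, and reading both output streams is an assumption, since the stream that carries the banner may differ between builds.

```python
import shutil
import subprocess


def locate_pdftotext():
    """Return the full path of the first pdftotext found on PATH, or None."""
    return shutil.which("pdftotext")


def first_nonempty_line(text):
    """Return the first non-blank line of text, stripped, or an empty string."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def version_banner(executable):
    """Run `<executable> -v` and return the first line it prints, if any."""
    try:
        result = subprocess.run(
            [executable, "-v"],
            capture_output=True,
            text=True,
            errors="replace",
            check=False,
        )
    except OSError:
        # The file exists but cannot be started (for example, a broken launcher).
        return ""
    # Assumption: the banner may be on stdout or stderr, so both are read.
    return first_nonempty_line(result.stdout + result.stderr)


path = locate_pdftotext()
if path is None:
    print("pdftotext is not on PATH")
else:
    print(path)
    print(version_banner(path))
```

If the printed path is not under `X:\xpdf\bin64`, another copy is being found first; the troubleshooting steps below cover that case.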
Problem Description
When installing the Python package with `pip install pdftotext`, dependency installation fails, resulting in a broken `pdftotext` script being installed.
Symptoms
Running `pdftotext` shows the error:
'C:\Program' is not recognized as an internal or external command, operable program or batch file.
Tesseract Not Found! Please Install it ...
Imagemagick Not Found! Please install it...
Root Cause Analysis
- `pip install pdftotext` installs a Python wrapper package, not the actual Xpdf tool.
- This package depends on ImageMagick, Ghostscript, and Tesseract, but the dependency installation failed.
- The generated `C:\Python310\Scripts\pdftotext.exe` is a broken launcher that cannot run, yet it is found on PATH whenever `pdftotext` is typed.
- Recommended fix: follow the installation steps above and use the "Xpdf command line tools" binary at `X:\xpdf\bin64\pdftotext.exe`.
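To tell the two executables apart in a script, one rough heuristic is to check whether the resolved path sits in a folder named `Scripts`, which is where pip places its launchers. The sketch below is an illustration only; the function name `is_python_scripts_shim` is hypothetical, and the check assumes the Xpdf tools were not themselves extracted into a folder called `Scripts`.

```python
from pathlib import PureWindowsPath


def is_python_scripts_shim(resolved_path):
    """Heuristic: True if the executable lives in a folder named 'Scripts'."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    parent = PureWindowsPath(resolved_path).parent
    return parent.name.lower() == "scripts"


print(is_python_scripts_shim(r"C:\Python310\Scripts\pdftotext.exe"))  # True
print(is_python_scripts_shim(r"X:\xpdf\bin64\pdftotext.exe"))         # False
```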
Resolution Steps
Step 1: Check the current pdftotext location
where pdftotext
Output:
C:\Python310\Scripts\pdftotext.exe
Step 2: Try uninstalling the Python package
pip uninstall pdftotext
Output:
WARNING: Skipping pdftotext as it is not installed.
This indicates that pip has no record of the package, so it cannot remove the leftover executable, which is still in the Scripts folder.
Step 3: Manually delete the incorrect executable
del C:\Python310\Scripts\pdftotext.exe
Step 4: Confirm it was removed
where pdftotext
Output:
INFO: Could not find files for the given pattern(s).
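When more than one copy exists, the one in the earliest PATH folder is the one that runs, so it can help to list every match in search order, much as `where` does. The following Python sketch is an approximation under stated assumptions: the name `find_all` and its `is_file` parameter are hypothetical, and the extension list is a simplification of what Windows reads from `PATHEXT`.

```python
import os


def find_all(name, path_value, extensions=(".exe", ".cmd", ".bat"),
             is_file=os.path.isfile):
    """List every match for `name` across the PATH folders, in search order."""
    matches = []
    for folder in path_value.split(os.pathsep):
        if not folder:
            continue  # skip empty entries such as a trailing separator
        for ext in extensions:
            candidate = os.path.join(folder, name + ext)
            if is_file(candidate):
                matches.append(candidate)
    return matches


# The first line printed is the copy that runs when `pdftotext` is typed.
for hit in find_all("pdftotext", os.environ.get("PATH", "")):
    print(hit)
```

An empty result corresponds to the "Could not find files" message from `where`.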
Step 5: Add the real Xpdf tool to PATH
Method A: GUI (Recommended)
- Press **Win + R**, type `sysdm.cpl`, and press Enter
- Click the **Advanced** tab
- Click **Environment Variables**
- Under “System variables,” find **Path**
- Click **Edit**
- Click **New**
- Enter: `X:\xpdf\bin64`
- Click **OK** to close all dialogs
Method B: PowerShell (Run as Administrator)
[Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path", "Machine") + ";X:\xpdf\bin64", "Machine")
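Running either method twice can leave a duplicate entry in PATH. The Python sketch below shows one way to compute the new value so that the folder is added only if it is missing. It only builds the string and does not write to the registry; the function name `append_to_path` is hypothetical.

```python
def append_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(sep) if entry]
    # Windows paths are case-insensitive; a trailing backslash is ignored too.
    normalized = {entry.rstrip("\\").lower() for entry in entries}
    if new_dir.rstrip("\\").lower() not in normalized:
        entries.append(new_dir)
    return sep.join(entries)


# Folder missing: it is appended at the end.
print(append_to_path(r"C:\Windows\system32;C:\Windows", r"X:\xpdf\bin64"))
# Folder already present (different case, trailing backslash): nothing is added.
print(append_to_path(r"C:\Windows;X:\XPDF\bin64\;", r"X:\xpdf\bin64"))
```

The first call prints `C:\Windows\system32;C:\Windows;X:\xpdf\bin64`. The second returns the existing entries unchanged apart from dropping the empty trailing entry.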
Step 6: Reload the PATH environment variable
Option 1: Use refreshenv (fastest; this command ships with Chocolatey, so it is available only if Chocolatey is installed)
refreshenv
Output:
Refreshing environment variables from registry for cmd.exe. Please wait...Finished..
Option 2: Reopen Command Prompt
Close the current CMD window and open a new one.
Step 7: Verify the configuration
where pdftotext
Expected output:
X:\xpdf\bin64\pdftotext.exe
Step 8: Test the tool
pdftotext
Expected output:
pdftotext version 4.05 [www.xpdfreader.com]
Copyright 1996-2024 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
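Once the tool resolves correctly, it can be called from Python through `subprocess` without any wrapper package. This is a minimal sketch following the usage line above; the helper names `build_command` and `extract_text` and the file names `report.pdf` and `report.txt` are placeholders, not part of the original guide.

```python
import subprocess


def build_command(pdf_path, txt_path, executable="pdftotext"):
    """Build the argument list: pdftotext <PDF-file> <text-file>."""
    return [executable, str(pdf_path), str(txt_path)]


def extract_text(pdf_path, txt_path, executable="pdftotext"):
    """Run pdftotext and return (succeeded, error_message)."""
    try:
        result = subprocess.run(
            build_command(pdf_path, txt_path, executable),
            capture_output=True,
            text=True,
            errors="replace",
            check=False,
        )
    except FileNotFoundError:
        # Raised when the executable cannot be found on PATH.
        return False, f"{executable} not found on PATH"
    # Exit code 0 means the conversion finished without error.
    return result.returncode == 0, result.stderr.strip()


print(build_command("report.pdf", "report.txt"))
```

Calling `extract_text("report.pdf", "report.txt")` would then write the extracted text next to the PDF, assuming both the file and the tool are present.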