How to Batch Convert PDFs to Non-OCR (Image-Only) PDFs

I had a niche case where we were providing documents to an entity and needed to make sure they were non-OCR, meaning each page is just an image with no selectable or searchable text layer. To do this, we use Ghostscript and PowerShell.

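Key takeaway: make sure the path to Ghostscript in the script matches your install. Mine is C:\Program Files\gs\gs10.05.1\bin\gswin64c.exe. The version folder (gs10.05.1) will be different if you install a different release, so update $gsPath to match.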
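If you'd rather not hard-code the version folder, here is a minimal sketch of a helper that looks for the newest gsX.Y.Z folder under Program Files and returns its gswin64c.exe. The function name Find-GhostscriptExe is hypothetical (it is not part of Ghostscript or the script below), and it assumes the installer's default layout, C:\Program Files\gs\gs<version>\bin\gswin64c.exe, which matches the path above.

```powershell
# Hypothetical helper (not part of Ghostscript): find the newest installed gswin64c.exe.
# Assumes the default layout <Root>\gs<version>\bin\gswin64c.exe, e.g. gs10.05.1.
function Find-GhostscriptExe {
    param(
        # Folder that holds the versioned gsX.Y.Z subfolders
        [string]$Root = (Join-Path $env:ProgramFiles 'gs')
    )

    if (-not (Test-Path -LiteralPath $Root)) { return $null }

    $candidate = Get-ChildItem -LiteralPath $Root -Directory |
        # Keep only folders named like gs10.05.1 (2 to 4 numeric parts, so [version] can parse them)
        Where-Object { $_.Name -match '^gs(\d+(\.\d+){1,3})$' } |
        # Sort numerically, so gs10.x ranks above gs9.x (a plain name sort would not)
        Sort-Object { [version]($_.Name -replace '^gs', '') } -Descending |
        ForEach-Object { Join-Path (Join-Path $_.FullName 'bin') 'gswin64c.exe' } |
        Where-Object { Test-Path -LiteralPath $_ } |
        Select-Object -First 1

    return $candidate
}
```

You could then set $gsPath = Find-GhostscriptExe near the top of the script and exit with an error if it returns $null.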

Ghostscript download: https://ghostscript.com/releases/gsdnld.html 

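The script prompts you for both the input folder and the destination folder. Save it in Notepad as a .ps1 file wherever you like, then run it and enjoy!

If the ✅/❌ symbols show up as garbled characters in Windows PowerShell 5.1, re-save the file in Notepad with the "UTF-8 with BOM" encoding. If Windows refuses to run the script because script execution is disabled, launch it with powershell.exe -ExecutionPolicy Bypass -File followed by the path to your .ps1 file.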

Code:


Add-Type -AssemblyName System.Windows.Forms

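# Prompt for source folder
$sourceDialog = New-Object System.Windows.Forms.FolderBrowserDialog
$sourceDialog.Description = "Select the folder with original PDFs"
$sourceResult = $sourceDialog.ShowDialog()
$sourceFolder = $sourceDialog.SelectedPath

# Prompt for destination folder
$destDialog = New-Object System.Windows.Forms.FolderBrowserDialog
$destDialog.Description = "Select the folder to save image-only PDFs"
$destResult = $destDialog.ShowDialog()
$destinationFolder = $destDialog.SelectedPath

# Exit if either dialog was cancelled or returned an unusable path.
# (Test-Path throws on an empty string, so check for that first; -or short-circuits.)
if ($sourceResult -ne [System.Windows.Forms.DialogResult]::OK -or
    $destResult -ne [System.Windows.Forms.DialogResult]::OK -or
    [string]::IsNullOrWhiteSpace($sourceFolder) -or
    [string]::IsNullOrWhiteSpace($destinationFolder) -or
    -not (Test-Path -LiteralPath $sourceFolder) -or
    -not (Test-Path -LiteralPath $destinationFolder)) {
    Write-Host "Invalid folder paths. Exiting." -ForegroundColor Red
    exit
}

# Output files keep their original names, so using the source folder as the
# destination would make Ghostscript write over the PDF it is reading.
if ((Resolve-Path -LiteralPath $sourceFolder).Path -eq (Resolve-Path -LiteralPath $destinationFolder).Path) {
    Write-Host "Source and destination must be different folders. Exiting." -ForegroundColor Red
    exit
}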

# Path to Ghostscript (adjust to match your installed version)
$gsPath = "C:\Program Files\gs\gs10.05.1\bin\gswin64c.exe"
if (-not (Test-Path $gsPath)) {
    Write-Host "Ghostscript not found at: $gsPath" -ForegroundColor Red
    exit
}

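# Get all PDFs in the source folder (top level only)
$pdfs = @(Get-ChildItem -LiteralPath $sourceFolder -Filter *.pdf -File)
$total = $pdfs.Count
if ($total -eq 0) {
    Write-Host "No PDFs found in: $sourceFolder" -ForegroundColor Yellow
    exit
}
$index = 0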

foreach ($pdf in $pdfs) {
    $index++
    $inputFile = $pdf.FullName
    $outputFile = Join-Path $destinationFolder $pdf.Name

    # Start-Process hands -ArgumentList to Ghostscript as one space-separated
    # command line, so paths containing spaces need literal double quotes.
    $quotedInput = "`"$inputFile`""
    $quotedOutput = "`"$outputFile`""

    $progressPercent = [math]::Round(($index / $total) * 100, 0)
    $statusMessage = "$index of $total - $($pdf.Name)"
    Write-Progress -Activity "Flattening (image-only)" -Status $statusMessage -PercentComplete $progressPercent

    # pdfimage24 renders every page to a 24-bit RGB image (here at 300 dpi)
    # and wraps those images in a new PDF, so no text layer survives.
    $arguments = @(
        "-dNOPAUSE"
        "-dBATCH"
        "-dSAFER"
        "-sDEVICE=pdfimage24"
        "-dCompatibilityLevel=1.4"
        "-r300"
        "-sOutputFile=$quotedOutput"
        $quotedInput
    )

    $process = Start-Process -FilePath $gsPath -ArgumentList $arguments -NoNewWindow -Wait -PassThru

    if ($process.ExitCode -eq 0) {
        Write-Host "✅ Image-only PDF created: $($pdf.Name)"
    } else {
        Write-Host "❌ Failed to flatten: $($pdf.Name)" -ForegroundColor Red
    }
}

Write-Progress -Activity "Flattening (image-only)" -Completed -Status "All files processed."
Write-Host "`n✅ All PDFs rasterized and saved to: $destinationFolder" -ForegroundColor Green
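
To spot-check that a converted file really has no text layer, one option is to ask Ghostscript's txtwrite device to extract text and see whether anything comes back. This is a minimal sketch, not a guarantee: the function name Test-PdfHasText is hypothetical, and it assumes that an image-only PDF produces only empty or whitespace output from txtwrite.

```powershell
# Hypothetical helper: returns $true if Ghostscript's txtwrite device extracts
# any non-whitespace text from the PDF, $false otherwise.
function Test-PdfHasText {
    param(
        [Parameter(Mandatory)] [string]$GsPath,
        [Parameter(Mandatory)] [string]$PdfPath
    )

    # -q suppresses the startup banner; -sOutputFile=- sends extracted text to stdout.
    $gsArgs = @('-q', '-dNOPAUSE', '-dBATCH', '-dSAFER', '-sDEVICE=txtwrite', '-sOutputFile=-', $PdfPath)

    # The call operator (&) passes each array element as a separate argument,
    # so a path with spaces does not need manual quoting here.
    $text = (& $GsPath @gsArgs 2>$null) | Out-String

    return -not [string]::IsNullOrWhiteSpace($text)
}
```

For example, after the conversion finishes you could loop over the PDFs in $destinationFolder and call Test-PdfHasText -GsPath $gsPath -PdfPath on each one, flagging any file that returns $true.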


