ocrmypdf:tldr:7f231

ocrmypdf: Skip pages of a mixed-format input PDF file that already contain text.
$ ocrmypdf --skip-text ${path-to-input-pdf} ${path-to-output-pdf}

This command uses the OCRmyPDF tool to add OCR (Optical Character Recognition) text to a PDF file while leaving pages that already contain text alone. Here's a breakdown of the command:

  • ocrmypdf: The command that runs the OCRmyPDF tool.
  • --skip-text: This option tells the tool to skip any page that already contains text. Those pages are passed through without OCR, and only pages without text (for example, scanned images) are OCRed. Without this option, OCRmyPDF normally stops with an error when it finds a page that already has text.
  • ${path-to-input-pdf}: A placeholder for the path to the input PDF file. Replace it with the actual path to your file.
  • ${path-to-output-pdf}: A placeholder for the path where the OCR-processed PDF file should be saved. Replace it with the desired path and name for the output file.

By executing this command, OCRmyPDF will take the given input PDF, leave pages that already contain text as they are, apply OCR to the remaining pages, and produce an output PDF file in which the previously image-only pages have a recognized text layer.
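
As a rough illustration, the command could be wrapped in a small shell function that checks its preconditions first. This is only a sketch: the function name ocr_skip_text, the exit codes, and the file names in the usage comment are made up for this example and are not part of OCRmyPDF.

```shell
# Hypothetical helper (not part of OCRmyPDF): run ocrmypdf --skip-text
# only if the tool is installed and the input file exists.
ocr_skip_text() {
    # $1 = path to input PDF, $2 = path to output PDF
    if ! command -v ocrmypdf >/dev/null 2>&1; then
        echo "ocrmypdf is not installed" >&2
        return 127
    fi
    if [ ! -f "$1" ]; then
        echo "input file not found: $1" >&2
        return 66
    fi
    # Pages that already contain text are skipped; the rest are OCRed.
    ocrmypdf --skip-text "$1" "$2"
}

# Example call (file names are placeholders):
#   ocr_skip_text scan.pdf scan-ocr.pdf
```

Checking for the input file up front just gives a clearer message than letting the tool fail; it does not change what --skip-text does.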

This explanation was created by an AI. In most cases these explanations are correct, but always be careful and never run a command unless you are sure it is safe.