ocrmypdf
OCRmyPDF is a command-line tool that adds an OCR (Optical Character Recognition) text layer to scanned PDF files, making them searchable and their text selectable. It accepts PDFs as well as single raster images such as PNG and TIFF. Text recognition is done by the Tesseract OCR engine, while Ghostscript is used to rasterize pages and to produce PDF/A output. Options such as --deskew and --clean can improve recognition on poor scans by straightening and cleaning pages before OCR. OCRmyPDF handles multipage files and keeps the original page content and layout while adding the text layer. It supports multiple languages, provided the corresponding Tesseract language data is installed. The tool is free and open source.
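The multi-language support mentioned above is selected with the -l option, which takes Tesseract language codes joined by "+". The following is a minimal sketch, assuming the matching Tesseract language packs are installed; the helper name join_langs and the file names scan.pdf and scan_ocr.pdf are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# join_langs: hypothetical helper that joins language codes with "+",
# the separator ocrmypdf expects for its -l option.
join_langs() {
    old_ifs=$IFS
    IFS=+
    joined="$*"        # "$*" joins the arguments with the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

# Example (not run here): OCR a document containing English and German text.
# ocrmypdf -l "$(join_langs eng deu)" scan.pdf scan_ocr.pdf
```

Building the language list in one place keeps the codes easy to change when the same script is reused for documents in other languages.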
List of commands for ocrmypdf:
-
Clean, de-skew, and rotate pages of a poor scan:
$ ocrmypdf --clean --deskew --rotate-pages ${path-to-input_file} ${path-to-output-pdf}
-
Set the metadata of the searchable PDF file:
$ ocrmypdf --title "${title}" --author "${author}" --subject "${subject}" --keywords "${keyword; key phrase; ---}" ${path-to-input_file} ${path-to-output-pdf}
-
Skip pages of a mixed-format input PDF file that already contain text:
$ ocrmypdf --skip-text ${path-to-input-pdf} ${path-to-output-pdf}
-
Display help:
$ ocrmypdf --help
-
Create a new searchable PDF/A file from a scanned PDF or image file:
$ ocrmypdf ${path-to-input_file} ${path-to-output-pdf}
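The commands above each process a single file. To apply the same step to a whole folder, a small wrapper can loop over the inputs. This is a sketch under stated assumptions, not part of ocrmypdf itself: the function name ocr_dir, the DRY_RUN variable, and the directory layout are all hypothetical. It uses only the --skip-text option listed above.

```shell
#!/bin/sh
# ocr_dir: hypothetical helper that runs ocrmypdf on every *.pdf in a directory.
# Usage: ocr_dir INPUT_DIR OUTPUT_DIR
# Set DRY_RUN=1 to print the commands instead of running them.
ocr_dir() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for src in "$in_dir"/*.pdf; do
        # If nothing matches, the glob pattern stays literal; skip it.
        [ -e "$src" ] || continue
        dst="$out_dir/$(basename "$src")"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'ocrmypdf --skip-text %s %s\n' "$src" "$dst"
        else
            # --skip-text leaves pages that already contain text untouched.
            ocrmypdf --skip-text "$src" "$dst" || printf 'failed: %s\n' "$src" >&2
        fi
    done
}
```

Running it first with DRY_RUN=1 shows which files would be processed and where the results would be written, without invoking ocrmypdf. A failure on one file is reported on standard error and the loop continues with the next file.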