This page lists the most important commands for the CLI tool tesseract.

tesseract

Tesseract is an open-source optical character recognition (OCR) command-line tool. It was originally developed at Hewlett-Packard, and its development was later sponsored by Google. It reads text from images and scanned documents and supports over 100 languages. Through the Leptonica library it accepts common image formats such as TIFF, JPEG, PNG, and GIF. Since version 4, the default recognition engine is based on LSTM neural networks.

Tesseract can write plain text, hOCR (HTML), and searchable PDF. The hOCR output also records layout details such as the position of each recognized word. Accuracy usually improves when the input image is preprocessed, for example with noise reduction, binarization, and deskewing. Tesseract can also be trained with additional data to improve recognition for specific domains or languages. The project is actively maintained by its community.
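
The output formats mentioned above are selected by a config name given after the output base (`txt`, `hocr`, `pdf`). The following is only a sketch of that idea: the helper name `ocr_formats`, the `DRY_RUN` switch, and the file names are assumptions made for illustration and are not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: ocr_formats IMAGE OUTPUT_BASE FORMAT...
# Runs tesseract once per requested output format (e.g. txt, hocr, pdf).
# With DRY_RUN=1 it only prints the commands instead of running them.
# Note: POSIX sh has no `local`, so these variables are global.
ocr_formats() {
    image=$1
    output_base=$2
    shift 2
    for fmt in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "tesseract $image $output_base $fmt"
        else
            # The config name after the output base selects the format;
            # tesseract appends the matching file extension itself.
            tesseract "$image" "$output_base" "$fmt"
        fi
    done
}
```

With `DRY_RUN=1` set, `ocr_formats scan.png result txt pdf` prints the two commands it would run. Without it, the calls would write `result.txt` and `result.pdf`.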

List of commands for tesseract:

  • Specify a custom page segmentation mode (default is 3). Tesseract 4 and later use `--psm`; older 3.x releases used the single-dash form `-psm`.
    $ tesseract --psm ${0_to_13} ${image-png} ${output}
  • Specify a custom language (default is English) with an ISO 639-2 code (e.g. `deu` for German).
    $ tesseract -l deu ${image-png} ${output}
  • Recognize text in an image and save it to `output.txt` (the `.txt` extension is added automatically).
    $ tesseract ${image-png} ${output}
  • List the ISO 639-2 codes of available languages.
    $ tesseract --list-langs
  • List page segmentation modes and their descriptions.
    $ tesseract --help-psm
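
The language and page segmentation options listed above can be combined in a single call. Below is a minimal sketch, assuming Tesseract 4 or later (where the flag is spelled `--psm`). The function name `ocr_image`, the `DRY_RUN` switch, and the file names are hypothetical and serve only as illustration.

```shell
#!/bin/sh
# Hypothetical helper: ocr_image IMAGE OUTPUT_BASE [LANGUAGE] [PSM]
# Defaults mirror tesseract's own: English (eng) and segmentation mode 3.
# With DRY_RUN=1 it prints the command instead of running it.
# Note: POSIX sh has no `local`, so these variables are global.
ocr_image() {
    image=$1
    output_base=$2
    lang=${3:-eng}
    psm=${4:-3}

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "tesseract $image $output_base -l $lang --psm $psm"
        return 0
    fi

    # Fail early with a clear message if tesseract is not installed.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found in PATH" >&2
        return 127
    fi

    # Options follow the image and output base, as in the man page synopsis.
    tesseract "$image" "$output_base" -l "$lang" --psm "$psm"
}
```

For example, with `DRY_RUN=1` set, `ocr_image scan.png result deu 6` prints `tesseract scan.png result -l deu --psm 6`. Run for real, that command would write the recognized German text to `result.txt`, provided the `deu` language data is installed (see `tesseract --list-langs`).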