This page lists the most important commands for the CLI tool tesseract.

tesseract

Tesseract is an open-source optical character recognition (OCR) engine with a command line tool. It was originally developed at Hewlett-Packard and was later developed further with sponsorship from Google. It reads and extracts text from images and scanned documents. Tesseract supports over 100 languages, which makes it versatile and widely used. It can read various image formats, such as TIFF, JPEG, PNG, and GIF. Since version 4, its main recognition engine is an LSTM neural network that recognizes lines of text and converts them into editable text. Besides plain text, Tesseract can report layout information such as the bounding boxes of words, lines, and paragraphs. It binarizes images internally, but accuracy often improves when images are preprocessed first, for example by removing noise, binarizing, or deskewing. Output formats include plain text, hOCR (HTML), TSV, ALTO XML, and searchable PDF. Tesseract can be customized and trained with additional data to improve recognition accuracy for specific domains or languages. It is released under the Apache License 2.0 and has an active community that keeps updating and improving it.
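
On the command line, the output format is selected by naming one or more config files (such as `txt`, `pdf`, `hocr`, or `tsv`) after the output base, so `tesseract scan.png scan pdf hocr` writes both `scan.pdf` and `scan.hocr` in one run. The following is a minimal sketch of a wrapper around this, assuming Tesseract 4 or later with its standard config files installed; the function names `ocr_extension` and `ocr_to_formats` are hypothetical.

```shell
# Hypothetical helper: print the file extension that Tesseract 4 or later
# uses for an output config name; unknown names fail with status 1.
ocr_extension() {
    case "$1" in
        txt)  echo txt ;;
        pdf)  echo pdf ;;    # searchable PDF: page image plus invisible text layer
        hocr) echo hocr ;;   # hOCR (HTML) with bounding boxes
        tsv)  echo tsv ;;    # one row per page/block/paragraph/line/word
        alto) echo xml ;;    # ALTO XML (assumed: Tesseract 4.1 or later)
        *)    return 1 ;;
    esac
}

# Hypothetical wrapper: OCR one image into several formats in a single run
# and print the paths of the output files that were actually created.
ocr_to_formats() {
    image=$1
    outbase=$2
    shift 2
    tesseract "$image" "$outbase" "$@" || return 1
    for fmt in "$@"; do
        ext=$(ocr_extension "$fmt") || continue
        if [ -e "$outbase.$ext" ]; then
            echo "$outbase.$ext"
        fi
    done
}
```

For example, `ocr_to_formats scan.png scan pdf hocr` runs `tesseract scan.png scan pdf hocr` and then prints `scan.pdf` and `scan.hocr` if both files were written.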

List of commands for tesseract:

Specify a custom page segmentation mode (default is 3). Tesseract 4 and later use `--psm` with modes 0 to 13; older 3.x releases used `-psm` with modes 0 to 10:

$ tesseract --psm ${0_to_13} ${image-png} ${output}
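
A small wrapper can reject an invalid mode before Tesseract is even started. This is a minimal sketch, assuming Tesseract 4 or later (the `--psm` flag with modes 0 to 13); `valid_psm` and `ocr_psm` are hypothetical names, not part of Tesseract.

```shell
# Hypothetical helper: succeed only if $1 is an integer from 0 to 13,
# the page segmentation modes accepted by Tesseract 4 and later.
valid_psm() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 13 ]
}

# Hypothetical wrapper: validate the mode, then run OCR with it.
ocr_psm() {
    image=$1
    outbase=$2
    psm=$3
    if ! valid_psm "$psm"; then
        echo "invalid page segmentation mode: $psm" >&2
        return 2
    fi
    tesseract --psm "$psm" "$image" "$outbase"
}
```

For example, `ocr_psm line.png line 7` treats the image as a single text line and writes `line.txt`, while `ocr_psm line.png line 14` stops with an error message.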

Specify a custom language (default is English, `eng`) with its ISO 639-2 code, for example `deu` for German (Deutsch):

$ tesseract -l deu ${image-png} ${output}
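
Several languages can be loaded at once by joining their codes with `+`, for example `-l deu+eng` for a document that mixes German and English. The following is a minimal sketch, assuming the `deu` and `eng` language data are installed; `join_langs` and `ocr_deu_eng` are hypothetical helper names.

```shell
# Hypothetical helper: join language codes with "+", the separator that
# Tesseract's -l option uses to load several languages.
# The body runs in a subshell, so the IFS change does not leak out.
join_langs() (
    IFS=+          # "$*" joins the arguments with the first character of IFS
    printf '%s\n' "$*"
)

# Hypothetical wrapper: OCR an image that mixes German and English text.
ocr_deu_eng() {
    tesseract -l "$(join_langs deu eng)" "$1" "$2"
}
```

For example, `ocr_deu_eng letter.png letter` runs `tesseract -l deu+eng letter.png letter`.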

Recognize text in an image and save it to `output.txt` (the `.txt` extension is added automatically):

$ tesseract ${image-png} ${output}
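
To process many images, the same command can run in a loop that derives each output base from the image name. This is a minimal POSIX shell sketch, assuming the images are PNG files in the current directory; the helper names `output_base` and `ocr_all_png` are hypothetical.

```shell
# Hypothetical helper: derive the output base from an image path by dropping
# its directory and final extension; Tesseract appends ".txt" itself.
output_base() {
    name=$(basename "$1")
    printf '%s\n' "${name%.*}"
}

# Hypothetical batch job: OCR every PNG in the current directory,
# so page1.png becomes page1.txt, page2.png becomes page2.txt, and so on.
ocr_all_png() {
    for img in ./*.png; do
        [ -e "$img" ] || continue   # no PNG files: the glob stays unexpanded
        tesseract "$img" "$(output_base "$img")"
    done
}
```

Stripping only the last extension keeps names such as `report.v2.png` intact as `report.v2`.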

List the ISO 639-2 codes of the installed languages:

$ tesseract --list-langs
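
Scripts can check this list before requesting a language, so a missing language file produces a clear error instead of a failed run. This is a minimal sketch, assuming the first line of the `--list-langs` output is a header (as in Tesseract 4 and 5); `lang_listed` and `ocr_if_lang` are hypothetical names.

```shell
# Hypothetical helper: read `tesseract --list-langs` output on stdin and
# succeed if language code $1 appears on a line of its own.
# The first line is assumed to be a header and is skipped.
lang_listed() {
    tail -n +2 | grep -qxF -- "$1"
}

# Hypothetical guard: run OCR only if the requested language is installed.
# stderr is merged in case a Tesseract version prints the list there.
ocr_if_lang() {
    lang=$1
    image=$2
    outbase=$3
    if tesseract --list-langs 2>&1 | lang_listed "$lang"; then
        tesseract -l "$lang" "$image" "$outbase"
    else
        echo "language not installed: $lang" >&2
        return 1
    fi
}
```

For example, `ocr_if_lang deu brief.png brief` only runs the German OCR if `deu` appears in the list.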

List the page segmentation modes and their descriptions:

$ tesseract --help-psm
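
In scripts, the most commonly used modes from that list can be selected by a short keyword. This is a hypothetical helper (`psm_for` and its keywords are illustrative, not part of Tesseract) that assumes the mode numbering of Tesseract 4 and later.

```shell
# Hypothetical helper: map a rough description of the input image to a
# page segmentation mode, following the meanings listed by --help-psm.
psm_for() {
    case "$1" in
        column) echo 4 ;;   # single column of text of variable sizes
        block)  echo 6 ;;   # single uniform block of text
        line)   echo 7 ;;   # the image is a single text line
        word)   echo 8 ;;   # the image is a single word
        char)   echo 10 ;;  # the image is a single character
        sparse) echo 11 ;;  # sparse text, found in no particular order
        *)      echo 3 ;;   # fully automatic segmentation (the default)
    esac
}
```

For example, `tesseract --psm "$(psm_for line)" receipt-line.png out` treats the image as a single line of text.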
