Pytesseract on Windows: WindowsError [Error 2]

Hi, I am trying to use the Python pytesseract library to extract text from an image. Code:

    from PIL import Image
    from pytesseract import image_to_string
    print image_to_string(Image.open(r'D:\new_folder\img.png'))

But the following error appeared:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
        config=config)
      File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
        stderr=subprocess.PIPE)
      File "C:\Python27\lib\subprocess.py", line 710, in __init__
        errread, errwrite)
      File "C:\Python27\lib\subprocess.py", line 958, in _execute_child
        startupinfo)
    WindowsError: [Error 2] The system cannot find the file specified

I could not find a specific solution. Can anyone tell me what to do, for example whether I need to download something and where I can get it?

Thank you in advance :)

3 answers

I had the same problem and quickly found a solution after reading this post:

OSError: [Errno 2] No such file or directory using pytesser

You just need to adapt it for Windows. In pytesseract.py, replace the following line:

 tesseract_cmd = 'tesseract' 

with:

 tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract' 

(The double \\ is needed so that each \ in the string is escaped.)


You get this exception because subprocess cannot find the binary (the tesseract executable).
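To see the mechanism in isolation, here is a minimal sketch (not part of pytesseract) that starts a command through subprocess and reports the error code when the executable cannot be found. The helper name launch_error is made up for this illustration, and it is written for Python 3; the same idea applies on Python 2.7, where WindowsError is a subclass of OSError.

```python
import errno
import subprocess


def launch_error(command):
    """Return the errno raised while starting `command`, or None if it started."""
    try:
        proc = subprocess.Popen(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError as exc:
        # A missing executable is reported as errno.ENOENT (2),
        # which is the "[Error 2]" seen in the traceback.
        return exc.errno
    proc.communicate()  # let the process finish and release its pipes
    return None


if __name__ == "__main__":
    code = launch_error("tesseract")
    if code == errno.ENOENT:
        print("tesseract was not found; install it or add it to PATH")
    else:
        print("tesseract could be started")
```

If this prints the "not found" message, the problem is in the Tesseract installation or PATH, not in pytesseract or PIL.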

Installation is a three-step process:

1. Download and install the system-level libraries and binaries:

Installation help for the different operating systems is available here. On macOS, you can install it directly using brew.

Install Google Tesseract OCR (additional information on how to install the engine on Linux, Mac OSX, and Windows). You must be able to invoke the tesseract command as tesseract. If this is not the case, for example because tesseract is not in your PATH, you will have to change the "tesseract_cmd" variable at the top of tesseract.py. Under Debian/Ubuntu, you can use the tesseract-ocr package. Mac OS users, please install the Homebrew package tesseract.

For Windows:

An installer for the old version 3.02 is available for Windows from our download page. It includes the English training data. If you want to use another language, download the appropriate training data, unpack it using 7-zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract-OCR\tessdata.

To access tesseract-OCR from any location, you may have to add the directory where the tesseract-OCR binaries are located to the PATH variable, probably C:\Program Files\Tesseract-OCR.

You can download the .exe from here.
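After copying extra language files, a quick way to confirm they landed in the right place is to list the .traineddata files in the tessdata directory. This is only a sketch: the function name installed_languages is hypothetical, and the directory in the usage line is the "probable" location quoted above, so adjust it to your installation.

```python
import os


def installed_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    suffix = ".traineddata"
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )


if __name__ == "__main__":
    # Assumed location; change it if Tesseract is installed elsewhere.
    tessdata = r"C:\Program Files\Tesseract-OCR\tessdata"
    if os.path.isdir(tessdata):
        print(installed_languages(tessdata))
    else:
        print("tessdata directory not found: " + tessdata)
```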


2. Install the Python package

 pip install pytesseract 

3. Finally, you need to have the tesseract binary in PATH.

Or you can set it at runtime:

    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = '<path-to-tesseract-bin>'

For Windows:

    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

  • The line above only applies to the running script. For a permanent solution, add the directory containing tesseract.exe to PATH, for example PATH=%PATH%;"C:\Program Files (x86)\Tesseract-OCR".

  • Also, verify that the Windows environment variable TESSDATA_PREFIX is set to the directory containing the tessdata directory. For instance:

    TESSDATA_PREFIX = C:\Program Files (x86)\Tesseract-OCR

i.e. the tessdata directory is located at: C:\Program Files (x86)\Tesseract-OCR\tessdata
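The TESSDATA_PREFIX check can also be done from Python. The sketch below follows the layout described in this answer (the variable points at the directory that contains tessdata); other Tesseract versions may expect a different layout, so treat that as an assumption. The function name tessdata_problem is hypothetical.

```python
import os


def tessdata_problem(environ=None):
    """Return a description of what is wrong with TESSDATA_PREFIX, or None."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("TESSDATA_PREFIX")
    if not prefix:
        return "TESSDATA_PREFIX is not set"
    # Assumption from the answer above: the prefix is the parent of tessdata.
    tessdata = os.path.join(prefix, "tessdata")
    if not os.path.isdir(tessdata):
        return "no tessdata directory under " + prefix
    return None


if __name__ == "__main__":
    problem = tessdata_problem()
    print(problem if problem else "TESSDATA_PREFIX looks fine")
```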


Your example:

    from PIL import Image
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
    print(pytesseract.image_to_string(Image.open(r'D:\new_folder\img.png')))
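If you would rather not hard-code a single path, one option is to look on PATH first and fall back to the usual install directories. This is a sketch under stated assumptions: find_executable and TESSERACT_CANDIDATES are names invented here, the candidate paths are simply the locations mentioned in this thread, and shutil.which requires Python 3.3 or newer (on Python 2.7, distutils.spawn.find_executable plays a similar role).

```python
import os
import shutil

# Install locations mentioned in this thread; adjust them to your machine.
TESSERACT_CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_executable(name, candidates):
    """Return a path for `name` from PATH or from `candidates`, else None."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_executable("tesseract", TESSERACT_CANDIDATES)
    if path is None:
        print("Tesseract not found; install it or extend TESSERACT_CANDIDATES")
    else:
        # This value can then be assigned to pytesseract.pytesseract.tesseract_cmd.
        print("Using " + path)
```

Returning None instead of raising keeps the decision about how to report a missing installation in the calling script.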

You need the Tesseract OCR engine ("tesseract.exe") installed on your computer. If it is not on your PATH, specify the full path to it in pytesseract.py (tesseract.py).

Readme

Install Google Tesseract OCR (additional information on how to install the engine on Linux, Mac OSX, and Windows). You must be able to invoke the tesseract command as tesseract. If this is not the case, for example because tesseract is not in your PATH, you will have to change the "tesseract_cmd" variable at the top of tesseract.py. Under Debian/Ubuntu, you can use the tesseract-ocr package. Mac OS users, please install the Homebrew package tesseract.

Another thread


Source: https://habr.com/ru/post/1243781/

