Tesseract.exe file does not exist

I installed the pytesseract library using

 pip install pytesseract 

When I tried to use the image_to_string function, I got:

FileNotFoundError: [WinError 2] The system cannot find the file specified

I searched for the error and found that I had to edit the pytesseract.py file, where the line

 tesseract_cmd = 'tesseract' 

should become

 tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract' 

I searched my Python folder and found no tesseract.exe file. I reinstalled the library, but the file was still missing. In the end, I changed the line to:

 tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract' 

and my program threw:

pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')

What can I do to make my program work?

PS Here is my program code:

 from pytesseract import image_to_string
 from PIL import Image, ImageEnhance, ImageFilter
 im = Image.open(r'C:\Users\\Desktop\ImageToText_Python\NoName.png')
 print(im)
 txt = image_to_string(im)
 print(txt)

Full traceback of the first attempt:

 File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
   text = pytesseract.image_to_string(im)
 File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
   config=config)
 File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
   proc = subprocess.Popen(command, stderr=subprocess.PIPE)
 File "C:\Python\lib\subprocess.py", line 947, in __init__
   restore_signals, start_new_session)
 File "C:\Python\lib\subprocess.py", line 1224, in _execute_child
   startupinfo)
 FileNotFoundError: [WinError 2] The system cannot find the file specified

Full traceback of the second attempt:

 Traceback (most recent call last):
 File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module>
   txt = image_to_string(im)
 File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
   raise TesseractError(status, errors)
 pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
3 answers

From the project README:

 try:
     import Image
 except ImportError:
     from PIL import Image
 import pytesseract
 pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
 # Include the above line, if you don't have tesseract executable in your PATH
 # Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
 print(pytesseract.image_to_string(Image.open('test.png')))
 print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))

So you need to make sure that tesseract.exe actually exists on your computer (for example, by installing Tesseract-OCR). Then either add the folder that contains it to the PATH environment variable, or set its full path through the pytesseract.pytesseract.tesseract_cmd attribute.
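The two options above (PATH lookup first, explicit path second) could be combined roughly as follows. This is only a sketch: find_tesseract and extra_candidates are hypothetical names made up for illustration and are not part of pytesseract, and the install folder in the usage comment is the example path from the README, which may differ on your machine.

```python
import os
import shutil


def find_tesseract(extra_candidates=()):
    """Return a path to the tesseract executable, or None if none is found.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    # First let the OS resolve "tesseract" through the PATH variable
    # (on Windows this also matches tesseract.exe).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to explicitly supplied locations.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Possible usage (assumes pytesseract is installed and that the path
# below is where Tesseract-OCR was installed):
#
#   import pytesseract
#   exe = find_tesseract([r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"])
#   if exe is None:
#       raise RuntimeError("tesseract.exe not found; install Tesseract-OCR first")
#   pytesseract.pytesseract.tesseract_cmd = exe
```

Setting tesseract_cmd from your own script this way avoids editing pytesseract.py inside site-packages, so the setting survives a reinstall of the library.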


For people in the same situation as me: download and run the Tesseract-OCR installer. When the installation finishes, go to the install path you chose; there should be a file called tesseract.exe. Copy the full path to this file and use it as the value of tesseract_cmd in pytesseract.py.

  • If you use Windows, you need to install tesseract-ocr (3.05.01 is a stable version and supports foreign-language extraction). Then add the path where you installed the software to the PATH environment variable.

  • If you use Ubuntu, run "sudo apt-get install tesseract-ocr" in the terminal.

  • Pytesseract is a Python wrapper that gives you access to the tesseract-ocr software; it does not include tesseract itself.

Note 1: if you want to extract text in foreign languages, you must add the corresponding tessdata files to the installation path.

Note 2: Python 2 has poor support for extracting foreign-language text, so it is better to use Python 3.
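After installing, it can help to check from Python that the tesseract command really starts and which language data it sees. The sketch below calls the tesseract CLI with its --version and --list-langs options; the function names tesseract_version and installed_languages are hypothetical, and the parsing assumes the first line of the --list-langs output is a header line.

```python
import subprocess


def _run(cmd, flag):
    """Run `cmd flag` and return its combined text output, or None if cmd cannot start."""
    try:
        result = subprocess.run(
            [cmd, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # Covers FileNotFoundError, i.e. the same failure as [WinError 2].
        return None
    # Some tesseract versions print this information to stderr, others to stdout.
    return result.stdout or result.stderr


def tesseract_version(cmd="tesseract"):
    """Return the first line of `cmd --version`, or None if cmd is not found."""
    output = _run(cmd, "--version")
    if output is None:
        return None
    lines = output.strip().splitlines()
    return lines[0] if lines else ""


def installed_languages(cmd="tesseract"):
    """Return language codes reported by `cmd --list-langs` (empty list if cmd is not found)."""
    output = _run(cmd, "--list-langs")
    if output is None:
        return []
    # Assumption: the first line is a header, the rest are language codes.
    return [line.strip() for line in output.splitlines()[1:] if line.strip()]
```

If tesseract_version() returns None, the executable is not reachable through PATH. If a language such as fra is missing from installed_languages(), its tessdata file is probably not installed.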


Source: https://habr.com/ru/post/1243785/

