Java OCR library recommendations?

I need to check a ton of photos to see if they have a keyword. Can anyone recommend a good, reliable OCR library? I will gladly sacrifice speed for accuracy.

+4
2 answers

There are no pure-Java OCR libraries that deliver serious accuracy. Depending on your budget, you can choose something that is not pure Java but can be called from Java:

  • If you have a lot of time but zero budget, your choice is Tesseract. It is by far the best of the open-source engines.
  • If you have a small budget to spend and you only need to run this recognition once, the Cloud OCR API service will be your best choice. It is based on a commercial-grade OCR engine and offers fairly affordable per-project prices. Disclaimer: I work for ABBYY.
  • If you need to run recognition as an ongoing process, it may be more economical to purchase specialized conversion software, for example this one; it has an API and can also be called from Java. In fact, there are many alternatives if you are willing to spend some budget on licensing.
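
For the Tesseract route, "called from Java" can be as simple as launching the command-line tool and reading its output. The following is only a minimal sketch, assuming a `tesseract` binary is installed and on the PATH; the class name `TesseractKeywordCheck`, its method names, and the whitespace/case normalization are illustrative choices, not part of Tesseract.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: checks whether photos contain a keyword via the tesseract CLI.
final class TesseractKeywordCheck {

    // Runs `tesseract <image> stdout -l <language>` and returns the recognized text.
    // Assumes the tesseract executable is on the PATH.
    static String recognize(Path image, String language) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                List.of("tesseract", image.toString(), "stdout", "-l", language))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop progress messages
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode + " for " + image);
        }
        return text;
    }

    // Lower-cases the text and collapses runs of whitespace (including line breaks) to one space.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    // True if the keyword occurs in the OCR output, ignoring case and line breaks.
    static boolean containsKeyword(String ocrText, String keyword) {
        return normalize(ocrText).contains(normalize(keyword));
    }

    // Usage: java TesseractKeywordCheck <keyword> <image>...
    public static void main(String[] args) throws Exception {
        String keyword = args[0];
        for (int i = 1; i < args.length; i++) {
            Path image = Path.of(args[i]);
            boolean hit = containsKeyword(recognize(image, "eng"), keyword);
            System.out.println(image + ": " + (hit ? "match" : "no match"));
        }
    }
}
```

Since OCR output often contains misread characters, an exact substring test like this one can miss matches; a fuzzy comparison may be worth considering if accuracy matters more than speed.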
+14

If you plan to recognize non-Latin or numeric characters, it is better not to look for a Java library at all, but to pick an external tool and use other means (1) to get the text out of it. On Linux, I used cuneiform (2) via its command-line interface.

  • (1) A command-line interface and a pipe, for example.

  • (2) cuneiform has been ported to Linux, but I don't know of a working command-line interface for it on Windows.
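
As a rough sketch of driving such an external tool from Java, the example below assumes the Linux port of cuneiform is on the PATH and accepts `-l` (language), `-f` (output format) and `-o` (output file), as its command-line help describes. The class and method names are hypothetical, and the `rus` language code in `main` is just an example value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper around the cuneiform command-line tool.
final class CuneiformRunner {

    // Builds: cuneiform -l <language> -f text -o <output> <image>
    static List<String> command(Path image, Path output, String language) {
        return List.of("cuneiform", "-l", language, "-f", "text",
                "-o", output.toString(), image.toString());
    }

    // Runs cuneiform, then reads the recognized text back from a temporary file.
    static String recognize(Path image, String language) throws IOException, InterruptedException {
        Path output = Files.createTempFile("cuneiform-", ".txt");
        try {
            Process process = new ProcessBuilder(command(image, output, language))
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // ignore console chatter
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("cuneiform exited with code " + exitCode);
            }
            return Files.readString(output, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(output);
        }
    }

    // Usage: java CuneiformRunner <image>
    public static void main(String[] args) throws Exception {
        System.out.println(recognize(Path.of(args[0]), "rus"));
    }
}
```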

+1

Source: https://habr.com/ru/post/1492933/