If you plan to recognize non-Latin or numeric characters, it is better not to look for a Java library, but instead to pick one of the external tools and use other methods (1) to get your text out of it. On Linux, for example, I used cuneiform (2) through its command line interface and a pipe.
Cuneiform has been ported to Linux, but I don't know of a working command line interface for it on Windows.
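As a rough sketch of the external-tool approach, the Java code below starts cuneiform as a child process and reads the recognized text back from a file. It assumes the Linux port is installed and on the PATH, that it accepts the `-l` (language), `-f` (output format) and `-o` (output file) options, and that it writes UTF-8. The class and method names (`CuneiformCli`, `buildCommand`, `recognize`) are made up for illustration and are not part of any library. It needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: wraps the cuneiform command line tool.
final class CuneiformCli {

    // Builds the argument list. Assumed invocation:
    //   cuneiform -l <language> -f text -o <output file> <image file>
    static List<String> buildCommand(String imagePath, String outputPath, String language) {
        return List.of("cuneiform", "-l", language, "-f", "text", "-o", outputPath, imagePath);
    }

    // Runs cuneiform on one image and returns the recognized text.
    // Throws IOException if the tool is missing or exits with an error.
    static String recognize(Path image, String language) throws IOException, InterruptedException {
        Path out = Files.createTempFile("ocr-", ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(image.toString(), out.toString(), language))
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // drop console chatter so the child cannot block
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("cuneiform exited with code " + exitCode);
            }
            // Assumption: the output file is UTF-8 encoded.
            return Files.readString(out, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

A call such as `CuneiformCli.recognize(Path.of("page.png"), "rus")` would then return the text of a Russian-language scan; the file name and language code here are only placeholders.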