Convert an image to a searchable PDF

Hi, I am looking for an open source Java API that can convert a TIFF image to a searchable PDF (OCR). I have done some research, but found nothing.

Note: I reviewed the post "Java OCR implementation", but that API does not convert the image to PDF. I am still experimenting with the code, though.

+4
2 answers

You can convert images to PDF using iText. The hard part here is the OCR, not creating the PDF.

A word of warning: any OCR engine worth using will cost you a significant amount of money. Free and/or open source engines are usually pet projects or proofs of concept for some algorithm, and are not suitable for real-world OCR applications. Tesseract is probably the best of the group, but even its accuracy is much worse than that of the commercial engines.

We have a commercial OCR application, and I went through an evaluation of the engines along the way. I would advise you to bite the bullet, contact the engine vendors, and get quotes: Abbyy (best accuracy, most expensive, slower), Expervision (fast, not as accurate, middle-of-the-road price), Nuance (average speed, accuracy, and price). None of them is written in Java, so you should plan some time to develop JNI code around their APIs.

Good luck, it's a big project!

+6

Cuneiform is free and easy to use. It can write its output in hOCR format, which can then be used to create an invisible text layer in a PDF with the hocr2pdf tool, which is part of ExactImage.
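Since the question asks for Java, here is a minimal sketch of driving those two command-line tools from Java with ProcessBuilder. It assumes that `cuneiform` and `hocr2pdf` are installed and on the PATH, that Cuneiform accepts `-f hocr -o <file>`, and that hocr2pdf reads the hOCR from standard input and takes `-i <image> -o <pdf>`. The class name, method names, and file names are made up for illustration. Depending on how Cuneiform was built, the TIFF may need to be converted to BMP first.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper class: not part of Cuneiform or ExactImage.
public class SearchablePdfPipeline {

    // Command line asking Cuneiform to OCR the image and write hOCR to hocrPath.
    // Assumes the "-f hocr" and "-o" options of the Linux build of Cuneiform.
    static List<String> cuneiformCommand(String imagePath, String hocrPath) {
        return List.of("cuneiform", "-f", "hocr", "-o", hocrPath, imagePath);
    }

    // Command line asking hocr2pdf to embed the image in a PDF with a hidden
    // text layer. The hOCR itself is supplied on standard input (see run()).
    static List<String> hocr2pdfCommand(String imagePath, String pdfPath) {
        return List.of("hocr2pdf", "-i", imagePath, "-o", pdfPath);
    }

    // Runs an external command, optionally feeding a file to its standard
    // input, and fails if the exit code is non-zero.
    static void run(List<String> command, File stdin)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT); // show tool output
        if (stdin != null) {
            pb.redirectInput(stdin);
        }
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
    }

    // Image -> hOCR -> searchable PDF.
    static void convert(String imagePath, String hocrPath, String pdfPath)
            throws IOException, InterruptedException {
        run(cuneiformCommand(imagePath, hocrPath), null);
        run(hocr2pdfCommand(imagePath, pdfPath), new File(hocrPath));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: SearchablePdfPipeline <image> <hocr> <pdf>");
            return;
        }
        convert(args[0], args[1], args[2]);
    }
}
```

For example, `java SearchablePdfPipeline scan.tif scan.hocr scan.pdf` would run both steps. Keeping the command construction separate from the process handling makes it easy to adjust the options if your installed versions differ.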

+2

Source: https://habr.com/ru/post/1394265/

