I am working on a Ruby on Rails application that extracts text and images from PDF files. When I extract the images, some of them come out damaged.
Is there a way to identify damaged images after extraction? Does anyone know why they are damaged?
I use the pdftohtml and pdftotext utilities (from Poppler) on Ubuntu.
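One possible check is sketched below in Ruby. It assumes that "damaged" means the file was truncated. A complete JPEG ends with the FF D9 end-of-image marker, and a complete PNG ends with an IEND chunk, so a file that is missing its trailer was probably cut short. The method names `image_looks_complete?` and `suspicious_images` are hypothetical. This heuristic will not flag images that are structurally complete but still decode incorrectly.

```ruby
# Heuristic sketch: flag extracted images whose file trailer is missing.
# Assumption: "damaged" means truncated. Complete-but-corrupt data is NOT detected.

JPEG_SOI      = "\xFF\xD8".b                           # JPEG start-of-image marker
JPEG_EOI      = "\xFF\xD9".b                           # JPEG end-of-image marker
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b                  # fixed 8-byte PNG signature
PNG_IEND      = "\x00\x00\x00\x00IEND\xAE\x42\x60\x82".b # zero-length IEND chunk + CRC

# Returns true if the bytes look like a complete JPEG or PNG file.
def image_looks_complete?(data)
  data = data.b
  if data.start_with?(JPEG_SOI)
    # Some writers pad after the EOI marker, so ignore trailing NUL bytes.
    data.sub(/\x00+\z/, "").end_with?(JPEG_EOI)
  elsif data.start_with?(PNG_SIGNATURE)
    data.end_with?(PNG_IEND)
  else
    false # unrecognised format: treat as suspicious
  end
end

# Hypothetical helper: list the image files in a directory that fail the check.
def suspicious_images(dir)
  Dir.glob(File.join(dir, "*.{jpg,jpeg,png}")).reject do |path|
    image_looks_complete?(File.binread(path))
  end
end
```

For example, `suspicious_images("output")` returns the paths in a hypothetical `output` directory that lack a proper trailer. Reading each file fully with `File.binread` keeps the sketch simple, but it may be slow for very large images.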
Thanks in advance.