How to remove masks or damaged images from a PDF?

I am working on a Ruby on Rails application that extracts text and images from PDF files. When I extract the images, some of them come out damaged.

Is there a way to identify damaged images after extraction? Does anyone know why they are damaged?

I use the pdftohtml and pdftotext utilities (poppler) on Ubuntu.
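To make "identify damaged images after extraction" concrete, here is a minimal Ruby sketch of one possible check. It assumes the extracted files are PNG or JPEG and only compares the first and last bytes of each file against the standard PNG signature and IEND chunk, or the JPEG start and end markers. The helper name `image_looks_intact?` and the directory argument are hypothetical and not part of poppler. This is a heuristic: it catches truncated or non-image files, not images that decode but look wrong (for example, a mask extracted as a separate image).

```ruby
# Heuristic integrity check for extracted image files.
# Hypothetical helper, not part of poppler: it inspects only the leading and
# trailing bytes and does not decode the image data.

PNG_SIGNATURE = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack("C*")
# Chunk type "IEND" followed by its fixed CRC; this ends every complete PNG.
PNG_TRAILER   = [0x49, 0x45, 0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82].pack("C*")
JPEG_START    = [0xFF, 0xD8].pack("C*") # SOI marker
JPEG_END      = [0xFF, 0xD9].pack("C*") # EOI marker

# Returns true when the file starts and ends like a complete PNG or JPEG.
# Caveat: a JPEG with extra bytes after the EOI marker is reported as not
# intact, so treat a false result as "needs a closer look".
def image_looks_intact?(path)
  data = File.binread(path)
  if data.start_with?(PNG_SIGNATURE)
    data.end_with?(PNG_TRAILER)
  elsif data.start_with?(JPEG_START)
    data.end_with?(JPEG_END)
  else
    false # unknown format or empty file
  end
end

if __FILE__ == $PROGRAM_NAME
  # Assumed usage: pass the directory that holds the extracted images.
  dir = ARGV.fetch(0, ".")
  Dir.glob(File.join(dir, "*.{png,jpg,jpeg}")).sort.each do |file|
    status = image_looks_intact?(file) ? "ok" : "suspect"
    puts "#{status}\t#{file}"
  end
end
```

Files flagged as suspect could then be re-checked with a real image decoder, since this check by itself says nothing about the pixel data.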

Thanks in advance.

ruby-on-rails ubuntu poppler

sam Feb 14 '17 at 12:44

No one has answered this question yet.

Source: https://habr.com/ru/post/1669698/