I have a PDF file in Greek. When I copy and paste text from it, the result is partly gibberish. I call it only partly gibberish because, although the pasted text makes no sense in Greek, it does consist of real Greek characters. Another interesting aspect of the problem is that only some of the characters come out wrong. For example, compare this source text:
ΕΞ. ΕΠΕΙΓΟΝ – ΑΜΕΣΗ ΕΦΑΡΜΟΓΗ
ΝΑ ΣΤΑΛΕΙ ΚΑΙ ΜΕ Ε-ΜΑIL
with the same text pasted from the PDF:
ΔΞ. ΔΠΔΙΓΟΝ – ΑΜΔΗ ΔΦΑΡΜΟΓΗ
ΝΑ ΣΑΛΔΙ ΚΑΙ ΜΔ Δ-ΜΑIL
You will notice that some of the characters come through correctly, while others do not, and a few (such as the Σ in ΑΜΕΣΗ) are missing altogether. It may also be worth mentioning that the wrong characters are substituted consistently: for example, Ε becomes Δ and vice versa.
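To check the "Ε becomes Δ" pattern against more samples, the source and pasted versions can be compared word by word. The following is a minimal sketch; the class and method names are hypothetical, and it assumes both versions split into the same number of words. Word pairs whose lengths differ are set aside rather than compared, because a dropped character would misalign a character-by-character comparison. It also assumes the file is compiled as UTF-8 (the default since JDK 18).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: tabulates character substitutions between known text and text pasted from the PDF. */
public class SubstitutionFinder {

    /** Per-character substitutions plus word pairs that could not be aligned. */
    public static final class Result {
        public final Map<Character, Character> substitutions = new LinkedHashMap<>();
        public final List<String> lengthMismatches = new ArrayList<>();
    }

    public static Result compare(String source, String pasted) {
        Result result = new Result();
        String[] srcWords = source.trim().split("\\s+");
        String[] outWords = pasted.trim().split("\\s+");
        if (srcWords.length != outWords.length) {
            throw new IllegalArgumentException("word counts differ; align the samples by hand first");
        }
        for (int i = 0; i < srcWords.length; i++) {
            String expectedWord = srcWords[i];
            String actualWord = outWords[i];
            if (expectedWord.length() != actualWord.length()) {
                // A character was dropped or added, so positions no longer line up; just record the pair.
                result.lengthMismatches.add(expectedWord + " -> " + actualWord);
                continue;
            }
            for (int j = 0; j < expectedWord.length(); j++) {
                char expected = expectedWord.charAt(j);
                char actual = actualWord.charAt(j);
                if (expected != actual) {
                    // If one source character maps to several outputs, the last one seen wins.
                    result.substitutions.put(expected, actual);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"ΕΞ. ΕΠΕΙΓΟΝ – ΑΜΕΣΗ ΕΦΑΡΜΟΓΗ", "ΔΞ. ΔΠΔΙΓΟΝ – ΑΜΔΗ ΔΦΑΡΜΟΓΗ"},
            {"ΝΑ ΣΤΑΛΕΙ ΚΑΙ ΜΕ Ε-ΜΑIL", "ΝΑ ΣΑΛΔΙ ΚΑΙ ΜΔ Δ-ΜΑIL"},
        };
        for (String[] pair : samples) {
            Result r = compare(pair[0], pair[1]);
            System.out.println("substitutions: " + r.substitutions);
            System.out.println("length mismatches: " + r.lengthMismatches);
        }
    }
}
```

For the first sample line this reports the single substitution Ε=Δ and sets aside ΑΜΕΣΗ -> ΑΜΔΗ as a length mismatch. Running it over a larger sample from the same document would show whether the swap really is consistent.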
When I open the PDF in, for example, Adobe and print it to a PDF printer (in this case CutePDF), copying and pasting from the resulting file gives the correct text!
Given the above, my questions are as follows:
- What is the reason for this behavior?
- How can I integrate a fix into a Java-based workflow that processes arbitrary imported PDF files?
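As a stopgap in a Java pipeline, the extracted string (obtained, for instance, from Apache PDFBox's `PDFTextStripper.getText`) could be post-processed with a per-document substitution table. The following is only a sketch under strong assumptions: the class name and the Ε/Δ table are hypothetical, the swap is assumed to be consistent within one document, and the table would have to be worked out separately for each problematic font. That makes it a poor fit for arbitrary imported files. It also cannot restore characters the extraction dropped entirely.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical post-extraction fallback: rewrites characters in text already extracted
 * from a PDF, using a substitution table observed for that particular document.
 */
public class GlyphRemapper {

    private final Map<Character, Character> fixes = new HashMap<>();

    /** Registers an observed two-way swap: 'intended' comes out as 'extracted' and vice versa. */
    public GlyphRemapper swap(char intended, char extracted) {
        fixes.put(extracted, intended);
        fixes.put(intended, extracted);
        return this;
    }

    /** Replaces every character that has a registered fix; all other characters pass through unchanged. */
    public String repair(String extractedText) {
        StringBuilder sb = new StringBuilder(extractedText.length());
        for (int i = 0; i < extractedText.length(); i++) {
            char c = extractedText.charAt(i);
            char fixed = fixes.getOrDefault(c, c);
            sb.append(fixed);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed table for this one document, based only on the Ε/Δ swap seen in the sample.
        GlyphRemapper remapper = new GlyphRemapper().swap('Ε', 'Δ');
        System.out.println(remapper.repair("ΔΞ. ΔΠΔΙΓΟΝ – ΑΜΔΗ ΔΦΑΡΜΟΓΗ"));
    }
}
```

On the first sample line this yields ΕΞ. ΕΠΕΙΓΟΝ – ΑΜΕΗ ΕΦΑΡΜΟΓΗ. ΑΜΕΗ is still wrong because the Σ never made it into the extracted text, which is one reason a remapping table can only ever be a partial workaround.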
EDIT: multiple typos