Decoding algorithm

Question

I regularly receive encoded PDF files. The encoding shows up as follows:

The PDF files display correctly in Acrobat Reader.
Select everything and copy the text in Acrobat Reader,
then paste it into a text editor.
The pasted content turns out to be encoded.

So the examples are:

13579 -> 3579;
hello -> jgnnq

This is basically an offset (possibly with a swap) of ASCII characters.

The question is: how can I automatically find the offset when I have access to several samples? I cannot be sure whether the encoding offset has changed between files. All I know is some text that usually (if not always) appears inside the PDF, for example "Name:", "Summary:", "Total:".

Thanks!

edit: thanks for the feedback. I will try to break the question down into smaller questions:

1: (-) ?

+3

algorithm encryption decode

ohho, 26 Apr '10, 8:20

PDF, Acrobat Reader? , PDF (, PDF Clown) , .

0

Aistina, 26 Apr '10, 8:43

YOU · Accepted Answer · 2010-04-26T08:23:28+0000

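In both of your examples every character is shifted by +2. Each row below shows the original character, the next one, and the +2 character that ends up in the encoded text: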

h i j
e f g
l m n
l m n
o p q

1 2 3
3 4 5
5 6 7
7 8 9
9 : ;
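The same +2 can be read off by subtracting character codes pairwise. A minimal sketch in Python 3, assuming the plain and encoded strings line up one character to one character; `char_offsets` is a hypothetical helper name, not something from the original answer:

```python
def char_offsets(plain, encoded):
    """Return the set of per-character code differences (encoded - plain)."""
    if len(plain) != len(encoded):
        raise ValueError("strings must have the same length")
    return {ord(e) - ord(p) for p, e in zip(plain, encoded)}


print(char_offsets("hello", "jgnnq"))  # {2}
print(char_offsets("13579", "3579;"))  # {2}
```

A single value in the set suggests one constant shift; more than one value would hint that something other than a plain offset (such as the suspected swap) is involved.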

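So, if you have some known strings, you can try a range of offsets and check each result against them: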

>>> text = 'jgnnq'
>>> knowns = ['hello', '13579']
>>>
>>> for i in range(-5, 6):  # check offsets from -5 to +5 inclusive
...     rot = ''.join(chr(ord(j) + i) for j in text)
...     for x in knowns:
...         if x in rot:
...             print(rot)
...
hello
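For the situation in the question, where only phrases such as "Name:", "Summary:" and "Total:" are known, the same brute-force idea could be wrapped in a function that reports which offset reveals which phrase. This is only a sketch under the assumption that the whole text uses one constant shift; `find_offsets` and `max_shift` are illustrative names, and the sample text is made up:

```python
def find_offsets(encoded, knowns, max_shift=20):
    """Return (offset, phrase) pairs, in ascending offset order, for every
    offset whose removal makes a known phrase appear in the text."""
    hits = []
    for offset in range(-max_shift, max_shift + 1):
        try:
            # Undo the assumed shift for every character.
            decoded = "".join(chr(ord(c) - offset) for c in encoded)
        except ValueError:
            # chr() rejects code points outside the valid range; skip this offset.
            continue
        for phrase in knowns:
            if phrase in decoded:
                hits.append((offset, phrase))
    return hits


# Made-up sample: shift a plain string by +2, then recover the offset.
sample = "".join(chr(ord(c) + 2) for c in "Name: Bob Total: 42")
print(find_offsets(sample, ["Name:", "Summary:", "Total:"]))
```

Running this on each incoming file would also show whether the offset has changed from one file to the next. If no offset produces a hit, the encoding is probably not a simple shift.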
