I regularly receive PDF files. The encoding works as in these examples:
13579 -> 3579;
hello -> jgnnq
This is basically an offset (or possibly a swap) of ASCII characters.
The question is: how can I automatically find the offset when I have access to several samples? I cannot be sure whether the encoding offset has changed. All I know is some text that usually (if not always) appears inside the PDF, for example "Name:", "Summary:", "Total:".
Thanks!
edit: thanks for the feedback. I will try to break the question down into smaller questions:
Offset +2 (+2 char):
h i j
e f g
l m n
l m n
o p q
1 2 3
3 4 5
5 6 7
7 8 9
9 : ;
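The +2 mapping above can be reproduced in a few lines. This is only a minimal sketch, assuming the encoding is a plain addition to each character code with no wrap-around (which the `9 -> ;` example suggests). `shift_text` is a hypothetical helper name used for illustration.

```python
def shift_text(text, offset):
    """Return text with every character code moved by `offset` (no wrap-around)."""
    return ''.join(chr(ord(ch) + offset) for ch in text)

# Encoding the two samples from the question with an offset of +2.
print(shift_text('hello', 2))   # jgnnq
print(shift_text('13579', 2))   # 3579;

# Decoding is the same operation with the opposite sign.
print(shift_text('jgnnq', -2))  # hello
```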
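>>> text = 'jgnnq'
>>> knowns = ['hello', '13579']
>>>
>>> for i in range(-5, 6):  # check the -5 to +5 char code range (inclusive)
...     # shift every character of the encoded text by i
...     rot = ''.join(chr(ord(j) + i) for j in text)
...     for x in knowns:
...         if x in rot:  # a known plaintext appears, so i undoes the encoding
...             print(rot)
...
hello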
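The same brute-force idea could be extended to the markers mentioned in the question ("Name:", "Summary:", "Total:"). The sketch below is one possible approach, not a definitive one. It assumes a single constant offset per document and text that has already been extracted from the PDF. The names `decode` and `find_offset` are hypothetical helpers. It tries every offset in a range and keeps the one under which the most known markers appear.

```python
def decode(text, offset):
    """Undo a constant shift; characters that would fall below code 0 become '?'."""
    return ''.join(
        chr(ord(ch) - offset) if ord(ch) >= offset else '?'
        for ch in text
    )

def find_offset(encoded, markers, max_shift=127):
    """Return (offset, hits) for the shift that reveals the most markers, or None."""
    best = None
    for offset in range(-max_shift, max_shift + 1):
        plain = decode(encoded, offset)
        hits = sum(marker in plain for marker in markers)
        # Keep the first offset with the highest number of marker hits.
        if hits and (best is None or hits > best[1]):
            best = (offset, hits)
    return best

markers = ['Name:', 'Summary:', 'Total:']
# Build an illustrative sample by shifting made-up plaintext by +2.
sample = ''.join(chr(ord(ch) + 2) for ch in 'Name: Bob Total: 42')
print(find_offset(sample, markers))  # (2, 2)
```

Running this on each incoming file would also show whether the offset has changed between samples: a result of `None` means no marker was found under any tested shift, which would suggest the encoding is not a simple constant offset.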
PDF, Acrobat Reader? , PDF (, PDF Clown) , .
Source: https://habr.com/ru/post/1742771/