How can I understand the contents of a .pyc file?

I have a .pyc file. I need to understand its contents in order to learn how the Python disassembler works, that is, how I can produce output like that of dis.dis(function) from the contents of the .pyc file.

For example:

    >>> def sqr(x):
    ...     return x*x
    ...
    >>> import dis
    >>> dis.dis(sqr)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_FAST                0 (x)
                  6 BINARY_MULTIPLY
                  7 RETURN_VALUE

I need to get this output using a .pyc file.

1 answer

.pyc files contain some metadata followed by a marshalled code object; to load the code object and disassemble it, use:

    import dis, marshal, sys

    # Header size: 8 bytes before Python 3.3, 12 bytes in 3.3 through 3.6,
    # and 16 bytes since 3.7 (PEP 552 added a flags field).
    if sys.version_info >= (3, 7):
        header_size = 16
    elif sys.version_info >= (3, 3):
        header_size = 12
    else:
        header_size = 8

    # pycfile is the path to the .pyc file you want to inspect
    with open(pycfile, "rb") as f:
        metadata = f.read(header_size)  # magic number, timestamp, etc.
        code = marshal.load(f)          # rest is a marshalled code object

    dis.dis(code)

Demo with the bisect module, run under Python 2.7 (where bisect.__file__ here names the compiled .pyc file):

    >>> import bisect
    >>> import dis, marshal
    >>> import sys
    >>> header_size = 12 if sys.version_info >= (3, 3) else 8
    >>> with open(bisect.__file__, "rb") as f:
    ...     magic_and_timestamp = f.read(header_size)  # first 8 or 12 bytes are metadata
    ...     code = marshal.load(f)                     # rest is bytecode
    ...
    >>> dis.dis(code)
      1           0 LOAD_CONST               0 ('Bisection algorithms.')
                  3 STORE_NAME               0 (__doc__)

      3           6 LOAD_CONST               1 (0)
                  9 LOAD_CONST               8 (None)
                 12 LOAD_CONST               2 (<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>)
                 15 MAKE_FUNCTION            2
                 18 STORE_NAME               2 (insort_right)

     22          21 LOAD_NAME                2 (insort_right)
                 24 STORE_NAME               3 (insort)

     24          27 LOAD_CONST               1 (0)
                 30 LOAD_CONST               8 (None)
                 33 LOAD_CONST               3 (<code object bisect_right at 0x106a45ab0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 24>)
                 36 MAKE_FUNCTION            2
                 39 STORE_NAME               4 (bisect_right)

     45          42 LOAD_NAME                4 (bisect_right)
                 45 STORE_NAME               5 (bisect)

     47          48 LOAD_CONST               1 (0)
                 51 LOAD_CONST               8 (None)
                 54 LOAD_CONST               4 (<code object insort_left at 0x106a45bb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 47>)
                 57 MAKE_FUNCTION            2
                 60 STORE_NAME               6 (insort_left)

     67          63 LOAD_CONST               1 (0)
                 66 LOAD_CONST               8 (None)
                 69 LOAD_CONST               5 (<code object bisect_left at 0x106a45cb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 67>)
                 72 MAKE_FUNCTION            2
                 75 STORE_NAME               7 (bisect_left)

     89          78 SETUP_EXCEPT            14 (to 95)

     90          81 LOAD_CONST               6 (-1)
                 84 LOAD_CONST               7 (('*',))
                 87 IMPORT_NAME              8 (_bisect)
                 90 IMPORT_STAR
                 91 POP_BLOCK
                 92 JUMP_FORWARD            17 (to 112)

     91     >>   95 DUP_TOP
                 96 LOAD_NAME                9 (ImportError)
                 99 COMPARE_OP              10 (exception match)
                102 POP_JUMP_IF_FALSE      111
                105 POP_TOP
                106 POP_TOP
                107 POP_TOP

     92         108 JUMP_FORWARD             1 (to 112)
            >>  111 END_FINALLY
            >>  112 LOAD_CONST               8 (None)
                115 RETURN_VALUE

Note that this is only the top-level code object, the one that defines the module. If you want to analyze the functions it contains, you need to load the nested code objects from the top-level code.co_consts. For example, the code object for the insort_right function is loaded with LOAD_CONST 2, so look up the code object at that index:

    >>> code.co_consts[2]
    <code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>
    >>> dis.dis(code.co_consts[2])
     12           0 LOAD_FAST                2 (lo)
                  3 LOAD_CONST               1 (0)
                  6 COMPARE_OP               0 (<)
                  9 POP_JUMP_IF_FALSE       27

     13          12 LOAD_GLOBAL              0 (ValueError)
                 15 LOAD_CONST               2 ('lo must be non-negative')
                 18 CALL_FUNCTION            1
                 21 RAISE_VARARGS            1
                 24 JUMP_FORWARD             0 (to 27)

     14     >>   27 LOAD_FAST                3 (hi)
                 30 LOAD_CONST               5 (None)
                 33 COMPARE_OP               8 (is)
                 36 POP_JUMP_IF_FALSE       54

     15          39 LOAD_GLOBAL              2 (len)
                 42 LOAD_FAST                0 (a)
                 45 CALL_FUNCTION            1
                 48 STORE_FAST               3 (hi)
                 51 JUMP_FORWARD             0 (to 54)

     16     >>   54 SETUP_LOOP              65 (to 122)
            >>   57 LOAD_FAST                2 (lo)
                 60 LOAD_FAST                3 (hi)
                 63 COMPARE_OP               0 (<)
                 66 POP_JUMP_IF_FALSE      121

     17          69 LOAD_FAST                2 (lo)
                 72 LOAD_FAST                3 (hi)
                 75 BINARY_ADD
                 76 LOAD_CONST               3 (2)
                 79 BINARY_FLOOR_DIVIDE
                 80 STORE_FAST               4 (mid)

     18          83 LOAD_FAST                1 (x)
                 86 LOAD_FAST                0 (a)
                 89 LOAD_FAST                4 (mid)
                 92 BINARY_SUBSCR
                 93 COMPARE_OP               0 (<)
                 96 POP_JUMP_IF_FALSE      108
                 99 LOAD_FAST                4 (mid)
                102 STORE_FAST               3 (hi)
                105 JUMP_ABSOLUTE           57

     19     >>  108 LOAD_FAST                4 (mid)
                111 LOAD_CONST               4 (1)
                114 BINARY_ADD
                115 STORE_FAST               2 (lo)
                118 JUMP_ABSOLUTE           57
            >>  121 POP_BLOCK

     20     >>  122 LOAD_FAST                0 (a)
                125 LOAD_ATTR                3 (insert)
                128 LOAD_FAST                2 (lo)
                131 LOAD_FAST                1 (x)
                134 CALL_FUNCTION            2
                137 POP_TOP
                138 LOAD_CONST               5 (None)
                141 RETURN_VALUE
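Looking up each index by hand gets tedious, so the nested code objects can also be collected by walking co_consts recursively. The following is only a rough sketch, assuming Python 3.7 or later (for the depth argument of dis.dis); the helper name iter_code_objects and the in-memory demo source are made up for illustration and are not part of the dis module.

```python
import dis
import types


def iter_code_objects(code):
    """Yield a code object, then (depth-first) every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        # Functions, lambdas, classes and comprehensions are stored as
        # code-object constants of the enclosing code object.
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


# Demo on a code object compiled in memory; a code object obtained with
# marshal.load() from a .pyc file can be passed in the same way.
demo_source = "def sqr(x):\n    return x * x\n"
demo_code = compile(demo_source, "<demo>", "exec")

for obj in iter_code_objects(demo_code):
    print("Disassembly of %s (first line %d):" % (obj.co_name, obj.co_firstlineno))
    # depth=0 stops dis.dis from descending into nested code objects itself,
    # so each code object is printed exactly once.
    dis.dis(obj, depth=0)
    print()
```

On Python 3.7 and later, a plain dis.dis(code) already descends into nested code objects, so a walk like this is mostly useful when you want to handle each code object separately.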

Personally, I would avoid trying to parse a .pyc file with anything other than the matching version of Python and its marshal module. The marshal format is essentially an internal serialization format that changes with the needs of Python itself. New features, such as list comprehensions, with statements, and async / await, require additions to the format, which is not documented anywhere other than in the C source code.
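One way to make sure the file and the interpreter match is to compare the first four bytes of the file with the running interpreter's magic number before unmarshalling anything. This is a hedged sketch that assumes Python 3.7 or later (16-byte header, importlib.util.MAGIC_NUMBER available); the function name load_pyc_code is hypothetical.

```python
import importlib.util
import marshal


def load_pyc_code(path):
    """Return the code object stored in a .pyc file written by this Python's bytecode version."""
    header_size = 16  # assumption: Python 3.7+ header layout
    with open(path, "rb") as f:
        header = f.read(header_size)
        # The first 4 bytes identify the bytecode version that wrote the file.
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError(
                "%s was compiled for a different bytecode version; "
                "load it with the Python version that created it" % path
            )
        return marshal.load(f)
```

If the check fails, the safest option is to run the script under the Python version that produced the file rather than trying to decode the marshal data yourself.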

If you do go down that route and manage to read the code object by some means other than the marshal module, you will have to produce the disassembly yourself from the various attributes of the code object; see the dis module source for details on how to do this (for example, you will need the co_firstlineno and co_lnotab attributes to build a map from bytecode offsets to line numbers).
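To give an idea of what such an offset-to-line map looks like, here is a small sketch that builds one with helpers the dis module already provides, instead of decoding co_lnotab by hand (the encoding of the line table differs between Python versions). It assumes Python 3.4 or later for dis.get_instructions; the function name offset_to_line_map is made up for this example.

```python
import dis


def offset_to_line_map(code):
    """Map every instruction offset in a code object to its source line number."""
    # dis.findlinestarts() decodes the line table and yields
    # (offset, line number) pairs for the first instruction of each line.
    line_starts = dict(dis.findlinestarts(code))
    mapping = {}
    current_line = None
    for instr in dis.get_instructions(code):
        if instr.offset in line_starts:
            current_line = line_starts[instr.offset]
        # Instructions inherit the line of the most recent line start.
        mapping[instr.offset] = current_line
    return mapping


demo_code = compile("x = 1\ny = 2\n", "<demo>", "exec")
for offset, line in sorted(offset_to_line_map(demo_code).items()):
    print(offset, line)
```

The exact offsets printed depend on the Python version, since the instruction set and instruction sizes change between releases.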


Source: https://habr.com/ru/post/918674/

