.pyc files contain some metadata and a marshalled code object; to load the code object and disassemble it, use:
    import dis, marshal, sys

    # Header size: 8 bytes before Python 3.3, 12 bytes in 3.3 through 3.6,
    # and 16 bytes from 3.7 onward (PEP 552).
    if sys.version_info >= (3, 7):
        header_size = 16
    elif sys.version_info >= (3, 3):
        header_size = 12
    else:
        header_size = 8

    # pycfile is the path to the .pyc file you want to inspect
    with open(pycfile, "rb") as f:
        magic_and_timestamp = f.read(header_size)  # first 8, 12 or 16 bytes are metadata
        code = marshal.load(f)                     # rest is a marshalled code object

    dis.dis(code)
Demo with the bisect module:
    >>> import bisect
    >>> import dis, marshal
    >>> import sys
    >>> header_size = 12 if sys.version_info >= (3, 3) else 8
    >>> with open(bisect.__file__, "rb") as f:
    ...     magic_and_timestamp = f.read(header_size)
    ...     code = marshal.load(f)
    ...
Note that this is only the top-level code object, the one that defines the module. If you want to analyze the functions it contains, you need to load the nested code objects from the top-level code.co_consts tuple; for example, the insort_right function's code object is loaded with LOAD_CONST 2, so look for the code object at that index:
    >>> code.co_consts[2]
    <code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>
    >>> dis.dis(code.co_consts[2])
     12           0 LOAD_FAST                2 (lo)
                  3 LOAD_CONST               1 (0)
                  6 COMPARE_OP               0 (<)
                  9 POP_JUMP_IF_FALSE       27

     13          12 LOAD_GLOBAL              0 (ValueError)
                 15 LOAD_CONST               2 ('lo must be non-negative')
                 18 CALL_FUNCTION            1
                 21 RAISE_VARARGS            1
                 24 JUMP_FORWARD             0 (to 27)

     14     >>   27 LOAD_FAST                3 (hi)
                 30 LOAD_CONST               5 (None)
                 33 COMPARE_OP               8 (is)
                 36 POP_JUMP_IF_FALSE       54

     15          39 LOAD_GLOBAL              2 (len)
                 42 LOAD_FAST                0 (a)
                 45 CALL_FUNCTION            1
                 48 STORE_FAST               3 (hi)
                 51 JUMP_FORWARD             0 (to 54)

     16     >>   54 SETUP_LOOP              65 (to 122)
            >>   57 LOAD_FAST                2 (lo)
                 60 LOAD_FAST                3 (hi)
                 63 COMPARE_OP               0 (<)
                 66 POP_JUMP_IF_FALSE      121

     17          69 LOAD_FAST                2 (lo)
                 72 LOAD_FAST                3 (hi)
                 75 BINARY_ADD
                 76 LOAD_CONST               3 (2)
                 79 BINARY_FLOOR_DIVIDE
                 80 STORE_FAST               4 (mid)

     18          83 LOAD_FAST                1 (x)
                 86 LOAD_FAST                0 (a)
                 89 LOAD_FAST                4 (mid)
                 92 BINARY_SUBSCR
                 93 COMPARE_OP               0 (<)
                 96 POP_JUMP_IF_FALSE      108
                 99 LOAD_FAST                4 (mid)
                102 STORE_FAST               3 (hi)
                105 JUMP_ABSOLUTE           57

     19     >>  108 LOAD_FAST                4 (mid)
                111 LOAD_CONST               4 (1)
                114 BINARY_ADD
                115 STORE_FAST               2 (lo)
                118 JUMP_ABSOLUTE           57
            >>  121 POP_BLOCK

     20     >>  122 LOAD_FAST                0 (a)
                125 LOAD_ATTR                3 (insert)
                128 LOAD_FAST                2 (lo)
                131 LOAD_FAST                1 (x)
                134 CALL_FUNCTION            2
                137 POP_TOP
                138 LOAD_CONST               5 (None)
                141 RETURN_VALUE
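Rather than reading the index off the disassembly by hand, the nested code objects can be collected by walking co_consts recursively. The following is a minimal sketch for Python 3; the helper name iter_code_objects and the sample source are made up for illustration, and a code object loaded from a .pyc file can be passed in the same way.

```python
import types


def iter_code_objects(code):
    """Yield `code` itself, then every code object nested in its co_consts."""
    yield code
    for const in code.co_consts:
        # Functions, classes, lambdas and comprehensions are stored as
        # code objects among the constants of the enclosing code object.
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


# Illustrative module source (hypothetical); compile() stands in for
# a code object obtained via marshal.load().
source = (
    "def outer():\n"
    "    def inner():\n"
    "        return 1\n"
    "    return inner\n"
)
module_code = compile(source, "<example>", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
```

Each yielded object can be handed to dis.dis() individually, and co_name together with co_firstlineno is usually enough to tell the functions apart.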
I personally would avoid trying to parse a .pyc file with anything other than the matching version of Python and its marshal module. The marshal format is basically an internal serialization format that changes according to the needs of Python itself. New features, such as list comprehensions, with statements, and async / await, require new additions to the format, which is not published other than as C source code.
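One way to make sure the file matches the running interpreter is to compare the first four bytes of the .pyc file with the interpreter's own magic number before unmarshalling. The sketch below assumes CPython 3.7 or newer (16-byte header); load_pyc_code is a hypothetical helper name, and the temporary file only exists to make the example self-contained.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_pyc_code(path, header_size=16):
    """Return the code object stored in a .pyc file.

    Raises ValueError if the file was not written by the running Python.
    header_size=16 is an assumption that holds for CPython 3.7 and later.
    """
    with open(path, "rb") as f:
        header = f.read(header_size)
        # The first 4 bytes identify the bytecode version of the file.
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("%s was compiled by a different Python version" % path)
        return marshal.load(f)


# Round trip: compile a small module, then load its code object back.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as f:
        f.write("answer = 6 * 7\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "example.pyc"))
    code = load_pyc_code(pyc)

namespace = {}
exec(code, namespace)  # the loaded code object is directly executable
```

Failing early on a magic number mismatch avoids handing marshal data it may misread or reject with a less helpful error.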
If you do go this route and manage to read a code object by means other than the marshal module, you will have to parse out the disassembly from the various attributes of the code object; see the dis module source for details on how to do this (you will need to use the co_firstlineno and co_lnotab attributes to create a bytecode-offset-to-line-number map, for example).
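For a real code object, the standard library already does this decoding: dis.findlinestarts() yields (offset, line number) pairs, so a map can be built without touching the line table directly. This is a minimal sketch rather than a reimplementation of the decoding; the helper name offset_to_line_map and the sample source are assumptions for illustration. Newer CPython versions also expose the line table through code.co_lines(), which is not used here.

```python
import dis


def offset_to_line_map(code):
    """Map each bytecode offset that starts a source line to its line number."""
    return {offset: line for offset, line in dis.findlinestarts(code)}


# Illustrative source (hypothetical); any code object works the same way.
source = "x = 1\ny = 2\nz = x + y\n"
code = compile(source, "<example>", "exec")
line_starts = offset_to_line_map(code)

for offset in sorted(line_starts):
    print(offset, line_starts[offset])
```

The exact offsets depend on the Python version, so they are better treated as opaque keys than compared against hard-coded values.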