9.25 Disassembling Python Byte Code

Published 2015-08-30 07:48:32 | Source: compiled from the web

Problem

You want to know in detail what your code is doing under the covers by disassembling it into lower-level byte code used by the interpreter.

Solution

The dis module can be used to output a disassembly of any Python function. For example:

>>> def countdown(n):
...     while n > 0:
...         print('T-minus', n)
...         n -= 1
...     print('Blastoff!')
...
>>> import dis
>>> dis.dis(countdown)
...
>>>
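
When the listing needs to be stored or compared rather than printed, the same disassembly can be captured as a string. The following is a minimal sketch, assuming Python 3.4 or later, where dis.Bytecode is available. The helper name disassemble_to_text is made up for this example, and the instruction names in the listing vary between Python versions.

```python
import dis

def countdown(n):
    while n > 0:
        print('T-minus', n)
        n -= 1
    print('Blastoff!')

def disassemble_to_text(func):
    # Hypothetical helper: return the listing that dis.dis(func) would print.
    return dis.Bytecode(func).dis()

listing = disassemble_to_text(countdown)
print(listing)
```

Keeping the text around makes it easy to diff the byte code of two versions of a function.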

Discussion

The dis module can be useful if you ever need to study what’s happening in your program at a very low level (e.g., if you’re trying to understand performance characteristics). The raw byte code interpreted by the dis() function is available on functions as follows:

>>> countdown.__code__.co_code
b"x'\x00|\x00\x00d\x01\x00k\x04\x00r)\x00t\x00\x00d\x02\x00|\x00\x00\x83
\x02\x00\x01|\x00\x00d\x03\x008}\x00\x00q\x03\x00Wt\x00\x00d\x04\x00\x83
\x01\x00\x01d\x00\x00S"
>>>
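
Most of the numeric arguments in this byte string are indexes into tables stored on the same code object. As a rough illustration (the attribute names are standard, but the exact contents of the tables differ between Python versions), the tables can be inspected directly:

```python
def countdown(n):
    while n > 0:
        print('T-minus', n)
        n -= 1
    print('Blastoff!')

code = countdown.__code__
print(code.co_varnames)   # names of arguments and local variables
print(code.co_names)      # global and attribute names used by the code
print(code.co_consts)     # constants referenced by the code
print(len(code.co_code))  # size of the raw byte code in bytes
```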

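If you ever want to interpret this code yourself, you would need to use some of the constants defined in the opcode module. The output below is specific to the interpreter that produced the byte string above; opcode names and numbers change between Python versions. For example: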

>>> c = countdown.__code__.co_code
>>> import opcode
>>> opcode.opname[c[0]]
'SETUP_LOOP'
>>> opcode.opname[c[3]]
'LOAD_FAST'
>>>
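
Opcode numbers and names are not stable across Python releases (SETUP_LOOP, for instance, was removed in Python 3.8), so it is safer to look them up than to hard-code them. A small sketch using opcode.opmap, the name-to-number dictionary that complements opcode.opname:

```python
import opcode

# Look up the number for a name, then map it back to the name.
load_fast = opcode.opmap['LOAD_FAST']
print(load_fast, opcode.opname[load_fast])

# Check whether an opcode exists on this interpreter before relying on it.
print('SETUP_LOOP' in opcode.opmap)
```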

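Ironically, when this recipe was written there was no function in the dis module that made it easy to process the byte code in a programmatic way (Python 3.4 and later provide dis.get_instructions() for that purpose). However, this generator function will take the raw byte code sequence and turn it into opcodes and arguments. It assumes the older byte code layout, used before Python 3.6, in which an opcode that takes an argument is followed by two argument bytes.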

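import opcode

def generate_opcodes(codebytes):
    # Decode the older byte code layout: a one-byte opcode followed,
    # for opcodes >= HAVE_ARGUMENT, by a two-byte little-endian argument.
    extended_arg = 0
    i = 0
    n = len(codebytes)
    while i < n:
        op = codebytes[i]
        i += 1
        if op >= opcode.HAVE_ARGUMENT:
            oparg = codebytes[i] + codebytes[i+1]*256 + extended_arg
            extended_arg = 0
            i += 2
            if op == opcode.EXTENDED_ARG:
                # EXTENDED_ARG supplies the high 16 bits of the next argument
                extended_arg = oparg * 65536
                continue
        else:
            oparg = None
        yield (op, oparg)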

The function can be used like this:

>>> for op, oparg in generate_opcodes(countdown.__code__.co_code):
...     print(op, opcode.opname[op], oparg)
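
On Python 3.4 and later, dis.get_instructions() performs similar decoding and also resolves each argument to the name or constant it refers to. The sketch below wraps it in a small generator; the helper name iter_instructions is made up for this example, and the instructions it reports depend on the Python version.

```python
import dis

def countdown(n):
    while n > 0:
        print('T-minus', n)
        n -= 1
    print('Blastoff!')

def iter_instructions(func):
    # Hypothetical helper: yield (offset, opcode name, resolved argument)
    # for each instruction in the function's byte code.
    for ins in dis.get_instructions(func):
        yield ins.offset, ins.opname, ins.argval

for offset, name, argval in iter_instructions(countdown):
    print(offset, name, argval)
```

Because the decoding is done by the standard library, this approach keeps working when the byte code layout changes between releases.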

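It’s a little-known fact, but you can replace the raw byte code of any function that you want. It takes a bit of work to do it, but here’s an example of what’s involved. The argument list passed to types.CodeType below matches the Python version used in this recipe; later releases have added arguments to it.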

>>> def add(x, y):
...     return x + y
...
>>> c = add.__code__
>>> c
<code object add at 0x1007beed0, file "<stdin>", line 1>
>>> c.co_code
b'|\x00\x00|\x01\x00\x17S'
>>>
>>> # Make a completely new code object with bogus byte code
>>> import types
>>> newbytecode = b'xxxxxxx'
>>> nc = types.CodeType(c.co_argcount, c.co_kwonlyargcount,
...     c.co_nlocals, c.co_stacksize, c.co_flags, newbytecode, c.co_consts,
...     c.co_names, c.co_varnames, c.co_filename, c.co_name,
...     c.co_firstlineno, c.co_lnotab)
>>> nc
<code object add at 0x10069fe40, file "<stdin>", line 1>
>>> add.__code__ = nc
>>> add(2,3)
Segmentation fault

Having the interpreter crash is a pretty likely outcome of pulling a crazy stunt like this. However, developers working on advanced optimization and metaprogramming tools might be inclined to rewrite byte code for real. This last part illustrates how to do it. See this code on ActiveState for another example of such code in action.
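
A less destructive way to experiment is to swap in byte code that is known to be valid. The sketch below assumes Python 3.8 or later, where code objects have a replace() method, and copies the byte code of a second function that has the same arguments, local variables, and constants. It illustrates the mechanism only: byte code that does not match the rest of the code object can still crash the interpreter.

```python
def add(x, y):
    return x + y

def sub(x, y):
    return x - y

# Build a copy of add's code object that carries sub's byte code,
# leaving every other field (names, constants, flags) unchanged.
patched = add.__code__.replace(co_code=sub.__code__.co_code)

# Install the patched code object on the original function.
add.__code__ = patched
print(add(2, 3))
```

After the assignment, calling add() runs the subtraction from sub(), even though the function object and its name are unchanged.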
