PEP 302 implementation details needed

I am trying to use PEP 302 import hooks to intercept module imports, so that I can ship some encrypted .py files that are loaded at runtime. I am following the Python obfuscation pattern at https://github.com/citrusbyte/python-obfuscation .

The basic idea is simple: intercept the import statement with a Finder inserted into sys.meta_path. The Finder checks whether the module is one we want to handle and, if so, returns a custom Loader object; otherwise it ignores the import. The custom loader creates an entry in sys.modules, reads the module's Python source, and executes that source in the newly created module with exec, as described in the PEP 302 documentation.
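A minimal sketch of that finder/loader pair, assuming Python 3.4 or later. It swaps in the PEP 451 hooks (find_spec / exec_module) for PEP 302's find_module / load_module, because Python 3.12 no longer calls find_module on meta-path finders. SOURCES, SourceFinder, SourceLoader and secret_mod are hypothetical names, and the decryption step is replaced by an in-memory dict of plain source text.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for the encrypted files: module name -> source text.
# In the real setup this text would come from decrypting the .py files.
SOURCES = {
    "secret_mod": "GREETING = 'hello from secret_mod'\n",
}


class SourceLoader(importlib.abc.Loader):
    """Executes the source text held in SOURCES inside the new module."""

    def create_module(self, spec):
        return None  # None means: use the default module creation

    def exec_module(self, module):
        name = module.__spec__.name
        code = compile(SOURCES[name], module.__spec__.origin, "exec")
        exec(code, module.__dict__)


class SourceFinder(importlib.abc.MetaPathFinder):
    """Claims only the module names present in SOURCES."""

    def __init__(self):
        self.loader = SourceLoader()

    def find_spec(self, fullname, path, target=None):
        if fullname not in SOURCES:
            return None  # not ours: the next finder on sys.meta_path tries
        # is_package=False: a plain module, so no __path__ is created.
        return importlib.util.spec_from_loader(
            fullname, self.loader,
            origin="<encrypted %s>" % fullname, is_package=False)


sys.meta_path.insert(0, SourceFinder())

import secret_mod

print(secret_mod.GREETING)  # hello from secret_mod
```

The import system itself creates the module, registers it in sys.modules and fills in __name__, __loader__, __package__ and __spec__ from the spec, so the loader only has to run the code.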

This basically works, but there is one specific situation I cannot understand. Suppose there are three files: main, foo and bar. main sets up the import hook, then imports foo and bar; foo itself imports bar. So the layout is:

    main:
        set_import_hook
        import foo
        import bar
    foo:
        import bar
    bar:
        <irrelevant>

I have debug print statements in the Finder installed as the hook, to see what it is being passed.
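A sketch of that kind of debug setup, replaying the main/foo/bar layout with hypothetical in-memory sources instead of files. It assumes Python 3 and the PEP 451 find_spec / exec_module hooks; LoggingFinder, SOURCES and calls are illustrative names. Python 3 has no implicit relative imports, so `import bar` inside foo only ever asks for the top-level name bar.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory sources mirroring the main/foo/bar layout.
SOURCES = {
    "foo": "import bar\n",
    "bar": "# irrelevant\n",
}

calls = []  # every name the finder is asked about, in order


class LoggingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object; records each find_spec call."""

    def find_spec(self, fullname, path, target=None):
        calls.append(fullname)  # plays the role of the debug print
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self, is_package=False)
        return None

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


# "main": install the hook, then import foo and bar.
sys.meta_path.insert(0, LoggingFinder())
import foo
import bar

print(calls)  # ['foo', 'bar']; the second "import bar" is served from sys.modules
print(sorted(n for n in sys.modules if n in ("foo", "bar", "foo.bar")))  # ['bar', 'foo']
```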

With unencrypted code (i.e., code that I don't process myself or add to sys.modules directly), the prints show the following behavior:

    Finder(foo)
    Finder(bar)       called from inside foo while foo itself is being loaded
    Finder(bar)       called from main after returning from "import foo"

When I process and load the foo and bar files myself, I get this behavior instead:

    Finder(foo)
    Finder(foo.bar)   tries to load bar in the context of foo
    Finder(bar)       called from main after returning from "import foo"

As a result, two copies of bar end up in sys.modules. In the first case, sys.modules.keys() contains only foo and bar; in the second case, it contains foo, foo.bar and bar.

I don't understand this behavior. I create the module following the process described in PEP 302; this is what I use:

    module = sys.modules.setdefault(name, imp.new_module(name))
    module.__file__ = filename
    module.__path__ = [os.path.dirname(os.path.abspath(file.name))]
    module.__loader__ = self
    sys.modules[name] = module
    exec(src, module.__dict__)

Thanks.

1 answer

After searching through various examples and documentation, I have a partial answer.

In the code above, I noticed that I did not set module.__package__. Somewhere in the import process, this led to foo.__package__ being set to 'foo'. As a result, foo was treated as a package, and any modules it imported were treated as relative to the package, which is why bar was looked up as foo.bar.

For imports where I did not set up the module myself, I saw that the import system set module.__package__ to None. But setting module.__package__ = None in the code above did not work: something still reset it for foo.

The solution that worked was to set module.__package__ = '' (an empty string). So the working code for adding the module is:

    import imp, os, sys  # imp exists only up to Python 3.11

    module = sys.modules.setdefault(name, imp.new_module(name))
    module.__file__ = filename
    module.__path__ = [os.path.dirname(os.path.abspath(file.name))]
    module.__loader__ = self
    module.__package__ = ''  # the added line: empty string, not None
    sys.modules[name] = module
    exec(src, module.__dict__)
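On Python 3, where imp.new_module is no longer available (the imp module was removed in 3.12), the same manual steps can be sketched with types.ModuleType. add_module and demo_plain are hypothetical names, and this is an illustration rather than the original loader. One deliberate difference: it does not set __path__, because the import system treats any module that has a __path__ attribute as a package; leaving it out together with __package__ = '' keeps the module plainly top-level.

```python
import os
import sys
import types


def add_module(name, filename, src, loader=None):
    """Create, register and execute a plain (non-package) module.

    Hypothetical helper mirroring the manual steps; types.ModuleType
    stands in for imp.new_module.
    """
    module = sys.modules.setdefault(name, types.ModuleType(name))
    module.__file__ = filename
    module.__loader__ = loader
    # Empty string: a top-level module that belongs to no package.
    module.__package__ = ""
    # No __path__ here: setting it would mark the module as a package.
    exec(compile(src, filename, "exec"), module.__dict__)
    return module


mod = add_module("demo_plain", os.path.abspath("demo_plain.py"), "ANSWER = 6 * 7\n")
print(mod.ANSWER, mod.__package__ == "", hasattr(mod, "__path__"))  # 42 True False
```

The file named in __file__ does not have to exist; compile only uses it for tracebacks.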

Now this works, and the foo and bar modules are imported only once. The behavior of encrypted and unencrypted modules looks the same.

I still don't understand where module.__package__ gets its value when it is not explicitly set to ''.
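One likely explanation: the snippet sets module.__path__, and a module with a __path__ attribute counts as a package. As far as CPython 2.7's import code goes, a __package__ that is missing or None is treated as unset; for a module that has __path__, the importer then fills in the module's own name ('foo'), and a plain `import bar` inside it first tries the implicit relative name foo.bar. An empty string means "no parent package", which would explain why '' works where None does not, and suggests that not setting __path__ on plain modules should avoid the problem as well. Python 3 dropped implicit relative imports, but importlib still derives __package__ from whether the module is a package, as this sketch shows (NullLoader is a hypothetical helper).

```python
import importlib.abc
import importlib.util


class NullLoader(importlib.abc.Loader):
    """Hypothetical do-nothing loader, only needed to build module objects."""

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        pass


loader = NullLoader()

# The same name "foo", once as a package and once as a plain module.
as_package = importlib.util.module_from_spec(
    importlib.util.spec_from_loader("foo", loader, is_package=True))
as_module = importlib.util.module_from_spec(
    importlib.util.spec_from_loader("foo", loader, is_package=False))

print(repr(as_package.__package__), hasattr(as_package, "__path__"))  # 'foo' True
print(repr(as_module.__package__), hasattr(as_module, "__path__"))    # '' False

# Relative to the package "foo", ".bar" resolves to the submodule foo.bar.
print(importlib.util.resolve_name(".bar", as_package.__package__))  # foo.bar
```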


Source: https://habr.com/ru/post/1265202/

