Brett Cannon: If I were designing Python's import from scratch

Talk to any developer who inherits a large, old code base whose semantics have shifted over time and they will always have something they wish they could change about the code they inherited. Having inherited import in Python, I too have a list of things I would love to see changed in how it works to make it a bit more sane and easier to work with. This blog post is basically a brain dump/wishlist of what I would love to see changed in import some day.

No global state

As import currently stands, all of its state is stored in the sys module. This makes growing the API rather painful, as it means expanding a module's API surface rather than just adding another attribute to an object. I would rather have import be a fully self-contained object that stores all of its own state.
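
For reference, the import state that currently lives as module-level attributes on sys includes:

import sys

sys.modules                # cache of every module imported so far
sys.path                   # search locations for top-level imports
sys.meta_path              # finders consulted for every import
sys.path_hooks             # callables that turn path entries into finders
sys.path_importer_cache    # cache mapping path entries to their finders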

This has been proposed before in PEP 406 under the name "import engine". It unfortunately has not gone anywhere, simply because it would take time to design the API for a fully encapsulated import class and it doesn't buy people anything today. In the future, though, it could open up some unique possibilities for import itself -- which will be discussed later -- and it would simply be cleaner to maintain, as it would allow for a cleaner separation between interpreters in a single process.

Making this actually happen would occur in stages. A new ImportEngine class would be created which would define, from scratch, the API we wish import had. That API would initially delegate under the hood to the sys module so that semantics stayed the same, including making instances of the class callable and assigning such an instance to builtins.__import__. At some point the relationship would flip: the state would live on the builtins.__import__ instance and be mirrored into the sys module, rather than the object delegating to the sys module itself. After a proper amount of time, once everyone had moved over to using the object's API instead of the sys module, we could consider cutting the import-related parts out of the sys module.
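
A minimal sketch of what that first stage might look like (the ImportEngine name and API here are hypothetical; PEP 406 never settled on a final design):

import importlib
import sys

class ImportEngine:
    # Stage one: the state still lives in sys, so simply delegate to it.
    @property
    def modules(self):
        return sys.modules

    @property
    def path(self):
        return sys.path

    def __call__(self, name, globals=None, locals=None, fromlist=(), level=0):
        # Keep today's semantics by deferring to importlib's implementation
        # of the built-in __import__().
        return importlib.__import__(name, globals, locals, fromlist, level)

# Eventually an instance would become the builtin import:
# import builtins
# builtins.__import__ = ImportEngine()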

Make __import__ more sane

In case you didn't know, the signature for builtins.__import__() is a bit nuts:

def __import__(name, globals=None, locals=None, fromlist=(), level=0): pass

The locals argument isn't used. The globals argument is only used for calculating relative imports and thus only needs __package__ (technically __name__ and __path__ are also used, but only when __package__ isn't defined and that only happens if you do something bad). The fromlist parameter has to do with how the bytecode operates -- which I will talk about later -- and level is just the number of leading dots in a relative import.
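
For reference, here is roughly how the compiler turns import statements into calls to this function today (pkg, mod, and helper are placeholder names):

# import pkg.mod  -- binds the top-level package
pkg = __import__('pkg.mod', globals(), locals(), None, 0)

# from pkg import mod  -- a non-empty fromlist makes __import__ return pkg itself
mod = __import__('pkg', globals(), locals(), ('mod',), 0).mod

# from . import helper  -- level counts the leading dots
helper = __import__('', globals(), locals(), ('helper',), 1).helper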

If I had my way, the function would be defined as:

def __import__(name, spec=None): pass

This is almost the same signature as importlib.import_module(), but passing in the spec of the calling module instead of just its __package__: nice, simple, and easy to comprehend. The only thing I might consider changing is keeping the level argument, since that is a bit of string parsing that can be done ahead of time and baked into the bytecode, but I don't know if it would really make that much of a performance difference.
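
As a rough illustration of how that could look from the caller's side (the two-argument __import__ is hypothetical, and '..sibling' is a placeholder name; the importlib call is what works today):

# Hypothetical: pass the calling module's spec so relative imports
# can be resolved from it.
mod = __import__('..sibling', spec=__spec__)

# Today's rough equivalent via importlib:
import importlib
mod = importlib.import_module('..sibling', package=__package__)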

You can only import modules

Having the ability to import attributes off of a module really sucks from an implementation perspective. The bytecode itself doesn't handle that bit of detail and instead hoists it upon import. It also leads people into circular import problems. Finally, it separates an object from the namespace it belongs to, even though code can be easier to read when an object stays associated with its containing module. Plus you can easily replace from module import attr with import module; attr = module.attr; TOOWTDI.

So if I had my way, when you wrote from foo import bar, it would mean Python did import foo.bar; bar = foo.bar and nothing else. No more from ... import *, no more __all__ for modules, etc.; you wouldn't be allowed to import anything that didn't end up in sys.modules (and I'm sure some teacher is saying "but import * makes things easier", but in my opinion that little shortcut costs too much to keep around). It makes things cleaner to implement, which helps eliminate edge cases. It makes code easier to analyze, as you would be able to tell (mostly) statically which modules you were after. It just seems better to me, both in terms of implementing import and in simplifying the semantics for everyone to comprehend.
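
To make that concrete, a from-import would always desugar into a plain module import plus a name binding (foo and bar are placeholders, and bar would have to be a module):

# Proposed meaning of: from foo import bar
import foo.bar
bar = foo.bar

# The spelling that already works today for arbitrary attributes:
import foo
attr = foo.attr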

Looking up __import__ like anything else

Like a lot of syntax in Python, the import statement is really just syntactic sugar for calling the builtins.__import__ function. But if we changed the semantics to follow normal name lookup instead of short-circuiting straight to the builtins namespace, some opportunities would open up.

For instance, would you like to have dependencies unique to your package, i.e. completely separate copies of your dependencies, so you don't have to share the same dependency version with every other installed package? Well, if you changed Python's semantics to look up __import__ like any other object, then together with the import engine idea mentioned earlier you could have a custom sys.path and sys.modules for your package by way of a package-specific __import__. Basically you would need a loader that injected into the module's __dict__ its own instance of __import__ that knew how to look up dependencies unique to the package. So you could have a .dependencies directory directly in your package's top-level directory and have __import__ put that at the front of its own sys.path when handling top-level imports. That way, if you needed version 1.3 of a project but other code needed 2.0, you could put the 1.3 version in your .dependencies directory and have that on your private sys.path ahead of site-packages, letting everything else fall through to the normal path. It would do away with the explicit vendoring some projects do to lock down their dependencies.
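
Here is a minimal sketch of the idea, assuming the two-argument __import__ proposed earlier and glossing over relative imports and module-cache isolation; make_package_import and the .dependencies layout are purely illustrative:

import importlib
import pathlib
import sys

def make_package_import(package_dir):
    # Build a package-specific __import__ that consults the package's
    # private .dependencies directory before the global search path.
    deps = str(pathlib.Path(package_dir) / '.dependencies')

    def package_import(name, spec=None):
        # A real import engine would own its own path and module cache;
        # this sketch just temporarily prepends the private directory.
        sys.path.insert(0, deps)
        try:
            return importlib.import_module(name)
        finally:
            sys.path.remove(deps)

    return package_import

# A loader would inject the result into the package's namespace, e.g.:
# module.__dict__['__import__'] = make_package_import(module.__path__[0])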

Now I don't know how truly useful this would be. Vendoring is not hard thanks to relative imports, and most projects don't seem to need it. It also complicates things, since modules wouldn't be shared across packages, so anything that relies on object identity -- like an except clause matching caught exceptions -- could go south really fast (as the requests project learned the hard way). And thanks to the venv module and the concept of virtual environments, the whole clashing-dependency problem is further minimized. But since I realized this could be made possible, I at least wanted to write it down. :)

I doubt any of this will ever change

While I may be able to create an object for __import__ that people use, getting people to use it instead of the sys module would be tough, especially since there is no way to detect when someone replaces the objects on sys entirely instead of simply mutating them. Changing the signature of __import__ would also be somewhat tough, although if an object for __import__ were used then the bytecode could call a method on that object and __import__.__call__ would just be a shim for backwards compatibility (and honestly people should not be calling or mucking with __import__ directly anyway; use importlib.import_module() or the various other hooks that importlib provides instead). Only importing modules is basically a dead end due to backwards compatibility, but I may be able to make the bytecode do more of the work rather than doing it in __import__ itself. And getting Python to follow normal lookup instead of going straight to builtins when looking for __import__ probably isn't worth the hassle and potential compatibility issues.

