It's a new calendar year, which means it's time to think about what year-long plans I have for my contributions to the Python project. There are always the usual plans to approve some patches, fix some bugs, etc. But I always seem to have one or two projects that I really want to see happen and that I simply can't solve in a weekend (either because of the amount of technical work or the political work required). This year is no exception.
My resolutions
Rewrite zipimport from scratch
The zipimport module is a pain to maintain. It's written in C and has its own code to read from a zip file instead of using what zipfile provides. This has led to people not wanting to work on it, and the issues have piled up because of it. For me, the biggest issue this has caused is that no one wants to update the module to make the importer follow modern import practices.
This has led to me wanting to rewrite zipimport from scratch for a couple of years now. An idea a bunch of us had at the sprints at PyCon 2015 was to write the bits that read zip files in C, write all the import-related parts in pure Python, and have the result frozen into the interpreter along with importlib. You might even be able to get the zip file part written in pure Python, but the open questions are performance and the ease of doing the work without access to the standard library (freezing importlib means you can only use built-in or frozen modules from the standard library). There was even hope that writing the zip file code separately could lead to it being used by zipfile, or even lead to a new, simpler, smaller zip file module.
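To make "modern import practices" a bit more concrete, here is a minimal sketch of a zip importer written in pure Python around find_spec() and exec_module(). It is only an illustration, not the planned rewrite: the class name ZipFinderLoader is made up, it handles only top-level .py modules (no packages, no bytecode), and it leans on zipfile, which a frozen importer could not do.

```python
import importlib.abc
import importlib.util
import sys
import zipfile


class ZipFinderLoader(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer for top-level .py modules stored in a zip file."""

    def __init__(self, archive_path):
        self._archive_path = archive_path

    def find_spec(self, fullname, path=None, target=None):
        # Only look for "<name>.py" at the root of the archive.
        filename = fullname + ".py"
        with zipfile.ZipFile(self._archive_path) as archive:
            if filename not in archive.namelist():
                return None
        origin = f"{self._archive_path}/{filename}"
        return importlib.util.spec_from_loader(fullname, self, origin=origin)

    def create_module(self, spec):
        # Returning None asks the import system to create the module itself.
        return None

    def exec_module(self, module):
        # Read the source out of the archive and run it in the module's namespace.
        filename = module.__spec__.name + ".py"
        with zipfile.ZipFile(self._archive_path) as archive:
            source = archive.read(filename)
        code = compile(source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)
```

Appending an instance to sys.meta_path, e.g. sys.meta_path.append(ZipFinderLoader("some.zip")), is enough for a plain import statement to find modules in that archive once the regular finders have come up empty.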
Implement importlib.resources
You may be asking yourself why I would want to rewrite zipimport this year any more than in previous years. Well, the new push for me to see the rewrite finally happen is that I want to introduce importlib.resources. Think of this new module as a modern replacement for the pkg_resources.ResourceManager API, e.g. pkg_resources.resource_string(). I already wrote a first design draft, but some people are pushing hard for me to reconsider my proposed structure of the API, moving from an object-oriented one to a function-oriented one which aligns more with pkg_resources, e.g.:
# Current proposal.
binary_data = importlib.resources(module).read_bytes(path)
# Alternate proposal; closely matches pkg_resources.resource_string(module, path);
# would require supporting strings as paths instead of only pathlib.Path objects.
binary_data = importlib.resources.read_bytes(module, path)
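To get a feel for the difference between the two shapes, here is a rough sketch that implements both on top of the get_data() method that file-based and zip-based loaders already provide. The names Resources and read_bytes are placeholders for illustration, not the proposed API, and the sketch assumes a regular package whose spec has an origin (so no namespace packages).

```python
import importlib
import os.path


class Resources:
    """Hypothetical object-oriented shape: bind to a package, then read from it."""

    def __init__(self, package_name):
        self._package = importlib.import_module(package_name)

    def read_bytes(self, path):
        spec = self._package.__spec__
        # Assumes spec.origin points at the package's __init__ file and that
        # the loader implements get_data().
        base = os.path.dirname(spec.origin)
        return spec.loader.get_data(os.path.join(base, str(path)))


def read_bytes(package_name, path):
    """Hypothetical function-oriented shape: one call, package and path together."""
    return Resources(package_name).read_bytes(path)


# Both shapes return the same bytes for a file shipped inside a package.
data = read_bytes("json", "__init__.py")
same = Resources("json").read_bytes("__init__.py")
```

The function-oriented version is just a thin wrapper here, which is part of why the choice is mostly about what reads best to users rather than what is possible to implement.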
There is also a design issue with the current proposal that has to do with import itself and spec objects, but only importer authors will care about that detail and it isn't any more work than what I originally proposed.
Move Python development to GitHub
Hopefully all of this work will be easier to do this year, though, because I'm going to move Python's development over to GitHub. This decision has been in the works in some form or another for over a year, and I finally made the call on January 1 that GitHub would become our repository host and code review tool. The hope is that this will let us cut back on the number of custom services we have to maintain for Python development and make the process easier for core contributors so that we can get through patches faster. It also has the benefit that more people are familiar with GitHub than with our custom workflow, so this should also make it easier for others to contribute (although I'm hoping our patch approval throughput will increase more than the rate of incoming patches, so that this is a net benefit for everyone involved). There is even some benefit to abstracting our tooling better so that future moves will be easier to do. And yes, this does mean Python is moving from Mercurial to Git as part of this transition.
It's going to be a busy year
Obviously there's a lot to do and none of it is small. Luckily the infrastructure work is balanced with some coding work. I also have people who have stepped forward to help with the GitHub stuff, which will be the biggest chunk of work, so it shouldn't cause me to burn out.
I also want to publicly thank my employer, Microsoft Azure (specifically the Python team in the Data + Analytics group led by Joseph Sirosh, CVP), for letting me spend part of my work time on Python development, which is how I actually have a chance of meeting these goals. (They did not ask for this plug; they only proofread this paragraph at my request.) It's nice to spend not only every Friday on Python development (e.g., patches), but also whatever other time I need during the week as needs arise. This doesn't even count the time I get to spend on other Python-related things such as Pyjion (which, if our talk proposal for PyCon gets accepted, you will hear more about in about six months). And the cherry on top is that their moonlighting rules on open source are very liberal (I actually find them better than Google's open source moonlighting policy), so I can contribute to open source at home with no qualms. Obviously I'm liking my job a lot! And we are hiring on the Python team, so if you are willing to live in Redmond you can email us at pythonjobs at microsoft.com with your CV/résumé if you want to work on stuff like PTVS, Python support in VS Code, Azure-related stuff for Python, etc.