This is the third and final post in my trilogy about applying the Python static typing tool mypy to a real-world open source project (I chose pycodestyle as an example). The setup can be found in Part 1, and the details of my findings are in Part 2.
In this post I will answer, based on the results, the questions that I initially proposed, which were:
- Does it help me discover actual bugs?
- Does the process of adding types help make the code base more understandable?
- How mature is mypy itself? Is it usable right now, does it have a lot of bugs?
- Is the type system flexible enough to express the kind of actual dynamic tricks that developers like us use in actual, production python code?
- Does it feel practical/usable?
- What other things that I didn’t expect can be learned from the experience?
Discovery of actual bugs
Given that pycodestyle is a small, widely used and well-tested code base, I didn’t expect to find any serious problems in it. Unsurprisingly, I didn’t find any type bugs; however, I found many error-prone constructions (like functions that apparently returned a bool but sometimes actually returned an empty string or list instead of False) which might lead to hard-to-find problems. I also found some redundant or unused code, and some code with unnecessarily complicated control flow and variables that changed type back and forth, which would be really hard to refactor (and mypy also helped me make sure the refactored version was better).
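To make the "apparently returns a bool" pattern concrete, here is a minimal sketch. The functions has_tabs_loose and has_tabs are hypothetical names invented for illustration; they are not taken from pycodestyle.

```python
from typing import List


def has_tabs_loose(line):
    """Reads like a yes/no check, but the 'no' answer is an empty list."""
    tabs = [i for i, ch in enumerate(line) if ch == "\t"]  # type: List[int]
    if not tabs:
        return tabs  # falsy, but it is [] rather than False
    return True


def has_tabs(line: str) -> bool:
    """Same check with an explicit bool return type."""
    return "\t" in line


# Both versions behave the same inside an 'if', which is why the
# inconsistency is easy to miss until a caller compares against False.
print(has_tabs_loose("abc") == False)  # the empty list is not equal to False
print(has_tabs("abc") is False)
```

Once the loose version is annotated as returning bool, a type checker has enough information to point at the branch that returns the list.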
My opinion here is that the result was mildly positive in this situation (a stable project). Trying this on a code base while it is being actively modified or developed will probably lead to more interesting conclusions, since there are likely more type bugs to find.
Adding types to make the code base more understandable
The changes brought by mypy here were huge in several different ways.
First, my personal understanding (as an outsider to the code that had to learn it) grew very quickly and steadily throughout the process of adding annotations. I don’t think I would have achieved the level of knowledge that I did if I had spent the same time just looking at the code and/or making some quick diagrams on paper which is my usual way to approach this task.
Second, important aspects of the code design itself, like the call relations and the shape of some complicated data structures, surfaced and turned into a very visible and explicit artifact that can help me (as the annotator) or other people (as consumers of the annotations) understand how pycodestyle works. This could be compared to the benefits of good code documentation (I could even use annotations without mypy), with the difference that mypy allows me to be sure that this specific kind of documentation is consistent and up to date, so I can fully trust it. My opinion is that the readability of the code improved hugely.
Lastly, many specific details of the implementation that were complicated and hard to read were flagged as a problem by mypy, and the refactored version ended up being much cleaner and more readable.
I think this is the single largest benefit of the static typing approach and the use of mypy.
The maturity of mypy
If you’ve read Part 2 you’ll have noticed that I found a fairly large number of “paper cuts” and usability issues. None of the problems I found was a show stopper, but they did slow me down a bit, and there’s room for improvement both in user friendliness and stability.
Most, if not all, of the issues I found seem superficial and will probably be fixed in future versions, so even if usability and stability are concerns, I don’t feel that they are something to be worried about.
Flexibility of the type system vs dynamic tricks
I found some problems here, but in general they were fewer than I expected. On the other hand, I found some issues in places where I hadn’t foreseen any (mostly booleans and the semantics of short-circuit operators in Python).
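The short-circuit issue can be sketched as follows. This is an illustrative example with hypothetical function names, not code from pycodestyle: in Python, "and" and "or" return one of their operands rather than a bool, so the static type of the expression is a union of both operand types.

```python
def is_comment_loose(line: str):
    # 'and' returns the left operand when it is falsy, otherwise the right
    # one, so this yields '' for an empty line and a bool for anything else.
    return line and line.lstrip().startswith("#")


def is_comment(line: str) -> bool:
    # Converting the left operand first keeps the result a real bool.
    return bool(line) and line.lstrip().startswith("#")


print(repr(is_comment_loose("")))  # a str, not a bool
print(repr(is_comment("")))        # a bool
```

Annotating the loose version with a bool return type is the kind of change where I would expect a checker to complain, because the expression can also evaluate to a str.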
But something that was really fresh for me was the gradual typing approach: I have worked in Python for a long time, and I’ve also used different kinds of statically typed languages (from C to Haskell, going through Eiffel), and the “feel” of the tool is a bit different from all of them. Some highly dynamic code (like the optparse argument parser, which has run-time configured attributes) was just not covered by the type specs, and the typechecker knows that this part is dynamically typed and does not complain. I artificially pushed to cover most of the pycodestyle code and found some minor issues (most of them easy to “silence” without much effort); in a real-world scenario I would probably have covered a bit less.
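As a small illustration of the optparse case, the sketch below defines an option at run time and then reads it back as an attribute. The option name and the get_limit helper are assumptions made for the example; only the standard optparse calls are real.

```python
from optparse import OptionParser, Values

parser = OptionParser()
# The attribute 'max_line_length' does not exist in any class definition;
# optparse creates it on the result object from the option name.
parser.add_option("--max-line-length", type="int", default=79)
options, args = parser.parse_args(["--max-line-length", "100"])

limit = options.max_line_length  # no static declaration to check this against


def get_limit(opts: Values) -> int:
    # Hypothetical helper: the dynamic value crosses into annotated code
    # here, so everything downstream of this function can be checked.
    return opts.max_line_length


print(get_limit(options))
```

Wrapping the dynamic access in one small annotated function is one way to keep the unchecked region narrow while leaving the parser code itself untouched.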
There are some features of the type system that are uncommon in other statically typed imperative languages, like Union types and some overload support in types (even if the language doesn’t support overloading at runtime), that made it easier to describe unusual cases. The implementation of typevars and generics makes the system quite expressive. In a language like Python there will always be scenarios where the system isn’t flexible enough, but I got far more than enough to call it successful.
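A brief sketch of the three features mentioned above, using the standard typing module. The function names (normalize, first, double) are hypothetical and chosen only to keep the example short.

```python
from typing import List, Sequence, TypeVar, Union, overload

T = TypeVar("T")


def normalize(value: Union[str, List[str]]) -> List[str]:
    """Union: accept either a comma-separated string or a list."""
    if isinstance(value, str):
        return value.split(",")
    return value


def first(items: Sequence[T]) -> T:
    """TypeVar: the return type follows the element type of the argument."""
    return items[0]


@overload
def double(x: int) -> int: ...
@overload
def double(x: str) -> str: ...
def double(x: Union[int, str]) -> Union[int, str]:
    """overload: two signatures for the checker, one body at runtime."""
    return x * 2


print(normalize("E501,W191"), first([3, 4]), double(2), double("ab"))
```

The overload declarations exist only for the type checker; at runtime only the last definition of double is ever called.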
If I had to mention a weak point here, it’s probably around callables and function signatures. Python has an extremely rich way of specifying function signatures (varargs, keyword args, open keyword args, keyword-only arguments, argument packing, optional arguments with defaults, ...), and it is not always possible to specify that a function matches a given signature (although there are some proposals). That would be useful for describing functional-style APIs.
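The limitation can be seen with typing.Callable, which describes a list of positional parameter types and a return type. In the sketch below (hypothetical names, not pycodestyle code), the alias Check has no way to record that the real function names its second parameter max_length or gives it a default.

```python
from typing import Callable, List

# Positional parameter types and a return type; nothing about parameter
# names, defaults, keyword-only arguments, or *args / **kwargs.
Check = Callable[[str, int], bool]


def too_long(line: str, max_length: int = 79) -> bool:
    return len(line) > max_length


def run_check(check: Check, lines: List[str], limit: int) -> List[bool]:
    # Only a positional call is described by the Check type; calling
    # check(line, max_length=limit) is outside what the alias can express.
    return [check(line, limit) for line in lines]


print(run_check(too_long, ["ab", "abcdef"], 3))
```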
Usability/applicability to real world use
The tool is working and producing useful results. It was very fast on the pycodestyle codebase, and generally fast when working with stub files. I made some quick experiments with larger codebases (the Django web framework) and it has a somewhat slow checking time when it starts following deep import chains in large projects.
Even if I found some usability issues that I have already mentioned, I got to be quite productive and managed to cover a lot of code with a reasonable amount of effort (even as a first time user of mypy and not knowing the code I was annotating beforehand). It is not the most polished piece of my software development toolkit but it definitely adds value right now and that will improve in the future.
Some supporting evidence for this comes from the developers of mypy (most of them working for Dropbox), who report using it on a very large code base with positive results (you can listen to this podcast from the mypy team for further details).
One large limiting factor may be the support for third-party libraries (and the completeness and polish of the Python stdlib stubs). My experience didn’t cover much of this because pycodestyle is built just on bare Python, but I’m quite sure that the value of static typing is higher if the lower levels of your stack are annotated, and currently very few things besides the standard library support mypy. My guess is that mypy will be weaker when you’re just gluing together high-level pieces of an unannotated framework, and stronger when your code has a lot of programming and design of your own built on standard Python or annotated code.
Other conclusions
Regarding third-party library support, one limitation of the current approach is that if you want to add support for a library, your options are:
- Create stub files. This can work, but there’s no way to type check that the stub definitions are consistent with your method implementations, and because they live in separate files they are hard to keep in sync.
- Add annotations to your code, which forces full code checking, which is much slower (although there’s some work being done on incremental checking that should help here).
Another problem with adding annotations is that the nice syntax is the Python 3 one, but many library authors also want to support Python 2 for a few more years, so the only reasonable way is to add Python 2 style annotations (which are specially formatted comments). However, they look uglier, and they will require more effort (converting them to Python 3 style) a few years from now. Solving that could boost efforts to get more annotations into Python libraries.
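For comparison, here is the same hypothetical function written in both annotation styles. The comment form follows the type-comment syntax described in PEP 484; the function itself is made up for illustration.

```python
def too_long_py2(line, max_length=79):
    # type: (str, int) -> bool
    # Python 2 compatible: the signature lives in a specially formatted comment.
    return len(line) > max_length


def too_long_py3(line: str, max_length: int = 79) -> bool:
    # Python 3 only: the annotations are part of the function definition.
    return len(line) > max_length


# The comment form is invisible at runtime, while the Python 3 form is
# stored on the function object.
print(too_long_py2.__annotations__)
print(too_long_py3.__annotations__)
```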
Conclusions
Mypy is a useful tool for projects now, and its applicability will grow over time. There’s a lot of work ahead in terms of making it more stable, supporting more libraries, documenting it better, establishing conventions on how to use it, and making it easier to use and to integrate with the developer workflow. Having some official support (at least for the annotation language) from the Python project is a good sign that this work will eventually get done. But even without that, the value today is already positive. Applying it to the parts of the code where a static typing style is more effective provides a significant boost in code readability and maintainability. Its gradual nature allows you to leave your most dynamic code, or code that depends on unsupported libraries, unchecked and still get the benefit on the rest of your codebase.
I’m looking forward to using it in future projects and seeing how mypy evolves, and I’m quite confident that with some time and community support this tool may turn into a standard piece of the Python development stack. There are a couple of efforts here at Machinalis to help support more pieces of the Python libraries we normally use (Django and web tools, and data science/machine learning tools).
Mypy is certainly something I’d recommend considering for every project, given the advantages it can bring to your products or customers. And if you’re already using mypy, I’d love to hear what you’re applying it to!