I have been trying to like static type checking in Python. For most of my Django projects, I get annoyed and give up, so I’ve had a go with some smaller projects instead. This blog post documents how it went with Parsy, a parser combinator library I maintain.
Intro to Parsy
I need to explain a few things about Parsy.
In Parsy, you build up Parser
objects via a set of primitives and combinators. Each Parser
object has a parse
method that accepts strings [1] and returns some object — which might be a string for your lowest building blocks, but quickly you build more complex parsers that return different types of objects. A lot of code is written in “fluent” style where you chain methods together.
Here are some basics:
The primitive string
just matches and returns the input:
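# A minimal sketch; here and below, assume `import parsy as P`
import parsy as P

hello = P.string("hello")
assert hello.parse("hello") == "hello"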
But we can change the result to some other type of object:
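# Sketch: .result() replaces the parsed string with a fixed value
null_parser = P.string("null").result(None)
assert null_parser.parse("null") is None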
We can map the parse result using a callable. This time I’m starting with a regex primitive, which returns strings, but converting to ints:
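# Sketch: P.regex matches and returns a string, .map(int) converts it
int_parser = P.regex(r"\d+").map(int)
assert int_parser.parse("123") == 123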
We can discard things we don’t care about in a number of ways, such as with these “pointy” operators that point to the important bit:
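# Sketch: >> and << keep the result of the parser they point at
suffixed = int_parser << P.string("px")
assert suffixed.parse("100px") == 100

prefixed = P.string("x=") >> int_parser
assert prefixed.parse("x=100") == 100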
We can have a sequence of items, here with some separator we don’t care about collecting:
three_ints = P.seq(
    int_parser << P.string("-"),
    int_parser << P.string("-"),
    int_parser,
)
assert three_ints.parse("123-45-67") == [123, 45, 67]
If we want something better than a list to store different components in (usually we do), we can use a keyword argument form of seq
to give names for the components and collect them in a dict instead of a list, and instead of .map
we can do .combine_dict
to convert the result into some other object:
from dataclasses import dataclass

@dataclass
class Date:
    year: int
    month: int
    day: int

date_parser = P.seq(
    year=P.regex(r"\d{4}").map(int) << P.string("-"),
    month=P.regex(r"\d{2}").map(int) << P.string("-"),
    day=P.regex(r"\d{2}").map(int),
).combine_dict(Date)

assert date_parser.parse("2022-11-19") == Date(year=2022, month=11, day=19)
And we can have alternatives:
bool_parser = P.string("true").result(True) | P.string("false").result(False)
assert bool_parser.parse("false") == False
With that done, let’s have a look at my 4 different approaches to improving the static type checking story for Parsy.
Simple types
The most obvious thing to do is to add Parser
as the return value for a bunch of methods and operators, and other type hints wherever we can for the input arguments. For example, .map
is:
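# A sketch of this first approach (this method lives on Parser;
# assume Callable is imported from typing):
def map(self, map_function: Callable) -> Parser:
    ...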
What type of object does the parse
method return? We don’t know, so we have to do:
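# Sketch: the output type is unknown, so the best we can say is Any
def parse(self, stream: str) -> Any:
    ...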
And this is our first hint that the static type checking isn’t very useful. For example, this faulty code will now type check:
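# Sketch: the parser actually produces an int, so this annotation is wrong,
# but parse() returns Any, and Any is assignable to anything
parsed: str = P.regex(r"\d+").map(int).parse("123")
parsed.upper()  # blows up at runtime with AttributeError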
We can see the return value is not going to be the right type, but our static type checker sees the Any
and allows it.
Our .map
method is also not catching TypeError
exceptions that it definitely ought to catch. For example, suppose we have this faulty parser which is going to attempt to construct a timedelta
from strings like "7 days ago"
:
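# A sketch of the faulty parser:
from datetime import timedelta

days_ago = (
    P.regex(r"\d+")  # produces a str
    << P.string(" days ago")
).map(lambda days: timedelta(days=days))

days_ago.parse("7 days ago")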
This is going to fail every time with:
TypeError: unsupported type for timedelta days component: str
That’s because we forgot a .map(int)
after the P.regex
parser.
This kind of type error is caught for you by mypy and pyright if you try to pass a string to the timedelta
constructor, but here, we’ve got no constraints on the callable that would enable the type checker to pick it up. When you put a bare Callable
in a signature, as we did above for the map
method, you are really writing Callable[..., Any]
, so all proper type checking effectively gets disabled.
If you want static type checking, this is not what you expect or want! It’s especially important for Parsy, because almost all the mistakes you are likely to make will be of this nature. Most Parsy code consists of parsers defined at a module level, which means that as soon as you import the module, you’ll know whether you have attempted to use combinator methods that don’t exist, for example, so there is little usefulness in a type checker being able to tell you this. What you want to know is whether you are going to get TypeError
or a similar type mistake when you call the parse
method at runtime.
Can we achieve that?
Generics
The answer is to make the Parser
type aware of what kind of object it is going to output. This can be achieved by parameterising the type of Parser
with a generic type, and is the second main approach.
Very often generics are used for homogeneous containers, to capture the type of the object they contain. Here, we don’t have a container as such. We are instead capturing the type of the object that the parser instance is going to produce when you call .parse()
(assuming it succeeds).
Some of the key type signatures in our Parser
code now look like this (lots of details elided):
from typing import TypeVar, Generic, Callable

OUT = TypeVar("OUT")
OUT1 = TypeVar("OUT1")
OUT2 = TypeVar("OUT2")

@dataclass
class Result(Generic[OUT]):
    value: OUT
    ...

class Parser(Generic[OUT]):
    def parse(self, stream: str) -> OUT:
        ...

    def map(self: Parser[OUT1], map_function: Callable[[OUT1], OUT2]) -> Parser[OUT2]:
        ...
The main point is that we are capturing the type of output using a type parameter that our static type checker can track, which means that we can indeed catch the type errors that were ignored by the previous approach. I got this to work, and you can see my results in this PR (not merged).
The reason it isn’t merged is that this approach breaks down as soon as you have things like *args
or **kwargs
where the arguments need to be of different types. We have exactly that, multiple times, once you care about generics. For example, the seq combinator takes a sequence of parsers as input, runs them in order, and collects their results. All of them are Parser
instances, so that would work fine with the previous approach, but they could all have different output types. There is no way to specify a type signature for this, and the same goes for alt, combine and combine_dict.
The best you can do is specify that they return Parser[Any]
. This means you are downgrading to no type checking. This problem is going to apply to all but the most trivial cases — it’s difficult to come up with many real world examples where you don’t need sequencing.
Some people would say “well, it works sometimes, so it is better than nothing”. The problem is that when you start writing your parser, you may well really benefit from the type checking and start to lean on it. Then, as soon as you get beyond the level of your simple (single part) objects and are creating parsers for more complex (multiple part) objects in your language, the type checking silently disappears. Or, if you have strictness turned up high, your type checker will complain about the introduction of Any
, but you won’t be able to do anything about it.
Both of these are really bad developer UX in my opinion. If the type checker is going to give up and go home at 2pm on the second day, it would have been better not to show up, which would push the developer to lean on other, more reliable methods, like writing tests.
Typed Parsy fork
So we come to my third option, which builds on the second. Can we redesign Parsy so that it doesn’t have any Any
? This would be a backwards incompatible fork that removes any API that is impossible to fully type, and attempts to provide some good enough replacements.
This was my most ambitious foray into static type checking in Python, and below are my notes from two perspectives — first implementation, which is important for any potential future contributors and maintainers, and secondly, usage, for the people who actually might want to use this fork.
Implementation perspective
I didn’t complete this for reasons that will become clear. Overall, I’d say working “alongside” mypy and pyright was quite nice at points, and other times really difficult. To keep this article short, I’ve moved most of this section to footnotes. Here are the bullet points:
- You can see my results in the typed-parsy repo, especially the single source file.
- I dropped support for anything but str as input type, as a simplification.
- I discovered that pyright can really shine in various places where mypy is lacking, particularly error messages. But sometimes they fight each other. [2]
- I couldn’t work out how Protocols work with respect to operators and dunder methods. [3]
- Covariance is tricky, and you have to understand it. [4]
- forward_declaration made my head explode. [5]
- There are lots of places marked TODO where I just couldn’t solve the new problems I had, even after getting rid of the most problematic code, and I had to give up and do type: ignore quite a few times.
Overall
Too much of this was just too hard for me. It seemed much worse than doing the same thing in Haskell for some reason. This might be just because the language wasn’t designed for it. Even just the syntax for types is significantly worse.
I think another issue is that there is no REPL for type level work. Normally when I’m trying to debug something, I jump into a REPL. Working with actual, concrete values is so much easier, and so much closer to the immediate connection that makes programming enjoyable.
An additional problem is that static type checkers have to worry about issues that may not be relevant to my code.
Finally, the type system we have right now for Python is so far behind what Python can actually express. But it isn’t necessarily obvious when this is the case. The answer to “why doesn’t this work” is anywhere between “I made a dumb mistake”, “I need to learn more”, “there’s a bug in mypy” and “that’s impossible (at the moment)”.
As someone who needs to worry about future contributors and maintainers, these are serious issues. In addition to getting code to work, contributors would also have to get the type checks to pass, and ensure they weren’t breaking type checking for users, which is an extra burden.
Using it
So much for the pains of implementing typed-parsy, what would it look like for a user?
First, it happens that for a lot of typical usage, the user doesn’t need to worry about types or adding type hints at all, which is great.
Second, for the resulting code, mypy and pyright do a very good job of checking almost every type error you would normally make in your parsers. The few places where we lose type safety are limited and don’t result in Any
escaping and trashing everything from then on.
However, if you do need to write type signatures (which you probably will if you have mypy settings turned up high, or if you want to make your own combinator functions, i.e. something that takes a Parser and returns a new Parser, which is fairly common), you’re going to need to understand a lot to create type hints that are both correct and useful.
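For example, even a small custom combinator needs a type variable threaded through it to avoid reintroducing Any. A sketch, using a hypothetical parenthesized helper:

from typing import TypeVar

T = TypeVar("T")

def parenthesized(parser: Parser[T]) -> Parser[T]:
    # wrap any parser in literal parentheses, keeping the inner
    # result and, crucially, its output type
    return P.string("(") >> parser << P.string(")")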
In addition, to achieve all this, we had to make some big sacrifices:
Sequences
You can’t implement seq
, alt
, .combine
or .combine_dict
in a type safe way (without degrading everything to Parser[Any]
from then on), and I had to remove them.
The biggest issue is seq, and especially the convenience of the keyword argument version to name things. The alternative I came up with — using the &
operator for creating a tuple of two results — does work, but turns out to be pretty ugly.
The following incomplete extracts from the SQL SELECT example illustrate the readability loss fairly well. We have some enum types and dataclasses to hold Abstract Syntax Tree nodes for a SQL parser:
class Operator(enum.Enum):
    EQ = "="
    LT = "<"
    GT = ">"
    LTE = "<="
    GTE = ">="

@dataclass
class Comparison:
    left: ColumnExpression
    operator: Operator
    right: ColumnExpression

# dataclass for Select etc.
We then have a bunch of parsers for different components, which we assemble into larger parsers for bigger things, like Comparison
or Select
. With normal parsy it looks like this:
comparison = P.seq(
    left=column_expr << padding,
    operator=P.from_enum(Operator),
    right=padding >> column_expr,
).combine_dict(Comparison)

SELECT = P.string("SELECT")
FROM = P.string("FROM")
WHERE = P.string("WHERE")

select = P.seq(
    _select=SELECT + space,
    columns=column_expr.sep_by(padding + P.string(",") + padding, min=1),
    _from=space + FROM + space,
    table=table,
    where=(space >> WHERE >> space >> comparison).optional(),
    _end=padding + P.string(";"),
).combine_dict(Select)
There are some things you need to understand: seq
runs a sequence of parsers in order, and with its keyword arguments version allows you to give names to each one, to produce a dictionary of results. .combine_dict
then passes these to a callable using **kwargs
syntax.
.combine_dict
also has a neat trick of skipping items whose names start with underscores, to allow you to deal with things that you need to parse but want to discard, like _select
, _from
and _end
above.
With typed-parsy, this was the best I could do, formatted using Black:
comparison = (
    (column_expr << padding) & P.from_enum(Operator) & (padding >> column_expr)
).map(lambda t: Comparison(left=t[0][0], operator=t[0][1], right=t[1]))

SELECT = string("SELECT")
FROM = string("FROM")
WHERE = string("WHERE")

select = (
    (
        (SELECT + space)
        >> (column_expr.sep_by(padding + string(",") + padding, min=1))
        << (space + FROM + space)
    )
    & table
    & ((space >> WHERE >> space >> comparison).optional() << (padding + string(";")))
).map(lambda t: Select(columns=t[0][0], table=t[0][1], where=t[1]))
This is a pretty massive regression. We can’t use commas to separate parts as before, and we can’t use keyword arguments to name components any more. Instead we have to use tuples, and when we end up with nested tuples it’s awful — I literally couldn’t work out how to write the tuple indexing correctly and had to just keep guessing until I got it right. And that’s just with 3 items; some parsers might have many more items in a sequence, each of which adds another level of nesting.
This might have been significantly better if we still had tuple unpacking within function/lambda signatures (which was removed in Python 3), but it would still not be very nice.
It is kind of impressive that mypy and pyright will handle all this and tell you about type violations very reliably. This is possible because they have support for indexing tuples, i.e. in the above statements they can tell you what the types of t[0]
, t[0][1]
etc are. In an IDE, tools like pyright will tell you what you need to supply for .map()
— for example for the select
statement, inside the final .map
call:
(map_fn: (tuple[tuple[list[Field | String | Number], Table], Comparison | None]) -> OUT2@map) -> Parser[OUT2@map]
But this isn’t my idea of developer friendly. The loss of readability is huge, even for simple cases.
For comparison, I looked at funcparserlib, the Python parsing library closest to Parsy. They claim “fully typed”, but it turns out that their sequencing operator, which returns only tuples and so isn’t as usable as seq
, flattens nested tuples. This is much better for usability, but is also impossible to type, so they introduce Any
at this point, and so lose type checking, the thing I’ve been trying to avoid.
Error messages
If you are relying on types to fix you up, instead of readable code, then you depend on the error messages your type checker will emit. Below is an example of an error I got when attempting to port the example code in simple_logo_lexer.py
, which has a function flatten_list
:
Argument 1 to "map" of "Parser" has incompatible type "Callable[[List[List[T]]], List[T]]"; expected "Callable[[List[Tuple[Tuple[str, int], str]]], List[T]]"
Here was another one I hit:
Argument 1 to "map" of "Parser" has incompatible type "Callable[[Tuple[List[List[Tuple[List[List[OUT]], List[OUT]]]], List[Tuple[List[List[OUT]], List[OUT]]]]], List[List[Tuple[List[List[OUT]], List[OUT]]]]]"; expected "Callable[[Tuple[List[List[Tuple[List[List[OUT]], List[OUT]]]], List[Tuple[List[List[OUT]], List[OUT]]]]], List[OUT]]"
I can’t remember what mistake produced that, but I did notice that the code worked perfectly at runtime. Also, pyright didn’t complain, only mypy. This is the kind of thing that makes people hate static typing.
One of the main principles of parsy is that it should be very easy to pull data into appropriate containers where every field is named, as well as typed — rather than a parse that returns nested lists and tuples and dicts. This is why namedtuple
is a big improvement over tuple
, and dataclasses are a big step up again. But at the level of types and error messages, it seems we are back in the dark ages, with nested tuples and lists everywhere.
@generate decorator
Being able to add conditional logic and control flow into parsers is really important for some cases, and Parsy has an elegant solution in the form of the @generate decorator. Getting this to work in typed-parsy turned out to be only partially possible.
The first issue is that, unlike other ways of building up parsers, the user will need to write a type signature to get type checking, and it is complex enough that there is no reasonable way for someone to work out what type signature they need without looking up the docs, which is poor usability.
Having done so, they can get type checking on the return type of the parser, and code that uses that parser. However, they get no type checking related to parsers used within the function. The Generator
type assumes a homogeneous stream of yield and send types, whereas we have pairs of yield/send types which need to match within the pair, but each pair can be completely different from the next in the stream.
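For illustration, a signature along these lines is roughly what is needed (a sketch, reusing the Date type from earlier); the Any in the yield position is exactly where the type checking evaporates:

from typing import Any, Generator

@P.generate
def date_parser() -> Generator[P.Parser[Any], Any, Date]:
    # Each `yield parser` sends back that parser's result, but the Generator
    # type cannot express the per-pair matching, so these values are unchecked:
    year = yield P.regex(r"\d{4}").map(int)
    yield P.string("-")
    month = yield P.regex(r"\d{2}").map(int)
    yield P.string("-")
    day = yield P.regex(r"\d{2}").map(int)
    return Date(year=year, month=month, day=day)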
Since you can’t sacrifice @generate
without major loss of functionality/usability, you have to live with the fact that you do not have type safety in the body of a @generate
function.
Overall
This did not feel like an upgrade for a user, but rather like a pretty big downgrade. typed-parsy was definitely going to be worse than parsy, so I stopped working on it.
Which brings me to my last approach:
Types for documentation
At some point along the way, I noticed that for the original version of parsy, with no type hints at all, my language server (pyright) was able to correctly infer return types of all the methods that returned Parser
instances. The types it was inferring were the same as I would have added in the very first section, simple Parser
objects, and that meant it could reliably give help with chained method calls, which is pretty nice.
The biggest problems were that the docstrings weren’t helpful (in most cases missing), and that for many parameters it wasn’t entirely obvious what type of object you should be passing in.
So, with a small amount of effort, we could improve usability a lot. We can add those docstrings, and add type hints at about the same level as the ones pyright was inferring anyway, just a bit more complete.
The one thing I don’t want to do is imply that these types bring Parsy code up to the level of being “type checked”, and there is a simple way to avoid that — by not including a py.typed
marker file in my package.
So, I’m back at the beginning, but with a different aim. Now, it’s not about helping automated static type checkers — without a py.typed
marker in the module they basically ignore the types — it’s about improving usability for developers as they write. I’ve done this work in the master branch now, and will hopefully release it soon.
There is another interesting advantage to this: because I’ve given up on static type checks and I’m not using static type checking for internal use in Parsy at all, I can be a bit looser with the type hints, allowing for greater readability. For example, I can annotate an optional string argument as arg: str = None
, even though that’s not strictly compliant with PEP 484. In other places I can use slightly simplified type hints for the sake of having something that doesn’t make your eyes glaze over, even if it doesn’t show all the possibilities — such as saying that input types can be str | bytes | list
, when technically I should be writing something much more abstract and complicated using Sequence
.
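For example, a hint in this looser style might look something like this sketch (parsy’s string primitive does take a transform argument, though the exact published signature may differ):

def string(expected_string: str, transform: Callable[[str], str] = None) -> Parser:
    # `= None` is not strictly PEP 484 compliant (it should be
    # Optional[Callable[[str], str]]), but it is easier to read as documentation
    ...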
Conclusion
In the end, type hints didn’t work out for use by a static type checker. But as clear and concise documentation for humans that pop up in a code editor, they worked well. Unless or until there are some big improvements in what it is possible to express with static types in Python, this seems to be the best solution for Parsy.
I learnt a lot about the limitations of typing in Python along the way, and hope you found this helpful too!
Footnotes
[1] Parser input:
Actually Parsy supports bytes and in fact any sequence of objects as input. But I’m ignoring that for the rest of the post for the sake of simplicity.
[2] mypy and pyright can fight: For example, when implementing the operators that discard one of the parsers, pyright complained (I think rightly) about unused generic parameters. When I removed them, mypy wanted them put back in!
[3] Protocols and operators:
I wanted some way of expressing “an object that supports addition”, or actually “an object that supports addition and returns an object of the same type”. I needed this for the +
operator, which you can use for both things like Parser[str]
and Parser[list[str]]
.
But I couldn’t get this minimal test case to work:
from abc import abstractmethod
from typing import Protocol, TypeVar

T = TypeVar("T")

class Addable(Protocol[T]):
    @abstractmethod
    def __add__(self: T, other: T) -> T:
        pass

def foo(x: Addable[T], y: Addable[T]) -> T:
    return x + y
Mypy reports:
Unsupported operand types for + ("Addable[T]" and "Addable[T]") [operator]
This is probably my fault, but I couldn’t work it out; I obviously need to understand more about protocols.
[4] Covariance:
As a replacement for the variadic seq
which produces a heterogeneous list (in its simplest form), I added the &
operator which returns a tuple. This can be typed correctly, because unlike list
/typing.List
which is considered to be a homogeneous container, tuple
/typing.Tuple
is treated as a product type, like it is in other languages.
This pairs well with the existing |
operator for alternatives, which produces a union (or sum type).
def __or__(self: Parser[OUT1], other: Parser[OUT2]) -> Parser[Union[OUT1, OUT2]]:
    ...

def __and__(self: Parser[OUT1], other: Parser[OUT2]) -> Parser[tuple[OUT1, OUT2]]:
    ...
In implementing the first, however, both mypy and pyright complain:
Expression of type "Result[OUT1@__or__]" cannot be assigned to return type "Result[OUT1@__or__ | OUT2@__or__]"
The issue here is that the type checker needs to know that a Result[A]
is a sub-type of a Result[A | B]
, which is only true if Result
is covariant with respect to that type parameter. Since it’s an immutable container of a single item, whose only operation is that you can extract the item, it is covariant.
My first attempt to fix this, however, was to change all the type parameters to covariant=True
, which gave me a ton more problems — both mypy and pyright complaining "covariant type variable cannot be used in parameter type" in many places.
It turns out, after a lot of attempts, head scratching, and finally the fresh take involved in writing a blog post, I just needed to use a covariant type parameter only for the definition of Result
:
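# A sketch of the fix: a covariant type parameter, used only here
OUT_co = TypeVar("OUT_co", covariant=True)

@dataclass
class Result(Generic[OUT_co]):
    value: OUT_co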
I think this is now correct, and the fact that I’m not using the OUT_co
type parameter in all the other places is not a problem, but to be honest I’m not entirely sure. Both mypy and pyright now seem happy, so I think I’m done?
[5] forward_declaration:
This is a neat way of untying recursive definitions. In typical uses, I think the type hint for the object produced also requires a recursive definition. Every time I tried to work out how you would declare a parser using forward_declaration such that a type checker could still work, my brain just shut down. I think it would probably require users to write forward references at the type level, but it might also hit some unsolvable problems in how that interacts with forward_declaration.
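For context, basic (untyped) usage of forward_declaration looks something like this sketch; the recursive output type of value is exactly the thing that is hard to write down:

value = P.forward_declaration()
array = P.string("[") >> value.sep_by(P.string(",")) << P.string("]")
value.become(P.regex(r"\d+").map(int) | array)

# the output type would be the recursive int | list[int | list[...]]
assert value.parse("[1,[2,3],4]") == [1, [2, 3], 4]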