Guido van Rossum: Adding type annotations for fspath

Type annotations for fspath

Python 3.6 will have a new dunder protocol, __fspath__(), which should be supported by classes that represent filesystem paths. Examples of such classes are the pathlib.Path family and os.DirEntry (returned by os.scandir()).

You can read more about this protocol in the brand new PEP 519. In this blog post I’m going to discuss how we would add type annotations for these additions to the standard library.
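As a quick illustration of the protocol at runtime (Python 3.6 or later), here is a minimal sketch of a class that supports it. The class name Workspace and its root attribute are made up for this example; only os.fspath() and os.PathLike come from the standard library.

```python
import os


class Workspace:
    """Hypothetical path-like class; not part of the standard library."""

    def __init__(self, root: str) -> None:
        self.root = root

    def __fspath__(self) -> str:
        # The protocol method: return the path as str (or bytes).
        return self.root


ws = Workspace(os.path.join("tmp", "project"))

# os.fspath() calls __fspath__() for us.
print(os.fspath(ws))

# os.PathLike recognizes any class that defines __fspath__(),
# even without explicit inheritance.
print(isinstance(ws, os.PathLike))
```

Any class that defines __fspath__() is treated as path-like, so explicit inheritance from os.PathLike is optional at runtime.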

I’m making frequent use of AnyStr, a quite magical type variable predefined in the typing module. If you’re not familiar with it, I recommend reading my blog post about AnyStr. You may also want to read up on generics in PEP 484 (or read mypy’s docs on the subject).

Adding os.scandir() to the stubs for os.py

For practice, let’s see if we can add something to the stub file for os.py. As of this writing there’s no typeshed information for os.scandir(), which I think is a shame. I think the following will do nicely. Note how we only define DirEntry and scandir() for Python versions >= 3.5. (Mypy doesn’t support this yet, but it will soon, and the example here still works; it just doesn’t realize scandir() is only available in Python 3.5 and later.) This could be added to the end of stdlib/3/os/__init__.pyi:

import sys
from typing import AnyStr, Generic, Iterator, overload

if sys.version_info >= (3, 5):

    class DirEntry(Generic[AnyStr]):
        name = ...  # type: AnyStr
        path = ...  # type: AnyStr
        def inode(self) -> int: ...
        def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_symlink(self) -> bool: ...
        def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...

    @overload
    def scandir() -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...

Deconstructing this a bit, we see a generic class (that’s what the Generic[AnyStr] base class means) and an overloaded function.  The scandir() definition uses @overload because it can also be called without arguments. We could also write it as follows; it’ll work either way:

    @overload
    def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...

Either way there really are three ways to call scandir(), all three returning an iterator of DirEntry objects:

  • scandir() -> Iterator[DirEntry[str]] 
  • scandir(str) -> Iterator[DirEntry[str]] 
  • scandir(bytes) -> Iterator[DirEntry[bytes]] 
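The three call forms can also be checked at runtime. This is a small sketch that uses a temporary directory and a made-up file name (example.txt); it only relies on os.scandir(), os.fsencode() and tempfile from the standard library.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory has something to list.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # str argument: entry names come back as str.
    str_names = [entry.name for entry in os.scandir(tmp)]

    # bytes argument: entry names come back as bytes.
    bytes_names = [entry.name for entry in os.scandir(os.fsencode(tmp))]

# No argument: scans the current directory, and names are str.
cwd_name_types = {type(entry.name) for entry in os.scandir()}

print(str_names, bytes_names, cwd_name_types)
```

The type of the argument decides the type of name and path on every DirEntry, which is exactly what the AnyStr parameter in the stub expresses.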

Adding os.fspath()

Next I’ll show how to add os.fspath() and how to add support for the __fspath__() protocol to DirEntry.

PEP 519 defines a simple ABC (abstract base class), PathLike, with one method, __fspath__(). We need to add this to the stub for os.py, as follows:

from abc import abstractmethod

class PathLike(Generic[AnyStr]):
    @abstractmethod
    def __fspath__(self) -> AnyStr: ...

That’s really all there is to it (except for the sys.version_info check, which I’ll leave out here since it doesn’t really work yet). Next we define os.fspath(), which wraps this protocol. It’s slightly more complicated than just calling its argument’s __fspath__() method, because it also handles strings and bytes. So here it is:

@overload
def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
@overload
def fspath(path: AnyStr) -> AnyStr: ...
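To make the two overloads concrete, here is a simplified pure-Python sketch of the runtime behavior they describe, loosely modeled on the description in PEP 519. The name fspath_sketch is hypothetical, and the real os.fspath() may differ in details such as error messages.

```python
import pathlib


def fspath_sketch(path):
    """Simplified stand-in for os.fspath(); the name is hypothetical."""
    # Second overload: str and bytes pass through unchanged.
    if isinstance(path, (str, bytes)):
        return path

    # First overload: look up __fspath__() on the type, not the instance.
    path_type = type(path)
    try:
        fspath_method = path_type.__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + path_type.__name__) from None

    result = fspath_method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__() must return str or bytes, not "
                        + type(result).__name__)
    return result


print(fspath_sketch("spam.txt"))
print(fspath_sketch(b"spam.txt"))
print(fspath_sketch(pathlib.PurePosixPath("dir/spam.txt")))
```

In each case the result type follows the input: str stays str, bytes stays bytes, and a path-like object yields whatever its __fspath__() returns.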

Easy enough! Next we update the definition of DirEntry. That’s easy too: in fact we only need to make it inherit from PathLike[AnyStr]; the rest is the same as the definition I gave above:

class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
    # Everything else unchanged!

The only slightly complicated bit here is the extra base class Generic[AnyStr]. This seems redundant, and in fact PEP 484 says we can leave it off, but mypy doesn’t support that yet, and it’s quite harmless; this just rubs into mypy’s face that this is a generic class of one type variable (the by-now famous AnyStr).

Finally we need to make a similar change to the stub for pathlib.py. Again, all we need to do is to make PurePath inherit from PathLike[str], like so:

from os import PathLike

class PurePath(PathLike[str]):
    # Everything else unchanged!

However, here we don’t add Generic, because this is not a generic class! It inherits from PathLike[str], which is quite un-generic, since it’s PathLike specialized for just str.

Note that we don’t actually have to define the __fspath__() method in these stubs — we’re not supposed to call them directly, and stubs don’t provide implementations, only interfaces.
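At runtime (Python 3.6 or later) the pathlib classes do implement the protocol, and the result is always a str. A minimal sketch, with a made-up relative path:

```python
import os
import pathlib

# PurePosixPath keeps the separator predictable on every platform.
p = pathlib.PurePosixPath("docs") / "index.rst"

as_text = os.fspath(p)
print(as_text)

# The result is a plain str and matches str(p).
print(type(as_text) is str, as_text == str(p))

# PurePath counts as path-like.
print(isinstance(p, os.PathLike))
```

This matches the PathLike[str] base class in the stub: there is no bytes flavor of PurePath.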

Putting it all together, we see that it’s quite elegant:

for a in os.scandir('.'):
    b = os.fspath(a)
    # Here, the typechecker will know that the type of b is str!

The derivation that b has type str is not too complicated: first, os.scandir('.') has a str argument, so it returns an iterator of DirEntry objects parameterized with str, which we write as DirEntry[str]. Passing this DirEntry[str] to os.fspath() then takes the first of that function’s two overloads (the one with PathLike[AnyStr]), since it doesn’t match the second one (DirEntry is not a valid value for AnyStr, because it’s neither a str nor bytes). Further, the AnyStr type variable in PathLike[AnyStr] is solved to stand for just str, because DirEntry[str] inherits from PathLike[str]. This is the specialized version of what the code says: DirEntry[AnyStr] inherits from PathLike[AnyStr].
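The same derivation can be tried out on a self-contained sketch that mirrors the stubs with ordinary classes. All names here (PathLikeSketch, EntrySketch, fspath_typed) are stand-ins invented for this example, not the real os definitions. The comments describe what a checker such as mypy would be expected to infer; the code also runs as plain Python.

```python
from abc import abstractmethod
from typing import AnyStr, Generic, overload


class PathLikeSketch(Generic[AnyStr]):
    """Stand-in for os.PathLike in the stub."""

    @abstractmethod
    def __fspath__(self) -> AnyStr: ...


class EntrySketch(PathLikeSketch[AnyStr]):
    """Stand-in for os.DirEntry; generic in the same type variable."""

    def __init__(self, path: AnyStr) -> None:
        self.path = path

    def __fspath__(self) -> AnyStr:
        return self.path


@overload
def fspath_typed(path: PathLikeSketch[AnyStr]) -> AnyStr: ...
@overload
def fspath_typed(path: AnyStr) -> AnyStr: ...
def fspath_typed(path):
    # Runtime implementation behind the two overloads.
    if isinstance(path, (str, bytes)):
        return path
    return type(path).__fspath__(path)


entry = EntrySketch("notes.txt")    # expected: EntrySketch[str]
b = fspath_typed(entry)             # first overload, AnyStr solved to str
raw = fspath_typed(b"notes.txt")    # second overload, AnyStr solved to bytes
print(b, raw)
```

Adding a reveal_type(b) line and running a type checker over the file is one way to confirm the inferred type.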

Okay, so maybe that last paragraph was intermediate or advanced. And maybe it could be expanded. Maybe I’ll write another blog post about how type inference works, but there’s a lot on that topic, and other authors have probably already written better introductory material about generics (in other languages, though).

Making things accept PathLike

There’s a bit of cleanup work that I’ve left out. PEP 519 says that many stdlib functions that currently take strings for pathnames will be modified to also accept PathLike. For example, here’s how the signatures for os.scandir() would change:

@overload
def scandir() -> Iterator[DirEntry[str]]: ...
@overload
def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...
@overload
def scandir(path: PathLike[AnyStr]) -> Iterator[DirEntry[AnyStr]]: ...

The first two entries are unchanged; I’ve just added a third overload. (Note that the alternative way of defining scandir() would require more changes — an indication that this way is more natural.)

I also tried doing this with a union:

@overload
def scandir() -> Iterator[DirEntry[str]]: ...
@overload
def scandir(path: Union[AnyStr, PathLike[AnyStr]]) -> Iterator[DirEntry[AnyStr]]: ...

But I couldn’t get this to work, so the extra overload is probably the best we can do. Quite a few functions will require a similar treatment, sometimes introducing overloading where none exists today (but that shouldn’t hurt anything).
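The same extra-overload pattern applies to user code that wants to accept str, bytes, or path-like arguments. The following is a sketch under assumptions: entry_names is a hypothetical helper, the file name a.txt is made up, and the PathLike annotation is written as a string so that it is never evaluated at runtime.

```python
import os
import pathlib
import tempfile
from typing import AnyStr, List, overload


@overload
def entry_names(path: AnyStr) -> List[AnyStr]: ...
@overload
def entry_names(path: "os.PathLike[AnyStr]") -> List[AnyStr]: ...
def entry_names(path):
    """Hypothetical helper: sorted entry names of a directory."""
    # os.fspath() normalizes all three argument kinds to str or bytes.
    return sorted(entry.name for entry in os.scandir(os.fspath(path)))


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    from_str = entry_names(tmp)
    from_path = entry_names(pathlib.Path(tmp))
    from_bytes = entry_names(os.fsencode(tmp))

print(from_str, from_path, from_bytes)
```

Calling os.fspath() once at the top keeps the body identical for all overloads; only the signatures differ.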

A note about pathlib: since it only deals with strings, its methods (the ones that PEP 519 says should be changed anyway) should use PathLike[str] rather than PathLike[AnyStr].

Acknowledgments

(Thanks for comments on the draft to Stephen Turnbull, Koos Zevenhoven, Ethan Furman, and Brett Cannon.)
