Having my own version of the Python parser has proven, so far, to be clumsy and chaotic. It is clumsy because I need a special interpreter just to run my language (which runs on top of an interpreter anyway!), and chaotic because building that interpreter has proven unreliable across different machines. This means that, currently, it only works for me.
Because of this, and because I wanted even more control over the parser (did someone say writing things like rsync(--help)?), I decided to check my options. A friend of mine, more used to playing with languages, suggested using pypy to create my own parser, but that just led me a little further: why not outright 'steal' pypy's parser? After all, they have their own, which is also generated from Python's Python.asdl.
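For context, stock CPython does accept rsync(--help), just not with the meaning a shell user expects: it reads --help as two unary minus operators applied to the builtin name help. The following check uses only the standard ast module; it is not ayrton or pypy code.

```python
import ast

# Parse the expression with CPython's own parser, in 'eval' mode.
tree = ast.parse("rsync(--help)", mode="eval")
call = tree.body     # an ast.Call node for rsync(...)
arg = call.args[0]   # its single positional argument

# The argument is -(-(help)): two nested UnaryOp(USub) nodes around Name('help').
is_double_negation = (
    isinstance(arg, ast.UnaryOp) and isinstance(arg.op, ast.USub)
    and isinstance(arg.operand, ast.UnaryOp) and isinstance(arg.operand.op, ast.USub)
    and isinstance(arg.operand.operand, ast.Name)
    and arg.operand.operand.id == "help"
)
print(is_double_negation)
```

Evaluating that argument would typically fail at runtime, since the help object does not support unary minus; turning it into the option string '--help' needs control over the parser (or the AST) rather than over the runtime.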
In fact, it took me one hour to port the parser and a couple more to port the AST builder. This included porting them to Python 3 (both by running 2to3 and by applying some changes by hand, notably dict.iteritems() -> dict.items()) and removing as many dependencies on the rest of pypy as possible, especially on rpython.
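Finding what still depends on rpython (or on the rest of pypy) can be partly automated. Below is a minimal sketch, not part of the port itself: it uses the standard ast module to list absolute imports whose top-level package is in a given set. The function names foreign_imports and scan_tree are hypothetical. Since the ported modules keep pypy's layout (the script further down still imports from pypy.interpreter), the set of packages to flag is a parameter.

```python
import ast
from pathlib import Path


def foreign_imports(source, packages):
    """Return (lineno, module) for each absolute import from one of `packages`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import
        for name in names:
            if name.split(".")[0] in packages:
                found.append((node.lineno, name))
    return found


def scan_tree(root, packages):
    """Map each .py file under `root` to the imports foreign_imports() reports."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = foreign_imports(path.read_text(encoding="utf-8"), packages)
        if hits:
            report[str(path)] = hits
    return report
```

For example, scan_tree(".", ("rpython",)) would report only the remaining rpython imports, while passing ("pypy", "rpython") would flag everything that still reaches outside the ported modules. Files that are not valid Python 3 will make ast.parse raise SyntaxError, which is itself a useful signal during a port.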
The last step was to migrate from their own AST implementation to Python's, but here's where I (again) hit the last brick wall: the ast.AST class and its subclasses are very special. They're implemented in C, and their Python-level constructors do not accept the line and column info as positional arguments, which is how pypy's AST builder passes them. For a moment I contemplated creating another extension (that is, written in C) to make those calls, but then the obvious solution came to mind: a massive replacement from:
return ast.ASTClass([params], foo.lineno, foo.column)
into:
new_node = ast.ASTClass([params])
new_node.lineno = foo.lineno
new_node.col_offset = foo.column
return new_node
and some other similar changes. See here if you're really interested in all the details. I can only be grateful for regular expressions, capturing groups, and editors that support both.
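The replacement itself can also be scripted. Below is a minimal sketch with Python's re module, assuming (hypothetically) that each call to rewrite is a single-line return statement whose last two arguments are <node>.lineno and <node>.column, as in the example above. It assigns CPython's col_offset attribute; multi-line calls and other shapes would need extra patterns. The names CALL_WITH_LOCATION and move_locations_out are made up for illustration.

```python
import re

# Matches e.g. "    return ast.Name(name, ast.Load, node.lineno, node.column)"
# and calls with no other parameters, e.g. "return ast.Pass(stmt.lineno, stmt.column)".
CALL_WITH_LOCATION = re.compile(
    r"^(?P<indent>[ \t]*)return (?P<cls>ast\.\w+)\("
    r"(?:(?P<params>.*?), )?"                       # optional leading parameters
    r"(?P<src>\w+)\.lineno, (?P=src)\.column\)$",   # same node for both values
    re.MULTILINE,
)


def _expand(match):
    """Turn one matched return statement into the four-line form."""
    indent, cls, params, src = match.group("indent", "cls", "params", "src")
    return (
        f"{indent}new_node = {cls}({params or ''})\n"
        f"{indent}new_node.lineno = {src}.lineno\n"
        f"{indent}new_node.col_offset = {src}.column\n"
        f"{indent}return new_node"
    )


def move_locations_out(source):
    """Rewrite every matching line in `source`; other lines are left untouched."""
    return CALL_WITH_LOCATION.sub(_expand, source)
```

A design note: CPython's node constructors do accept lineno and col_offset as keyword arguments (only extra positional arguments are rejected), so a variant of _expand could keep each call on one line, e.g. ast.Pass(lineno=stmt.lineno, col_offset=stmt.column).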
The following code parses a simple Python script (ayrton's own setup.py) and dumps its AST:

#! /usr/bin/env python3
import ast

from pypy.interpreter.pyparser import pyparse
from pypy.interpreter.astcompiler import astbuilder

info = pyparse.CompileInfo('setup.py', 'exec')
# None stands in for pypy's object space argument
p = pyparse.PythonParser(None)
with open('setup.py') as f:
    t = p.parse_source(f.read(), info)
a = astbuilder.ast_from_node(None, t, info)
print(ast.dump(a))
The result is the following (formatted by hand):
Module(body=[
    ImportFrom(module='distutils.core', names=[alias(name='setup', asname=None)], level=0),
    Import(names=[alias(name='ayrton', asname=None)]),
    Expr(value=Call(func=Name(id='setup', ctx=<class '_ast.Load'>), args=None, keywords=[
        keyword(arg='name', value=Str(s='ayrton')),
        keyword(arg='version', value=Attribute(value=Name(id='ayrton', ctx=<class '_ast.Load'>), attr='__version__', ctx=<class '_ast.Load'>)),
        keyword(arg='description', value=Str(s='a shell-like scripting language based on Python3.')),
        keyword(arg='author', value=Str(s='Marcos Dione')),
        keyword(arg='author_email', value=Str(s='mdione@grulic.org.ar')),
        keyword(arg='url', value=Str(s='https://github.com/StyXman/ayrton')),
        keyword(arg='packages', value=List(elts=[Str(s='ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='scripts', value=List(elts=[Str(s='bin/ayrton')], ctx=<class '_ast.Load'>)),
        keyword(arg='license', value=Str(s='GPLv3')),
        keyword(arg='classifiers', value=List(elts=[
            Str(s='Development Status :: 3 - Alpha'),
            Str(s='Environment :: Console'),
            Str(s='Intended Audience :: Developers'),
            Str(s='Intended Audience :: System Administrators'),
            Str(s='License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)'),
            Str(s='Operating System :: POSIX'),
            Str(s='Programming Language :: Python :: 3'),
            Str(s='Topic :: System'),
            Str(s='Topic :: System :: Systems Administration')],
            ctx=<class '_ast.Load'>))],
        starargs=None, kwargs=None))])
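Because the mass replacement touched a lot of code, it's easy to miss a node. Here is a minimal sketch of a sanity check using only the standard ast module: it walks a tree and reports nodes whose lineno or col_offset was never set, both of which compile() needs. The helper name missing_locations is hypothetical; run on the tree a from the script above, an empty list would suggest every node got its location.

```python
import ast


def missing_locations(tree):
    """Return (node type name, attribute) pairs for unset location attributes.

    Only lineno and col_offset are checked, and only on node types that
    declare them in their _attributes (statements, expressions, ...).
    """
    missing = []
    for node in ast.walk(tree):
        for attr in ("lineno", "col_offset"):
            if attr in node._attributes and getattr(node, attr, None) is None:
                missing.append((type(node).__name__, attr))
    return missing


# A node built the way the rewritten builder does it, with the
# location assignments forgotten:
bare = ast.Expression(body=ast.Name(id="x", ctx=ast.Load()))
print(missing_locations(bare))

# Fill in the location (end positions too, to be safe on newer Pythons);
# the tree then compiles and evaluates.
bare.body.lineno, bare.body.col_offset = 1, 0
bare.body.end_lineno, bare.body.end_col_offset = 1, 1
print(missing_locations(bare))
print(eval(compile(bare, "<sketch>", "eval"), {"x": 42}))
```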
The next steps are to continue removing references to pypy code and to make sure it can actually parse all possible code. Then I should revisit the hardcoded limitations in the parser (in particular in this loop) so that I can freely format program calls :).

Interesting times are coming for ayrton!