Channel: Planet Python

Vladimir Iakolev: Parse shell one-liners with pyparsing


For one of my projects I needed a parser that turns shell one-liners into an AST. I tried PLY, pyPEG and a few others, and settled on pyparsing. It’s actively maintained, works without magic and is easy to use.

Ideally I wanted to parse something like:

LANG=en_US.utf-8 git diff | wc -l >> diffs

Into something like:

(= LANG en_US.utf-8) (>> (| (git diff) (wc -l)) (diffs))

Let’s start with a simple shell command; it’s just space-separated tokens:

import pyparsing as pp

token = pp.Word(pp.alphanums + '_-.')
command = pp.OneOrMore(token)

command.parseString('git branch --help')
>>> ['git', 'branch', '--help']
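A small sketch of how this grammar behaves on input it doesn't cover, assuming the same token definition as above. By default `parseString` stops quietly at the first character it can't match (here `/`), and passing `parseAll=True` turns that into an error. The sample input `ls /tmp` is only an illustration.

```python
import pyparsing as pp

# Same definitions as in the post.
token = pp.Word(pp.alphanums + '_-.')
command = pp.OneOrMore(token)

# '/' is not part of the token alphabet, so parsing stops before it
# and only the tokens matched so far are returned.
partial = command.parseString('ls /tmp')
print(partial.asList())

# With parseAll=True the unparsed remainder is reported as an error.
try:
    command.parseString('ls /tmp', parseAll=True)
    rejected = False
except pp.ParseException as error:
    rejected = True
    print('rejected:', error)
```

If paths or other characters are needed, one option would be to widen the alphabet passed to `pp.Word`.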

Environment variables are just as simple. A single variable is token=token, and a list of them is separated by spaces:

env = pp.Group(token + '=' + token)

env.parseString('A=B')
>>> [['A', '=', 'B']]

env_list = pp.OneOrMore(env)

env_list.parseString('VAR=test X=1')
>>> [['VAR', '=', 'test'], ['X', '=', '1']]

Now we can combine the command and the environment variables; note that the environment variables are optional:

command_with_env = pp.Optional(pp.Group(env_list)) + pp.Group(command)

command_with_env.parseString('LOCALE=en_US.utf-8 git diff')
>>> [[['LOCALE', '=', 'en_US.utf-8']], ['git', 'diff']]

Now we need to add support for pipes, redirects and logical operators. Here we don’t need to know what they do, so we’ll treat them simply as separators between commands:

separators = ['1>>', '2>>', '>>', '1>', '2>', '>', '<', '||', '|', '&&', '&', ';']
separator = pp.oneOf(separators)
command_with_separator = pp.OneOrMore(pp.Group(command) + pp.Optional(separator))

command_with_separator.parseString('git diff | wc -l >> out.txt')
>>> [['git', 'diff'], '|', ['wc', '-l'], '>>', ['out.txt']]
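Several of these separators are prefixes of others (`>` and `>>`, `|` and `||`). The sketch below, using a two-item list chosen only for illustration, shows why `pp.oneOf` is a convenient choice here: it tries longer alternatives first, while a plain `|` chain of literals takes the first one that matches.

```python
import pyparsing as pp

# A plain alternation tries the alternatives in the order written,
# so '>' wins even though the input is '>>'.
first_match = pp.Literal('>') | pp.Literal('>>')
first_result = first_match.parseString('>>').asList()
print(first_result)    # ['>']

# oneOf reorders conflicting alternatives so the longer one is tested first.
longest = pp.oneOf(['>', '>>'])
longest_result = longest.parseString('>>').asList()
print(longest_result)  # ['>>']
```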

And now we can merge environment variables, commands and separators:

one_liner = pp.Optional(pp.Group(env_list)) + pp.Group(command_with_separator)

one_liner.parseString('LANG=C DEBUG=true git branch | wc -l >> out.txt')
>>> [[['LANG', '=', 'C'], ['DEBUG', '=', 'true']], [['git', 'branch'], '|', ['wc', '-l'], '>>', ['out.txt']]]

The result is hard to process, so we need to give it some structure:

one_liner = pp.Optional(env_list).setResultsName('env') + \
            pp.Group(command_with_separator).setResultsName('command')

result = one_liner.parseString('LANG=C DEBUG=true git branch | wc -l >> out.txt')
print('env:', result.env, '\ncommand:', result.command)
>>> env: [['LANG', '=', 'C'], ['DEBUG', '=', 'true']]
>>> command: [['git', 'branch'], '|', ['wc', '-l'], '>>', ['out.txt']]
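Since the environment part is optional, it helps to see what the named result looks like when it's missing. This is a reduced sketch (a single command, no separators) rather than the full grammar: a results name that was never set reads back as an empty, falsy value, so a plain truth test is enough.

```python
import pyparsing as pp

token = pp.Word(pp.alphanums + '_-.')
env = pp.Group(token + '=' + token)
env_list = pp.OneOrMore(env)
# Simplified stand-in for the full command grammar from the post.
command = pp.Group(pp.OneOrMore(token))

one_liner = (pp.Optional(env_list).setResultsName('env')
             + command.setResultsName('command'))

with_env = one_liner.parseString('DEBUG=1 git status')
without_env = one_liner.parseString('git status')

# The name is only set when the optional part actually matched.
print(bool(with_env.env))     # True
print(bool(without_env.env))  # False
print(list(without_env.command))
```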

However, this still isn’t an AST, just a bunch of grouped tokens. So now we need to transform it into a proper AST:

def prepare_command(command):
    """We don't need to work with pyparsing internal data structures,
    so we just convert them to list.

    """
    for part in command:
        if isinstance(part, str):
            yield part
        else:
            yield list(part)


def separator_position(command):
    """Find last separator position."""
    for n, part in enumerate(command[::-1]):
        if part in separators:
            return len(command) - n - 1


def command_to_ast(command):
    """Recursively transform command to AST."""
    n = separator_position(command)
    if n is None:
        return tuple(command[0])
    else:
        return (command[n],
                command_to_ast(command[:n]),
                command_to_ast(command[n + 1:]))


def to_ast(parsed):
    if parsed.env:
        for env in parsed.env:
            yield ('=', env[0], env[2])
    command = list(prepare_command(parsed.command))
    yield command_to_ast(command)


list(to_ast(result))
>>> [('=', 'LANG', 'C'),
>>>  ('=', 'DEBUG', 'true'),
>>>  ('>>', ('|', ('git', 'branch'),
>>>               ('wc', '-l')),
>>>         ('out.txt',))]

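To make the recursion easier to follow, here is a standalone sketch of the same "split on the last separator" idea on plain lists, without pyparsing. The names `SEPARATORS`, `last_separator` and `build` are hypothetical and the separator set is a small subset. Splitting on the last separator makes chains left-nested, and every separator gets the same precedence, which is not how a real shell groups `&&` and `|`.

```python
# Illustrative subset of the separators list from the post.
SEPARATORS = {'|', '>>', '&&'}


def last_separator(parts):
    """Return the index of the right-most separator, or None."""
    for index in range(len(parts) - 1, -1, -1):
        part = parts[index]
        # Commands are lists, separators are plain strings.
        if isinstance(part, str) and part in SEPARATORS:
            return index
    return None


def build(parts):
    """Split on the last separator and recurse into both sides."""
    index = last_separator(parts)
    if index is None:
        return tuple(parts[0])
    return (parts[index], build(parts[:index]), build(parts[index + 1:]))


chain = build([['a'], '|', ['b'], '|', ['c']])
print(chain)  # ('|', ('|', ('a',), ('b',)), ('c',))

mixed = build([['a'], '&&', ['b'], '|', ['c']])
print(mixed)  # ('|', ('&&', ('a',), ('b',)), ('c',))
```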
It works. The last part is a bit of glue that makes it easier to use:

def parse(command):
    result = one_liner.parseString(command)
    ast = to_ast(result)
    return list(ast)


parse('LANG=en_US.utf-8 git diff | wc -l >> diffs')
>>> [('=', 'LANG', 'en_US.utf-8'), ('>>', ('|', ('git', 'diff'), ('wc', '-l')), ('diffs',))]
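The tuples returned by `parse` can be printed in the parenthesised form from the beginning of the post with a few lines of plain Python. The helper name `to_sexpr` is hypothetical, and the input below is simply the output shown above, copied in by hand.

```python
def to_sexpr(node):
    """Render a nested tuple as a parenthesised expression."""
    if isinstance(node, tuple):
        return '(' + ' '.join(to_sexpr(child) for child in node) + ')'
    return node


# Output of parse('LANG=en_US.utf-8 git diff | wc -l >> diffs') from the post.
ast = [('=', 'LANG', 'en_US.utf-8'),
       ('>>', ('|', ('git', 'diff'), ('wc', '-l')), ('diffs',))]

rendered = ' '.join(to_sexpr(node) for node in ast)
print(rendered)
# (= LANG en_US.utf-8) (>> (| (git diff) (wc -l)) (diffs))
```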

It can’t parse every one-liner, though. For example, it doesn’t support nested commands like:

echo $(git branch)
echo `git branch`

But it’s enough for my task, and support for the missing features can be added easily.
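As a rough idea of what adding one such feature could look like, here is a sketch of a grammar that accepts `$(...)` substitutions by making the command recursive with `pp.Forward`. It is not part of the original code: it covers only the grammar (not the AST step, separators or backticks), and the name `substitution` is hypothetical.

```python
import pyparsing as pp

token = pp.Word(pp.alphanums + '_-.')

# Forward lets the command refer to itself inside a substitution.
command = pp.Forward()
substitution = pp.Group(pp.Suppress('$(') + command + pp.Suppress(')'))
command <<= pp.OneOrMore(substitution | token)

nested = command.parseString('echo $(git branch)').asList()
print(nested)  # ['echo', ['git', 'branch']]
```

The nested command comes back as its own group, so the later steps could tell it apart from ordinary tokens.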

Gist with source code.

