← back to blog index

Simple expression parser in Python

Posted on May 31, 2010

Last year, while working on my Google summer of code project, I had to implement a simple "search engine" which allows to search for tasks in ToDo lists. To search for certain task(s) user can input not only simple plain text queries, but also more complex search expressions, e.g.:

list:Work AND due:today
(list:Personal OR list:Work) AND due:tomorrow
(tag:bananas OR tag:apples) AND priority:1 AND NOT list:Work

And so on. Expressions can be of any complexity.

To parse these expressions a simple configurable parser was implemented in Python. I decided to share the code and write a small blog post about it, so maybe it could be helpful for someone. I called it ExpressionParser, the source code is available at github.

Using this parser you can process input text and convert it to a simple data structure. For example:

>>> query = "NOT list:Work AND due:today"
>>> result = parser.parse(query)
>>> print result
[
    {'type': 'propval', 'value': ('list', 'Work', 'NOT')},
    {'type': 'operator', 'value': 'AND'},
    {'type': 'propval', 'value': ('due', 'today', None)}
]

A bit more complex example:

>>> query = "(list:Personal OR list:Work) AND NOT due:tomorrow"
>>> result = parser.parse(query)
>>> print result
[
    {'type': 'group', 'value': 
        [
            {'type': 'propval', 'value': ('list', 'Personal', None)}, 
            {'type': 'operator', 'value': 'OR'},
            {'type': 'propval', 'value': ('list', 'Work', None)}
        ]
    },
    {'type': 'operator', 'value': 'AND'},
    {'type': 'propval', 'value': ('due', 'tomorrow', 'NOT')}
]

A syntax definition is required for the parser. You need to subclass ExpressionParser and define the syntax like this:

class ExampleExpressionParser(ExpressionParser):
    # syntax definition for the parser
    # ! order is important
    syntax = [
                # group, eg: "(...)"
                { 'start': '\\s*\(',
                  'end': '\)\\s*',
                  'type': 'group'
                },

                # propval, eg: "hasNotes:true"
                # with optional NOT modifier
                { 'start': '\\s*([Nn][Oo][Tt])?\\s*([a-z][a-zA-Z]*): *("(.+?)"|[^ ()]+)\\s*',
                  'type': 'propval',
                  'create': lambda m: (m.group(2), m.group(4) or m.group(3), m.group(1))
                },

                # operator, eg: "AND", "OR"
                { 'start': '\\s*(?i)(AND|OR)\\s*', # case insensitive
                  'type': 'operator',
                  'create': lambda m: m.group(1)
                },

                # text
                # with optional NOT modifier
                # if nothing else matched - match all characters until special
                # character found
                { 'start': '\\s*([Nn][Oo][Tt])?\\s*([ ()]?[^ ()]*)',
                  'type': 'text',
                  'create': lambda m: (m.group(2), m.group(1))
                }
             ]

Then you can use your parser by calling a parse() method:

parser = ExampleExpressionParser()
result = parser.parse(expr)

For more details about using a parser and defining a syntax for it see README file.

Thanks to David Snopek, my mentor from GSoC 2008 for helping me to develop a similar parser which I used as a reference.