Python Parsing Tools
A list of Python parsing tools, initially imported from @nedbat's blog post.
Human-parser-generator - A straightforward recursive descent Parser Generator with a focus on 'human' code generation and ease of use.
The List
Name | Description | License | Updated | Parses | Used By | Notes |
---|---|---|---|---|---|---|
Ply | Docstrings are used to associate lexer or parser rules with actions. The lexer uses Python regular expressions. | LGPL | v3.6 4/2015 | LALR(1) | lesscpy | ply-hack group |
pyparsing | Direct parser objects in Python, built to parallel the grammar. | MIT | v2.0.3 8/2014 | | twill | |
ANTLR | Parser and lexical analyzer generator in Java. Generates parsing code in Python (as well as Java, C++, C#, Ruby, etc). | BSD | v4.4 7/2014 | LL(*) | | |
pyPEG | A parsing expression grammar toolkit for Python. | GPL | v 2.15 1/2015 | PEG | | |
pydsl | A language workbench written in Python. | GPLv3 | v 0.5.2 11/2014 | | | |
LEPL | A recursive descent parser. | dual licensed MPL/LGPL | v 5.1.3 9/2012 | | | Discontinued |
Codetalker | Python-based grammar definitions. | MIT | v 1.1 3/2014 | | | |
funcparserlib | Recursive descent parsing library for Python based on functional combinators. | MIT | v0.3.6 5/2013 | | | |
picoparse | | | v0.9 3/2009 | | | |
Aperiot | | Apache 2.0 | v0.1.12 1/2012 | | | |
PyGgy | Lexes with DFA from a specification in a .pyl file. Parses GLR grammars from a specification in a .pyg file. Rules in both files have Python action code. Unlike most parser generators, the rule actions are all executed in a post-processing step. The parser isn't represented as a discrete object, but as globals in a module. | Public Domain | v0.4 8/2004 | | | Python 3 compatible fork 0.4.1, discussion group |
Parsing | LR(1) parser generator as well as CFSM and GLR parser drivers. | MIT | v1.4 12/2012 | LR(1), CFSM, and GLR | | |
Rparse | | GPL | v 1.1.0. 4/2010 | LL(1) parser generator with AST generation. | | |
SableCC | Java-based parser and lexical analyzer generator. Generates parsing code in Java, with alternative generators for other languages including Python. | LGPL | v 3.7 11/2012 | | | |
GOLD Parser | | zlib/libpng | v 5.2.0 8/2012 | LALR | | |
Plex | Python module for constructing lexical analysers. | LGPL | v 2.0 12/2009 | | | Compiles all of the regular expressions into a single DFA. |
Plex3 | Python3 port of Plex | | 8/2012 | | | No official release |
yeanpypa | Yeanpypa is (yet another) framework to create recursive-descent parsers in Python. | Public Domain | 4/2010 | | | Parsers are created by writing an EBNF-like grammar as Python expressions. |
ZestyParser | | MIT | v 0.8.1 4/2007 | | | |
BisonGen | | | v 0.8.0b1 4/2005 | | | |
DParser for Python | A scannerless GLR parser | BSD | v 1.3.0 3/2013 | | | Charming Python: DParser for Python: Exploring Another Python Parser |
Yapps | Produces recursive-descent parsers, as a human would write. Designed to be easy to use rather than powerful or fast. Better suited for small parsing tasks like email addresses, simple configuration scripts, etc. | MIT | v 2.1.1 8/2003 | | | |
PyBison | Python binding to the Bison (yacc) and Flex (lex) parser-generator utilities | GPL | v 0.1.8 6/2004 | LALR(1) | | Doesn't yet support Windows. |
Yappy | | | v 1.9.4 8/2014 | SLR, LR(1) and LALR(1) | | Uses Python strings to declare the grammar. |
Toy Parser Generator | | LGPL | v 3.2.2 12/2013 | | | |
kwParsing | An experimental parser generator implemented in Python which generates parsers implemented in Python. | Python License | v 1.3 | SLR | Gadfly | |
Martel | Martel uses regular expression format definition to generate a parser for flat- and semi-structured files. The parse tree information is passed back to the caller using XML SAX events. In other words, Martel lets you parse flat files as if they are in XML. | BSD | v 0.8 12/2001 | | | Last version included in BioPython |
SimpleParse | Lexing and parsing in one step, but only deterministic grammars. | BSD | 2.11a2 8/2010 | | | |
mxTextTools | An unusual table-based parser. There is no generation step; the parser is hand-coded using primitives provided by the package. The parser is implemented in C for speed. | eGenix Public License, similar to Python, compatible with GPL. | v 3.2.8 7/2014 | | SimpleParse, Martel | |
SPARK | Uses docstrings to associate productions with actions. Unlike other tools, also includes semantic analysis and code generation phases. | MIT | v 0.7 pre-alpha 7 5/2002 | | | |
FlexModule and BisonModule | Macros to allow Flex and Bison to produce idiomatic lexers and parsers for Python. The generated lexers and parsers are in C, compiled into loadable modules. | Pythonesque | v 2.1 3/2002 | | | |
Bison in a box | Uses standard Bison to generate pure Python parsers. It actually reads the bison-generated .c file and generates Python code. | GPL | v 0.1.0 6/2001 | LALR(1) | | |
Berkeley YACC | Classic YACC, extended to generate Python code. Python support seems to be undocumented. | Public Domain | v 20141128 11/2014 | LALR(1) | | |
PyLR | Lexer is based on regular expressions. | | 12/1997 | LR | | |
PyLR | PyLR is a partial Python implementation of the OpenLR specification | Apache 2.0 | 12/2014 | | | announcement |
Construct | A declarative parser (and builder) for binary data. | BSD | v 2.5.2 4/2014 | | | |
ModGrammar | A general-purpose library for constructing language parsers and interpreters for context-free grammar definitions. | BSD | v 0.10 2/2013 | | | |
lrparsing | Differs from other Python LR(1) parsers in using Python expressions as grammars, and offers disambiguation tools. | AGPLv3 | v 1.0.11 3/2015 | LR(1) parser and a tokeniser | | |
docopt | Generates a parser based on formalized conventions that are used for help messages and man pages for program interface description. | MIT | v 0.6.2 6/2014 | | | |
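
Several entries above (LEPL, funcparserlib, yeanpypa, Yapps) describe themselves as recursive-descent tools. As a point of reference for what that style looks like, here is a small hand-written recursive-descent evaluator for arithmetic expressions. It uses none of the listed libraries; the grammar, the `Parser` class, and the `evaluate` helper are all invented for this sketch.

```python
import re


class Parser:
    """Recursive-descent evaluator for a tiny arithmetic grammar.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        # Numbers become one token; any other non-space character is
        # its own token, so unknown characters are rejected later.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token is not None and token.isdecimal():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return value
        raise ValueError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate text, rejecting trailing input."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # prints 11
```

Each grammar rule maps to one method, which is the property that makes recursive-descent parsers easy to read and to write by hand.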
Standard Modules
The Python standard library includes a few modules for special-purpose parsing problems. These are not general-purpose parsers, but don't overlook them. If your need overlaps with their capabilities, they're perfect:
- shlex lexes command lines using the rules common to many operating system shells.
- ConfigParser implements a basic configuration file parser language which provides a structure similar to what you would find on Microsoft Windows INI files.
- argparse makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.
- email provides many services, including parsing email and other RFC-822 structures.
- parser parses Python source text.
- cmd implements a simple command interface, prompting for and parsing out command names, then dispatching to your handler methods.
- json is a JSON (JavaScript Object Notation) encoder and decoder.
- tokenize is a lexical scanner for Python source code, implemented in Python.
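As a quick illustration of several of these modules, the sketch below runs shlex, configparser (the Python 3 spelling of ConfigParser), argparse, json, and tokenize on small in-memory inputs. The sample strings, the `[server]` section, and the `--verbose` flag are made up for the example.

```python
import argparse
import configparser  # named ConfigParser in Python 2
import io
import json
import shlex
import tokenize

# shlex: split a command line the way a POSIX shell would.
words = shlex.split('grep -n "hello world" notes.txt')

# configparser: read INI-style text (sample content is made up).
config = configparser.ConfigParser()
config.read_string("[server]\nhost = localhost\nport = 8080\n")
port = config.getint("server", "port")

# argparse: declare the arguments, then parse an explicit list
# instead of sys.argv so the example is reproducible.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("path")
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args(["--verbose", "input.txt"])

# json: decode a JSON document into Python objects.
data = json.loads('{"name": "ply", "tags": ["lexer", "parser"]}')

# tokenize: scan Python source text, keeping names, numbers, and operators.
wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
source_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO("x = 1 + 2\n").readline)
    if tok.type in wanted
]

print(words)          # ['grep', '-n', 'hello world', 'notes.txt']
print(port)           # 8080
print(args.path, args.verbose)
print(data["tags"])
print(source_tokens)  # ['x', '=', '1', '+', '2']
```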
Articles
- Simple Top-Down Parsing in Python - A methodology for writing top-down parsers in Python. (7/2008)
- Pysec: Monadic Combinatoric Parsing in Python - An exposition of using monads to build a Python parser. (2/2008)
Licensing and Attribution
Python Parsing Tools by Michael R. Bernstein is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://github.com/webmaven/python-parsing-tools/.
Parsing
The Parsing module implements an LR(1) parser generator, as well as the runtime support for using a generated parser, via the Lr and Glr parser drivers. There is no special parser generator input file format, but the parser generator still needs to know what classes/methods correspond to various aspects of the parser. This information is specified via docstrings, which the parser generator introspects in order to generate a parser. Only one parser specification can be embedded in each module, but it is possible to share modules between parser specifications so that, for example, the same token definitions can be used by multiple parser specifications.
The parsing tables are LR(1), but they are generated using a fast algorithm that avoids creating duplicate states that result when using the generic LR(1) algorithm. Creation time and table size are on par with the LALR(1) algorithm. However, LALR(1) can create reduce/reduce conflicts that don't exist in a true LR(1) parser. For more information on the algorithm, see:

Parsing table generation requires non-trivial amounts of time for large grammars. Internal pickling support makes it possible to cache the most recent version of the parsing table on disk, and use the table if the current parser specification is still compatible with the one that was used to generate the pickled parsing table. Since the compatibility checking is quite fast, even for large grammars, this removes the need to use the standard code generation method that is used by most parser generators.
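The caching idea can be sketched without the Parsing module itself. The following is not the module's implementation: `build_tables` is a hypothetical stand-in for the expensive table generation, `load_tables` is an invented helper, and a SHA-256 hash of the grammar text is assumed here as the compatibility check.

```python
import hashlib
import os
import pickle
import tempfile


def build_tables(grammar_text):
    """Hypothetical stand-in for slow parsing-table generation."""
    build_tables.calls += 1
    return {"rules": grammar_text.splitlines()}


build_tables.calls = 0  # counts how often the slow path runs


def load_tables(grammar_text, cache_path):
    """Return tables for grammar_text, reusing the pickle when compatible."""
    fingerprint = hashlib.sha256(grammar_text.encode("utf-8")).hexdigest()
    try:
        with open(cache_path, "rb") as f:
            cached = pickle.load(f)
        if cached["fingerprint"] == fingerprint:
            return cached["tables"]  # cache hit: specification unchanged
    except (OSError, EOFError, KeyError, TypeError, pickle.UnpicklingError):
        pass  # missing or unreadable cache: fall through and rebuild
    tables = build_tables(grammar_text)
    with open(cache_path, "wb") as f:
        pickle.dump({"fingerprint": fingerprint, "tables": tables}, f)
    return tables


cache_path = os.path.join(tempfile.mkdtemp(), "tables.pickle")
grammar = "expr: expr '+' term\nexpr: term"

first = load_tables(grammar, cache_path)   # builds and writes the cache
second = load_tables(grammar, cache_path)  # served from the pickle
third = load_tables(grammar + "\nterm: NUM", cache_path)  # changed: rebuilds

print(build_tables.calls)  # prints 2
```

Because checking the fingerprint is much cheaper than rebuilding, the cache can be consulted on every start-up. Only unpickle files that your own program wrote, since loading a pickle can execute arbitrary code.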
Parser specifications are encapsulated by the Spec class. Parser instances use Spec instances, but are themselves based on separate classes. This allows multiple parser instances to exist simultaneously, without requiring multiple copies of the parsing tables. There are two separate parser driver classes:

- Glr: Generalized LR driver, capable of tracking multiple parse trees simultaneously, if the %split precedence is used to mark ambiguous actions. This driver is closely based on Elkhound's design, which is described in a technical report:
Parser generator directives are embedded in docstrings, and must begin with a '%' character, followed immediately by one of several keywords:
- Precedence: %fail, %nonassoc, %left, %right, %split
- Token: %token
- Non-terminal: %start, %nonterm
- Production: %reduce
All of these directives are associated with classes except for %reduce. %reduce is associated with methods within non-terminal classes. The Parsing module provides base classes from which precedences, tokens, and non-terminals must be derived. This is not as restrictive as it sounds, since there is nothing preventing, for example, a master Token class that subclasses Parsing.Token, which all of the actual token types then subclass. Also, nothing prevents using multiple inheritance.
The following are the base classes to be subclassed by parser specifications:
- Precedence
- Token
- Nonterm
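To make the docstring convention concrete, here is a minimal sketch of how a generator could introspect such directives. It does not import the Parsing module: the `Precedence`, `Token`, and `Nonterm` classes below are empty stand-ins for the real base classes, the grammar classes and the `collect_directives` helper are hypothetical, and the text after each keyword is only illustrative rather than the module's exact directive syntax.

```python
import inspect


# Empty stand-ins for the base classes listed above (assumption: the
# real ones are provided by the Parsing module and do much more).
class Precedence:
    pass


class Token:
    pass


class Nonterm:
    pass


class PAddOp(Precedence):
    "%left pAddOp"


class TokenPlus(Token):
    "%token plus"


class TokenInt(Token):
    "%token int"


class Expr(Nonterm):
    "%start"

    def reduce_int(self, value):
        "%reduce int"
        self.value = value

    def reduce_add(self, left, plus, right):
        "%reduce Expr plus int"
        self.value = left.value + right.value


def collect_directives(classes):
    """Group (name, arguments) pairs by the directive keyword they use."""
    found = {}
    for cls in classes:
        members = [(cls.__name__, cls)]
        # %reduce directives live on methods of non-terminal classes.
        if issubclass(cls, Nonterm):
            members += [
                (f"{cls.__name__}.{name}", fn)
                for name, fn in vars(cls).items()
                if inspect.isfunction(fn)
            ]
        for label, member in members:
            doc = (member.__doc__ or "").strip()
            if doc.startswith("%"):
                keyword, _, rest = doc.partition(" ")
                found.setdefault(keyword, []).append((label, rest))
    return found


directives = collect_directives([PAddOp, TokenPlus, TokenInt, Expr])
for keyword, uses in directives.items():
    print(keyword, uses)
```

Keeping the grammar in docstrings means the specification is ordinary Python code, so no separate input file or generation step is needed before the classes can be imported.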
The Parsing module implements the following exception classes:
- SpecError - when there is a problem with the grammar specification
- ParsingException - any problem that occurs during parsing
- UnexpectedToken - when the input sequence contains a token that is not allowed by the grammar (including end-of-input)
In order to maintain compatibility with legacy code, the Parsing module defines the following aliases. New code should use the exceptions above that do not shadow Python's builtin exceptions.
- Exception - superclass for all exceptions that can be raised
- SyntaxError - alias for UnexpectedToken
- AttributeError
Author: Jason Evans jasone@canonware.com
Github repo: http://github.com/sprymix/parsing