The pyparsing module is an alternative approach to creating and
executing simple grammars, compared with the traditional lex/yacc
approach or the use of regular expressions. With pyparsing, you don't
need to learn a new syntax for defining grammars or matching
expressions: the pyparsing module provides a library of classes that
you use to construct the grammar directly in Python.
Here is a program to parse "Hello, World!" (or any greeting
of the form "<salutation>, <addressee>!"):
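A minimal sketch of such a program, assuming pyparsing is installed and that the release in use still provides the camelCase method names used throughout this page (parseString, asList). It is written with Python 3 print syntax.

```python
from pyparsing import Word, alphas

# Grammar for a greeting: a word, a comma, another word, an exclamation mark.
greet = Word(alphas) + "," + Word(alphas) + "!"

hello = "Hello, World!"
tokens = greet.parseString(hello)

# Show the input next to the list of matched tokens.
print(hello, "->", tokens.asList())
```

The string literals "," and "!" are converted to Literal expressions automatically when they are added to a Word.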
The Python representation of the grammar is quite readable, owing to
the self-explanatory class names, and the use of '+', '|' and '^'
operators.
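The three operators can be compared side by side. The sketch below is illustrative only; the names integer, real, first_match and longest_match are made up for this example, and the camelCase methods are assumed to be available in the installed release.

```python
from pyparsing import Combine, Word, nums

integer = Word(nums)
# Combine joins the adjacent pieces of a real number into a single token.
real = Combine(Word(nums) + "." + Word(nums))

# '+' builds a sequence: every part must match, in order.
pair = (integer + "," + integer).parseString("1, 2").asList()

# '|' tries the alternatives left to right and stops at the first match.
first_match = integer | real
first = first_match.parseString("3.14").asList()

# '^' tries every alternative and keeps the longest match.
longest_match = integer ^ real
longest = longest_match.parseString("3.14").asList()

print(pair, first, longest)
```

With '|', the integer alternative wins for "3.14" because it is listed first and matches the leading "3"; with '^', the real alternative wins because it consumes more of the input.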
The parsed results returned from parseString() can be accessed as a
nested list, a dictionary, or an object with named attributes.
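The three access styles might look as follows. This is a sketch under the assumption that results names are attached with setResultsName; the grammar and the names "name", "when", "year", "month" and "day" are hypothetical and chosen only for illustration.

```python
from pyparsing import Group, Word, alphas, nums

year = Word(nums).setResultsName("year")
month = Word(nums).setResultsName("month")
day = Word(nums).setResultsName("day")

# Group keeps the date tokens together as a nested list.
date = Group(year + "/" + month + "/" + day)
entry = Word(alphas).setResultsName("name") + date.setResultsName("when")

result = entry.parseString("release 2005/09/12")

as_list = result.asList()        # nested list access
by_key = result["name"]          # dictionary-style access
by_attr = result.when.year       # attribute-style access
by_index = result[1][2]          # plain list indexing into the group

print(as_list, by_key, by_attr, by_index)
```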
The pyparsing module handles some of the problems that are typically
vexing when writing text parsers, such as extra or missing whitespace,
quoted strings, and embedded comments.
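Whitespace is one such problem: by default, pyparsing skips whitespace between tokens, so a grammar does not have to mention it. The sketch below reuses the greeting grammar to show this; the variants list is made up for the example.

```python
from pyparsing import Word, alphas

greet = Word(alphas) + "," + Word(alphas) + "!"

# The same grammar accepts extra or missing whitespace between tokens.
variants = ["Hello, World!", "Hello,World!", "Hello  ,  World  !"]
parsed = [greet.parseString(text).asList() for text in variants]

for text, tokens in zip(variants, parsed):
    print(repr(text), "->", tokens)
```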
ParseBaseException
    Base exception class for all parsing runtime exceptions.
ParseException
    Exception thrown when parse expressions don't match the input.
ParseFatalException
    User-throwable exception thrown when inconsistent parse content is
    found; stops all parsing immediately.
RecursiveGrammarException
    Exception thrown by validate() if the grammar could be improperly
    recursive.
ParseResults
    Structured parse results, providing multiple means of access to
    the parsed data: as a nested list, as a dictionary, or as an
    object with named attributes.
ParserElement
    Abstract base-level parser element class.
Token
    Abstract ParserElement subclass, for defining atomic matching
    patterns.
Empty
    An empty token; always matches.
NoMatch
    A token that never matches.
Literal
    Token to exactly match a specified string.
Keyword
    Token to exactly match a specified string as a keyword; that is,
    it must be immediately followed by a non-keyword character.
CaselessLiteral
    Token to match a specified string, ignoring case of letters.
Word
    Token for matching words composed of allowed character sets.
CharsNotIn
    Token for matching words composed of characters *not* in a given
    set.
White
    Special matching class for matching whitespace.
PositionToken
GoToColumn
    Token to advance to a specific column of input text; useful for
    tabular report scraping.
LineStart
    Matches if the current position is at the beginning of a line
    within the parse string.
LineEnd
    Matches if the current position is at the end of a line within
    the parse string.
StringStart
    Matches if the current position is at the beginning of the parse
    string.
StringEnd
    Matches if the current position is at the end of the parse string.
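ParseExpression
    Abstract subclass of ParserElement, for combining and
    post-processing parsed tokens.
And
    Requires all given ParseExpressions to be found in the given
    order. May be constructed using the '+' operator.
Or
    Requires that at least one ParseExpression is found. If more than
    one expression matches, the one that matches the longest string
    is used. May be constructed using the '^' operator.
MatchFirst
    Requires that at least one ParseExpression is found. If more than
    one expression matches, the first one listed is used. May be
    constructed using the '|' operator.
Each
    Requires all given ParseExpressions to be found, but in any order.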
ParseElementEnhance
    Abstract subclass of ParserElement, for combining and
    post-processing parsed tokens.
FollowedBy
    Lookahead matching of the given parse expression.
NotAny
    Lookahead to disallow matching with the given parse expression.
ZeroOrMore
    Optional repetition of zero or more of the given expression.
OneOrMore
    Repetition of one or more of the given expression.
Optional
    Optional matching of the given expression.
SkipTo
    Token for skipping over all undefined text until the matched
    expression is found.
Forward
    Forward declaration of an expression to be defined later; used for
    recursive grammars, such as algebraic infix notation.
_ForwardNoRecurse
TokenConverter
    Abstract subclass of ParseExpression, for converting parsed
    results.
Upcase
    Converter to upper-case all matching tokens.
Combine
    Converter to concatenate all matching tokens into a single string.
Group
    Converter to return the matched tokens as a list; useful for
    returning tokens of ZeroOrMore and OneOrMore expressions.
Dict
    Converter to return a repetitive expression as a list, but also as
    a dictionary.
Suppress
    Converter for ignoring the results of a parsed expression.
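As an illustration of how several of the classes listed above combine, the following sketch parses parenthesized, arbitrarily nested lists of integers. It uses Forward for the recursive reference, Group to keep each level as its own list, and Suppress to drop the parentheses from the results. The grammar and the names LPAR, RPAR, item and nested are assumptions made for this example, not part of the module.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, nums

# Parentheses are matched but left out of the results.
LPAR, RPAR = Suppress("("), Suppress(")")

# 'item' is declared first and defined later, because it refers to
# 'nested', which in turn refers back to 'item'.
item = Forward()
nested = Group(LPAR + ZeroOrMore(item) + RPAR)
item << (Word(nums) | nested)

tree = nested.parseString("(1 (2 3) (4 (5)))").asList()
print(tree)
```

Each Group produces one nested list, so the structure of the result mirrors the nesting of the parentheses in the input.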
_ustr(obj)
    Drop-in replacement for str(obj) that tries to be Unicode friendly.
col(loc, strg)
    Returns the current column within a string, counting newlines as
    line separators. The first column is number 1.
lineno(loc, strg)
    Returns the current line number within a string, counting newlines
    as line separators. The first line is number 1.
line(loc, strg)
    Returns the line of text containing loc within a string, counting
    newlines as line separators.
_defaultStartDebugAction(instring, loc, expr)
_defaultSuccessDebugAction(instring, startloc, endloc, expr, toks)
_defaultExceptionDebugAction(instring, loc, expr, exc)
nullDebugAction(*args)
    'Do-nothing' debug action, to suppress debugging output during
    parsing.
delimitedList(expr, delim=',', combine=False)
    Helper to define a delimited list of expressions; the delimiter
    defaults to ','.
oneOf(strs, caseless=False)
    Helper to quickly define a set of alternative Literals. It tests
    longest-first when there is a conflict, regardless of the input
    order, but returns a MatchFirst for best performance.
dictOf(key, value)
    Helper to easily and clearly define a dictionary by specifying the
    respective patterns for the key and value.
srange(s)
    Helper to easily define string ranges for use in Word construction.
upcaseTokens(s, l, t)
    Helper parse action to convert tokens to upper case.
downcaseTokens(s, l, t)
    Helper parse action to convert tokens to lower case.
_makeTags(tagStr, xml)
    Internal helper to construct opening and closing tag expressions,
    given a tag name.
makeHTMLTags(tagStr)
    Helper to construct opening and closing tag expressions for HTML,
    given a tag name.
makeXMLTags(tagStr)
    Helper to construct opening and closing tag expressions for XML,
    given a tag name.
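A few of the helper functions above can be exercised together. This is a sketch, assuming the installed release still exports these helpers under the names shown on this page; the sample strings and the variable names are invented for the example.

```python
from pyparsing import Word, col, delimitedList, line, lineno, oneOf, srange

# srange expands regex-style character ranges into plain strings for Word.
identifier = Word(srange("[a-zA-Z_]"), srange("[a-zA-Z0-9_]"))

# delimitedList matches items separated by commas and drops the commas.
names = delimitedList(identifier).parseString("alpha, beta_2 ,gamma").asList()

# oneOf tests longer alternatives first, so "<=" is not cut short at "<".
comparison = oneOf("< <= > >= = !=")
op = comparison.parseString("<= 5").asList()

# lineno, col and line translate a string offset into a position.
text = "first line\nsecond line"
loc = text.index("line", text.index("second"))
position = (lineno(loc, text), col(loc, text), line(loc, text))

print(names, op, position)
```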
__doc__ = "...
__versionTime__ = '12 September 2005 22:50'
alphas = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
nums = '0123456789'
hexnums = '0123456789ABCDEFabcdef'
alphanums = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVW...
_bslash = '\\'
printables = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKL...
empty = empty
_escapedPunc = W:(\,\[]-...)
_printables_less_backslash = '0123456789abcdefghijklmnopqrstuv...
_escapedHexChar = Combine:({Suppress:("\0x") W:(0123...)})
_escapedOctChar = Combine:({Suppress:("\") W:(0,0123...)})
_singleChar = {W:(\,\[]-...) | Combine:({Suppress:("\0x") W:(0...
_charRange = Group:({{W:(\,\[]-...) | Combine:({Suppress:("\0x...
_reBracketExpr = {"[" ["^"] Group:({{Group:({{W:(\,\[]-...) | ...
alphas8bit = u'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîï...
_escapables = 'tnrfbacdeghijklmopqsuvwxyz\\\'"'
_octDigits = '01234567'
_escapedChar = {W:(\,tnrf...) | W:(\,0123...)}
_sglQuote = "'"
_dblQuote = """
dblQuotedString = string enclosed in double quotes
sglQuotedString = string enclosed in single quotes
quotedString = quotedString using single or double quotes
cStyleComment = cStyleComment enclosed in /* ... */
htmlComment = htmlComment enclosed in <!-- ... -->
restOfLine = rest of line up to \n
dblSlashComment = {"//" rest of line up to \n}
cppStyleComment = {FollowedBy:("/") {{"//" rest of line up to ...
javaStyleComment = {FollowedBy:("/") {{"//" rest of line up to...
pythonStyleComment = {"#" rest of line up to \n}
_noncomma = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLM...
_commasepitem = commaItem
commaSeparatedList = commaSeparatedList
__package__ = 'spade'
c = '~'
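Two of the predefined expressions above, quotedString and cStyleComment, can be used as in the following sketch. It assumes the installed release still exports them under these names; the sample inputs and the variable names are made up for the example.

```python
from pyparsing import OneOrMore, Word, cStyleComment, nums, quotedString

# A run of integers in which C-style comments may appear anywhere.
numbers = OneOrMore(Word(nums))
numbers.ignore(cStyleComment)
values = numbers.parseString("1 2 /* skip me */ 3").asList()

# quotedString matches a single- or double-quoted string; the
# surrounding quote characters are kept in the matched token.
quoted = quotedString.parseString('"Hello, World!" trailing text').asList()

print(values, quoted)
```

Calling ignore() tells the expression to skip any text matching the given pattern wherever it occurs between tokens, which is how embedded comments are usually handled.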