crustimoney.combinators

Parser combinator functions.

Each combinator function creates a parser function that is suitable for use with core's main parse function. Many take other parser functions as arguments, which makes them composable.

If you want to implement your own parser combinator, read on. Otherwise, just look at the docstrings of the combinators themselves.

The parsers returned by the combinators do not call other parsers directly, as this could lead to stack overflows. Instead, next to a ->success or ->error result, a parser can also return a ->push result, which pushes another parser onto the virtual stack.

For this reason, a parser function has the following signature:

(fn
  ([text index]
    ...)
  ([text index result state]
   ...))

The 2-arity variant is called when the parser was pushed onto the stack. It receives the entire text and the index at which it should begin parsing. If it returns a push result, the 4-arity variant is called when the pushed parser is done. It again receives the text and the original index, but also the result of the pushed parser and any state that was pushed with it.

Both arities can return a success, a set of errors, or a push. The crustimoney.results namespace should be used for creating and reading these results.

Before you write your own combinator, do realise that the provided combinators are complete in the sense that they can parse any text.
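To make the two arities and the virtual stack concrete, here is a minimal, self-contained sketch of the idea. It does not use crustimoney itself: the result maps and the names `success`, `failure`, `push`, `literal*`, `chain*` and `run` are all hypothetical stand-ins, and the real constructors in crustimoney.results and the real driver in core may well look different.

```clojure
;; Hypothetical result constructors (NOT the crustimoney.results API).
(defn success [start end] {:type :success :start start :end end})
(defn failure [index key] {:type :error :index index :key key})
(defn push [parser index state]
  {:type :push :parser parser :index index :state state})

;; A leaf parser never pushes, so it only needs the 2-arity.
(defn literal* [s]
  (fn [text index]
    (let [end (+ index (count s))]
      (if (and (<= end (count text)) (= s (subs text index end)))
        (success index end)
        (failure index :expected-literal)))))

;; A sequencing parser: it never calls its children, it pushes them.
(defn chain* [& parsers]
  (fn
    ;; 2-arity: this parser was just pushed; start with the first child.
    ([_text index]
     (if-let [p (first parsers)]
       (push p index (rest parsers))
       (success index index)))
    ;; 4-arity: a pushed child is done; `state` holds the remaining children.
    ([_text index result state]
     (if (= :success (:type result))
       (if-let [p (first state)]
         (push p (:end result) (rest state))
         (success index (:end result)))
       result))))

;; The driver keeps an explicit stack of frames, so nesting depth does
;; not consume JVM stack frames.
(defn run [root text]
  (loop [stack  [{:parser root :index 0}]
         result nil]
    (if-let [{:keys [parser index state]} (peek stack)]
      (let [r (if result
                (parser text index result state)
                (parser text index))]
        (if (= :push (:type r))
          ;; Store the pushed state on the current frame, then push the child.
          (recur (conj (assoc-in stack [(dec (count stack)) :state] (:state r))
                       {:parser (:parser r) :index (:index r)})
                 nil)
          ;; Success or error: pop the frame and hand the result to its parent.
          (recur (pop stack) r)))
      result)))

(run (chain* (literal* "foo") (literal* "bar")) "foobar")
;; => {:type :success, :start 0, :end 6}
```

Note how the 4-arity of `chain*` receives the original index of the chain together with the child's result, which is what lets it report a success spanning the whole chain.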

chain (clj)

(chain & parsers)

Chain multiple consecutive parsers.

The chain combinator supports cuts. At least one normal parser must precede a cut. That parser must consume input, which no other parser (via a choice) up in the combinator tree could also consume at that point.

Two kinds of cuts are supported: a "hard" cut and a "soft" cut, which can be inserted in the chain using :hard-cut or :soft-cut. Both types of cuts improve error messages, as they limit backtracking.

With a hard cut, the parser is instructed to never backtrack before the end of this chain. Besides better error messages, a well-placed hard cut has a major benefit: it allows for substantial memory optimization, since the packrat caches can evict everything before the cut. This can turn memory requirements from O(n) to O(1). Since PEG parsers are memory hungry, this can be a big deal.

With a soft cut, backtracking can still happen outside the chain, but errors will not escape inside the chain after a soft cut. The advantage of a soft cut over a hard cut is that it can be used in more places without breaking the grammar.

For example, the following parser benefits from a soft-cut:

(choice (chain (maybe (chain (literal "{")
                             :soft-cut
                             (literal "foo")
                             (literal "}")))
               (literal "bar"))
        (literal "baz"))

When parsing "{foo", it will nicely report that a "}" is missing. Without the soft-cut, it would report that "bar" or "baz" are expected, ignoring the more likely error.

When parsing "{foo}eve", it will nicely report that "bar" or "baz" is missing. Placing a hard cut would only report "bar" missing, as it would never backtrack to try the "baz" choice.

Soft cuts do not influence the packrat caches, so they do not help performance-wise. A hard cut is implicitly also a soft cut.

choice (clj)

(choice & parsers)

Match the first of the ordered parsers that is successful.

eof (clj)

(eof)

Succeed only if the entire text has been parsed.

grammar (clj, macro)

(grammar m)

Takes (something that evaluates to) a map, in which the entries can refer to each other using the ref function. In other words, a recursive map. For example:

(grammar {:foo  (literal "foo")
          :root (chain (ref :foo) (literal "bar"))})

A rule's name key can be postfixed with =. The rule's parser is then wrapped with with-name, using the name without the postfix. A ref to such a rule also omits the postfix.

However, it is encouraged to be very intentional about which nodes should be captured and when. For example, the following (string) grammar ensures that the :prefixed node is only in the result when applicable.

root=    <- prefixed (' ' prefixed)*
prefixed <- (:prefixed '!' body) / body
body=    <- [a-z]+

Parsing "foo !bar" would result in the following result tree:

[:root {:start 0, :end 8}
 [:body {:start 0, :end 3}]
 [:prefixed {:start 4, :end 8}
  [:body {:start 5, :end 8}]]]

literal (clj)

(literal s)

A parser that matches an exact literal string.

lookahead (clj)

(lookahead parser)

Lookahead for the given parser, i.e. succeed if the parser does, without advancing the parsing position.

maybe (clj)

(maybe parser)

Try to parse the given parser, but succeed anyway.

negate (clj)

(negate parser)

Negative lookahead for the given parser, i.e. this succeeds if the parser does not.

ref (clj)

(ref key)

Wrap another parser function, which is referred to by the given key. Needs to be called within the lexical scope of grammar.

regex (clj)

(regex re)

A parser that matches the given regular expression (string or pattern).

repeat* (clj)

(repeat* parser)

Eagerly try to match the given parser as many times as possible.

repeat+ (clj)

(repeat+ parser)

Eagerly try to match the parser as many times as possible, expecting at least one match.

with-error (clj)

(with-error key parser)

Wrap the parser, replacing any errors with a single error with the supplied error key.

with-name (clj)

(with-name key parser)

Wrap the parser, assigning a name to the (success) result of the parser. Nameless parsers are filtered out by default during parsing.
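Since every combinator on this page returns a parser function, they compose as ordinary nested calls. The sketch below only uses the signatures listed above; the alias `c` and the names `sentence` and `non-quote` are chosen for illustration, and it assumes the crustimoney library is on the classpath. Actually running these parsers requires core's main parse function, whose signature is not shown in this section and is therefore left out.

```clojure
;; Assumes the crustimoney library is available as a dependency.
(require '[crustimoney.combinators :as c])

;; One or more lowercase words separated by single spaces, an optional
;; trailing "!", and then the end of the text. Only the :word nodes are
;; named, so only they are meant to show up in the result tree.
(def sentence
  (c/chain (c/with-name :word (c/regex "[a-z]+"))
           (c/repeat* (c/chain (c/literal " ")
                               (c/with-name :word (c/regex "[a-z]+"))))
           (c/maybe (c/literal "!"))
           (c/eof)))

;; One character that is not a double quote: a negative lookahead,
;; followed by a regex that consumes a single character (other than a
;; line terminator).
(def non-quote
  (c/chain (c/negate (c/literal "\""))
           (c/regex ".")))
```

The second definition shows the usual PEG idiom of pairing negate with a consuming parser, since negate on its own only checks that its parser does not match.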
