Parser combinator functions.
Each combinator function creates a parser function that is suitable for use with core's main parse function, and many take other parser functions as their argument; they are composable.
If you want to implement your own parser combinator, read on. Otherwise, just look at the docstrings of the combinators themselves.
The parsers returned by the combinators do not call other parsers directly, as this could lead to stack overflows. So next to a `->success` or `->error` result, a parser can also return a `->push` result, which pushes another parser onto the virtual stack.
For this reason, a parser function has the following signature:
(fn
  ([text index]
   ...)
  ([text index result state]
   ...))
The 2-arity variant is called when the parser was pushed onto the stack. It receives the entire text and the index it should begin parsing at. If it returns a `push` result, the 4-arity variant is called when that parser is done. It again receives the text and the original index, but also the result of the pushed parser and any state that was pushed with it.
Both arities can return a success, a set of errors, or a push. The `crustimoney.results` namespace should be used for creating and reading these results.
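As a minimal sketch of the signature above, here is a pass-through combinator that simply pushes its wrapped parser and forwards the result. It assumes `->push` takes the parser and the start index; the exact signatures in `crustimoney.results` may differ, so check that namespace's docstrings.

```clojure
(ns example.combinator
  (:require [crustimoney.results :as r]))

(defn wrapped
  "Sketch: a combinator that delegates entirely to `parser`."
  [parser]
  (fn
    ;; 2-arity: called when this parser is taken off the stack;
    ;; schedule the wrapped parser instead of calling it directly.
    ([_text index]
     (r/->push parser index))
    ;; 4-arity: called when the pushed parser is done; forward its
    ;; result unchanged, whether it is a success or a set of errors.
    ([_text _index result _state]
     result)))
```

A real combinator would typically inspect `result` in the 4-arity branch and transform it, or push a further parser.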
Before you write your own combinator, do realise that the provided combinators are complete in the sense that they can parse any text.
(chain & parsers)
Chain multiple consecutive parsers.
The chain combinator supports cuts. At least one normal parser must precede a cut. That parser must consume input, which no other parser (via a choice) up in the combinator tree could also consume at that point.
Two kinds of cuts are supported: a "hard" cut and a "soft" cut, which can be inserted in the chain using `:hard-cut` or `:soft-cut`. Both types of cuts improve error messages, as they limit backtracking.
With a hard cut, the parser is instructed to never backtrack before the end of this chain. A well-placed hard cut has a major benefit next to better error messages: it allows for substantial memory optimization, since the packrat caches can evict everything before the cut. This can turn memory requirements from O(n) to O(1). Since PEG parsers are memory hungry, this can be a big deal.
With a soft cut, backtracking can still happen outside the chain, but errors will not escape inside the chain after a soft cut. The advantage of a soft cut over a hard cut is that it can be used in more places without breaking the grammar.
For example, the following parser benefits from a soft-cut:
(choice (chain (maybe (chain (literal "{")
                             :soft-cut
                             (literal "foo")
                             (literal "}")))
               (literal "bar"))
        (literal "baz"))
When parsing "{foo", it will nicely report that a "}" is missing. Without the soft-cut, it would report that "bar" or "baz" are expected, ignoring the more likely error.
When parsing "{foo}eve", it will nicely report that "bar" or "baz" is missing. Placing a hard cut would only report "bar" missing, as it would never backtrack to try the "baz" choice.
Soft cuts do not influence the packrat caches, so they do not help performance wise. A hard cut is implicitly also a soft cut.
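To illustrate where a hard cut fits, here is a sketch (the keyword and patterns are invented for this example) of a repeating construct whose opening literal is unambiguous: once it matches, no other alternative could apply, so backtracking past it is pointless.

```clojure
;; Sketch: after "let" is consumed, the parser never backtracks before
;; it, so the packrat caches can evict everything up to that point on
;; each iteration of repeat*.
(repeat* (chain (literal "let")
                :hard-cut
                (regex "[a-z]+")
                (literal ";")))
```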
(choice & parsers)
Match the first of the ordered parsers that is successful.
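A small sketch (the literals are invented here) of why the ordering matters: the first parser that succeeds wins, so more specific alternatives should come before more general ones.

```clojure
;; The keyword must be tried before the generic identifier pattern,
;; since the regex would also match "let" and shadow the first branch.
(choice (literal "let")
        (regex "[a-z]+"))
```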
(eof)
Succeed only if the entire text has been parsed.
(grammar & maps)
Takes one or more maps, in which the entries can refer to each other using the `ref` function. In other words, a recursive map. For example:

(grammar {:foo  (literal "foo")
          :root (chain (ref :foo) "bar")})
A rule's name key can be postfixed with `=`. The rule's parser is then wrapped with `with-name` (without the postfix). A `ref` to such a rule is also without the postfix.
However, it is encouraged to be very intentional about which nodes should be captured and when. For example, the following (string) grammar ensures that the `:prefixed` node is only in the result when applicable.
root=    <- prefixed (' ' prefixed)*
prefixed <- (:prefixed '!' body) / body
body=    <- [a-z]+
Parsing "foo !bar" would result in the following result tree:
[:root {:start 0, :end 8}
 [:body {:start 0, :end 3}]
 [:prefixed {:start 4, :end 8}
  [:body {:start 5, :end 8}]]]
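The `=` postfix is shorthand for wrapping with `with-name`. As a sketch of the equivalence (rule names invented here), these two grammars capture the same `:body` node:

```clojure
;; Using the = postfix on the rule's name key:
(grammar {:root  (ref :body)
          :body= (regex "[a-z]+")})

;; Written out with an explicit with-name wrapper:
(grammar {:root (ref :body)
          :body (with-name :body (regex "[a-z]+"))})
```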
(literal s)
A parser that matches an exact literal string.
(lookahead parser)
Lookahead for the given parser, i.e. succeed if the parser does, without advancing the parsing position.
(maybe parser)
Try to parse the given parser, but succeed anyway.
(negate parser)
Negative lookahead for the given parser, i.e. this succeeds if the parser does not.
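A common sketch of negative lookahead (the literals are invented for illustration): since `negate` does not consume input, it is typically chained with a parser that does, e.g. to match characters up to a closing quote.

```clojure
;; Consume any character that is not a double quote; the negate only
;; checks, the regex does the actual consuming.
(repeat* (chain (negate (literal "\""))
                (regex ".")))
```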
(ref key)
Wrap another parser function, which is referred to by the given key. Needs to be called within the lexical scope of `grammar`.
(regex re)
A parser that matches the given regular expression (string or pattern).
(repeat* parser)
Eagerly try to match the given parser as many times as possible.
(repeat+ parser)
Eagerly try to match the parser as many times as possible, expecting at least one match.
(with-error key parser)
Wrap the parser, replacing any errors with a single error with the supplied error key.
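A small sketch (the error key is invented): collapse the low-level errors of a sub-parser into one domain-specific error.

```clojure
;; Any failure inside the regex is reported as a single
;; :expected-number error instead of a regex-level one.
(with-error :expected-number (regex "[0-9]+"))
```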
(with-name key parser)
Wrap the parser, assigning a name to the (success) result of the parser. Nameless parsers are filtered out by default during parsing.