# Textp's grammar.

Here we construct textp's grammar using instaparse. The grammar is built in two parts:
- a lexical part, or lexer, made of regular expressions.
- a set of grammatical rules tying the lexer together into the grammar.

## The lexer.

Our lexer is made of regular expressions constructed with the [[textp.reader.alpha.grammar/def-regex]] macro, which uses the Regal library under the covers. We then assemble a lexer from these regular expressions with the [[textp.reader.alpha.grammar/make-lexer]] macro.

For instance, we could construct the following two-rule lexer:

```clojure
(def-regex number [:* :digit])
(def-regex word [:* [:class ["a" "z"]]])

(def lexer (make-lexer number word))

lexer
;=> {:number {:tag :regexp
              :regexp #"\d*"}
     :word {:tag :regexp
            :regexp #"[a-z]*"}}
```

## The grammatical rules

We use the [[instaparse.combinators/ebnf]] function to produce grammatical rules. This allows us to write these rules in the EBNF format.

For instance, we could write the following:

```clojure
;; instac is an alias for instaparse.combinators
(def rules
  (instac/ebnf
    "
    doc = (token <':'>)*
    token = (number | word)
    "))

rules
;=> {:doc {:tag :star
           :parser {:tag :cat
                    :parsers ({:tag :nt :keyword :token}
                              {:tag :string :string ":" :hide true})}}
     :token {:tag :alt
             :parsers ({:tag :nt :keyword :number}
                       {:tag :nt :keyword :word})}}
```

Writing the grammatical rules this way is much easier than using function combinators, and it still gives us the rules in map form.

## The combining trick

Now that we have both a lexer and grammatical rules, we can simply merge them to get the full grammar. We actually get an instaparse parser this way:

```clojure
;; insta is an alias for instaparse.core
(def parser
  (insta/parser (merge lexer rules)
                :start :doc))

(parser "abc:1:def:2:3:")
;=> [:doc
     [:token [:word "abc"]]
     [:token [:number "1"]]
     [:token [:word "def"]]
     [:token [:number "2"]]
     [:token [:number "3"]]]
```

With the exception of some details, this is how this namespace is made.
Merging of the lexer rules and the grammatical rules.
Regex that recognizes any character.
Grammatical rule for commented text:

```text
This text is normal text.
◊/The text here is kept commented out/◊
```
(def-regex n xeger-expr)
(def-regex n doc xeger-expr)
Macro used as a shorthand for:

```clojure
(def a-regex (make-regex "a regal expression"))
```

which becomes:

```clojure
(def-regex a-regex "a regal expression")
```
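The shorthand described above can be sketched with a two-arity macro. This is only an illustration under stated assumptions: `make-regex-sketch` and `def-regex-sketch` are hypothetical names, and the stand-in compiles a plain regex string with `re-pattern` instead of going through Regal as the real `make-regex` does.

```clojure
;; Hypothetical stand-in for make-regex: compiles a plain regex string.
;; The real function takes a Regal expression instead.
(defn make-regex-sketch [pattern-string]
  (re-pattern pattern-string))

;; Sketch of the shorthand macro, mirroring the two arglists of def-regex:
;; (def-regex n expr) and (def-regex n doc expr).
(defmacro def-regex-sketch
  ([n expr]
   `(def ~n (make-regex-sketch ~expr)))
  ([n doc expr]
   `(def ~n ~doc (make-regex-sketch ~expr))))

(def-regex-sketch digits "\\d+")
(def-regex-sketch letters "One or more lowercase letters." "[a-z]+")

(re-find digits "abc42")
(:doc (meta #'letters))
```

The three-argument arity simply forwards the docstring to `def`, so the documentation ends up in the var's metadata.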
Grammatical rules describing clojure code embedded in text.

```text
We can embed clojure calls: ◊(def ex 1)◊
and clojure values ◊|x|◊

Note that the embedded call syntax is mutually recursive with the tag syntax.
We can have:
◊(def home ◊a[:href "www.home.com"]{Home})◊

and use it here: ◊|home|◊
```
The backslash used to escape characters in plain text.
Final grammar with all the rules that need to be hidden specified as such.
The set of rule names that need to be hidden. These rules won't produce nodes in the parse tree. In compiler parlance, these are the nodes you'd find in the syntax tree but not in the abstract syntax tree.
(hide-all g)
Hide all rules in an instaparse grammar in its data (map) form by applying [[instaparse.combinators/hide-tag]] to all values of the map.
(hide-rules g rule-names)
Selectively hide rules in an instaparse grammar in its data (map) form. It applies [[instaparse.combinators/hide-tag]] to the rules whose names are in `rule-names`.
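Both hiding helpers amount to mapping `hide-tag` over a grammar map. The sketch below is one possible way to write them, not necessarily how this namespace does it: the `-sketch` names and `toy-grammar` are hypothetical, and instaparse is assumed to be on the classpath.

```clojure
(require '[instaparse.combinators :as instac])

;; Apply hide-tag to every rule of a grammar in map form.
(defn hide-all-sketch [g]
  (reduce-kv (fn [acc rule-name rule]
               (assoc acc rule-name (instac/hide-tag rule)))
             {}
             g))

;; Apply hide-tag only to the rules named in rule-names,
;; ignoring names that are not present in the grammar.
(defn hide-rules-sketch [g rule-names]
  (reduce (fn [acc rule-name]
            (if (contains? acc rule-name)
              (update acc rule-name instac/hide-tag)
              acc))
          g
          rule-names))

;; Toy grammar used only for this illustration.
(def toy-grammar
  {:number {:tag :regexp :regexp #"\d+"}
   :word   {:tag :regexp :regexp #"[a-z]+"}})

(def partially-hidden (hide-rules-sketch toy-grammar #{:number}))
(def fully-hidden (hide-all-sketch toy-grammar))
```

In `partially-hidden`, only the `:number` rule is modified; the `:word` rule is left exactly as it was.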
Lexer of our grammar. It's the raw lexer with all rules hidden by default (they won't materialize as nodes of a parse tree).
Raw lexer of our grammar. It's an instaparse grammar in data (map) form containing all the regular expressions used in the final parser.
(make-complex-symbol-regex rep)
Regex for a full symbol name with namespace. Parses an optional ns name followed by the character `/`, then a simple symbol. The repetition for the characters of the symbol name is parameterized to allow for reluctant repetition.
(make-lexer & regexes)
Turn a sequence of named regular expressions into an instaparse map of named regex rules.
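A macro is needed here because the rule names come from the symbols themselves. The following is a minimal sketch of that idea, assuming each symbol is bound to a compiled regex; `make-lexer-sketch` is a hypothetical name and the real macro may differ in its details.

```clojure
;; Sketch: each symbol passed to the macro becomes a keyword naming a rule,
;; and the value bound to the symbol becomes the rule's regex.
(defmacro make-lexer-sketch [& regex-names]
  (into {}
        (for [n regex-names]
          [(keyword (name n)) {:tag :regexp :regexp n}])))

;; Plain regex literals stand in for regexes built with def-regex.
(def number #"\d*")
(def word #"[a-z]*")

(def lexer (make-lexer-sketch number word))

(keys lexer)
(get-in lexer [:number :tag])
```

The resulting map has the same shape as the lexer shown in the namespace documentation: one `{:tag :regexp :regexp ...}` entry per named regex.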
(make-simple-symbol-regex rep)
Regex for simple symbols without namespaces. The character repetition is parameterized to allow for reluctant repetition.
Our parser with the starting rule specified as the `:doc` rule and the output format set to `:enlive`.
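To illustrate what these two options do, here is a small sketch using a toy grammar that is unrelated to textp's real one. It assumes instaparse is on the classpath; `toy-parser` and its grammar are hypothetical.

```clojure
(require '[instaparse.core :as insta])

;; Toy grammar used only to show the :start and :output-format options.
(def toy-parser
  (insta/parser
    "doc = word (<':'> word)*
     word = #'[a-z]+'"
    :start :doc
    :output-format :enlive))

;; With :enlive, each node is a map with :tag and :content keys
;; instead of a hiccup-style vector.
(def tree (toy-parser "ab:cd"))

(:tag tree)
(map :tag (:content tree))
```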
Text to be interpreted as plain text, neither clojure code, nor special blocks of text. Basically any character excluding diamond and backslash which have special meaning.
In the case of the first character of a symbol name, there are more forbidden chars:
- digits aren't allowed as first character
- the macro reader char `#` isn't allowed either.
Regex for the ns name of a symbol, parses dot separated names until a final name.
Characters that are always forbidden in symbol names:
- spaces
- the diamond char, since it starts another grammatical rule
- delimiters: parens, brackets, braces and double quotes.
- `/` since it has the special meaning of separating the namespace from the symbol name.
- `.` since it has the special meaning of separating symbol names.
- `\` since it is reserved by clojure to identify a literal character.
Grammatical rules for tag syntax.

A tag is meant to ultimately be a clojure call. It starts with the character ◊, followed by a symbol, then followed by arguments. Arguments can be clojure arguments enclosed in brackets or text arguments enclosed in braces.

Clojure arguments allow clojure code to be passed as arguments, as embedded code which can itself contain other tags. Text arguments are blocks of text which can recursively contain tags and embedded code.
The text found inside curly braces in tags. Can be anything but the chars:
- `◊`: diamond will start a new grammatical rule
- `}`: right curly brace closes the text arg to the tag
- `\`: backslash will start an escaped char grammatical rule

Allows for the forbidden chars to appear when escaped with a backslash.
Regex used to parse text inside a comment block. All characters are allowed; the block is terminated by "/◊".
Regex used when parsing text in embedded code.
Regex used when parsing a symbol in the case of embedded values. It is basically the same as `text-symbol` except for the use of reluctant repetition for the symbol name and the use of a lookahead at the end to search for the end of an embedded value block.
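The combination of reluctant repetition and a lookahead can be seen on a simplified pattern. The regexes below are assumptions made for illustration, not the ones defined in this namespace; they only show why a greedy match would swallow the closing `|` of an embedded value such as `◊|home|◊`.

```clojure
;; Text following the opening "◊|" of an embedded value.
(def remaining-input "home|◊ and more text")

;; Greedy repetition: consumes everything up to the diamond,
;; including the closing pipe.
(def greedy-sketch #"[^◊]+")

;; Reluctant repetition with a lookahead: stops as soon as the
;; closing "|◊" is next, without consuming it.
(def reluctant-sketch #"[^◊]+?(?=\|◊)")

(re-find greedy-sketch remaining-input)
;=> "home|"

(re-find reluctant-sketch remaining-input)
;=> "home"
```

Because the lookahead does not consume input, the closing `|◊` is still available for the grammatical rule that ends the embedded value block.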
Grammatical rules for top level text. Basically any character except "◊" or any escaped character.
Spaces found in between tag args.
Regex used when parsing a symbol in the case of tag names.
Regex used when parsing the text inside a clojure argument to a tag. Can be anything but the chars:
- `◊`: diamond will start a new grammatical rule
- `[]`, brackets: these characters will start a new grammatical rule
- `"`: double quote will start a new grammatical rule

Allows for the forbidden chars to appear when escaped with a backslash.
The text inside a clojure string. Can be anything but the char:
- `"`: double quote will close the string

Allows for the forbidden char to appear when escaped with a backslash.
Regex used to parse text inside a verbatim block. All characters are allowed; the block is terminated by "!◊".
Grammatical rule for verbatim text:

```text
This text is normal text.
◊!The text here is kept ◊verbatim!◊
```