This namespace provides a reader that combines our grammar and Clojure's reader to turn a string of text into data Clojure can then evaluate.

## Reader results

The reader starts by parsing the text using our grammar, then returns a *clojurized* version of the parse tree. The different syntactic elements are processed as follows:

- text -> string
- tag -> Clojure fn call
- verbatim block -> string containing the verbatim block's content
- comment -> empty string, or a special map containing the comment, depending on [[textp.reader.alpha.core/*keep-comments*]]
- embedded clojure -> drop-in Clojure code, or a map containing the code, depending on [[textp.reader.alpha.core/*wrap-embedded*]]

## Special maps

The reader can wrap comments/embedded clojure in maps when instructed to. These maps have 2 keys:

- `type`: a marker identifying the kind of special value the map represents
- `data`: the actual value being wrapped: the content of a comment or the embedded clojure code.

This model is consistent with the way [enlive](https://github.com/cgrand/enlive) treats dtd elements, and may allow for uniform processing when generating html.
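As a data-shape sketch of the special maps described above: the two keys are written here as the keywords `:type` and `:data`, and the `:comment` and `:embedded` markers are illustrative placeholders, not values taken from textp's actual implementation.

```clojure
;; Hypothetical shapes for the two kinds of special maps.
;; :comment and :embedded are placeholder :type markers.

;; A wrapped comment (produced when *keep-comments* is set):
{:type :comment
 :data "the comment's text"}

;; Wrapped embedded clojure (produced when *wrap-embedded* is set):
{:type :embedded
 :data '(def x 1)}
```

Keeping comments and embedded code in this tagged form lets downstream processing dispatch on `:type`, much as enlive dispatches on node types when emitting html.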
# Textp's grammar

We construct textp's grammar using instaparse. The grammar is built in two parts:

- a lexical part, or lexer, made of regular expressions.
- a set of grammatical rules tying the lexer together into the grammar.

## The lexer

Our lexer is made of regular expressions constructed with the [[textp.reader.alpha.grammar/defregex]] macro, which uses the Regal library under the covers. We then assemble a lexer from these regular expressions with the [[textp.reader.alpha.grammar/make-lexer]] macro.

For instance we could construct the following 2-rule lexer:

```clojure
(def-regex number [:* :digit])
(def-regex word [:* [:class ["a" "z"]]])

(def lexer (make-lexer number word))

lexer
;=> {:number {:tag :regexp
              :regexp #"\d*"}
     :word   {:tag :regexp
              :regexp #"[a-z]*"}}
```

## The grammatical rules

We use the [[instaparse.combinators/ebnf]] function to produce grammatical rules. This allows us to write these rules in the EBNF format.

For instance we could write the following:

```clojure
(def rules
  (instac/ebnf
    "
    doc = (token <':'>)*
    token = (number | word)
    "))

rules
;=> {:doc   {:tag :star
             :parser {:tag :cat
                      :parsers ({:tag :nt :keyword :token}
                                {:tag :string :string ":" :hide true})}}
     :token {:tag :alt
             :parsers ({:tag :nt :keyword :number}
                       {:tag :nt :keyword :word})}}
```

This way of writing the grammatical rules is much easier than using function combinators, and it still gives us the rules in map form.

## The combining trick

Now that we have both a lexer and grammatical rules, we can simply merge them to get the full grammar. We actually get an instaparse parser this way:

```clojure
(def parser
  (insta/parser (merge lexer rules)
                :start :doc))

(parser "abc:1:def:2:3:")
;=> [:doc
     [:token [:word "abc"]]
     [:token [:number "1"]]
     [:token [:word "def"]]
     [:token [:number "2"]]
     [:token [:number "3"]]]
```

Apart from some details, this is how this namespace is built.