The goal of prose's reader is to translate prose-style text into Clojure data that can be evaluated by Clojure's eval function. To do so, we use the instaparse and edamame libraries. Instaparse helps us separate plain text from code. Edamame is used to read textual code into Clojure data.
Here we explore how the reader is constructed.
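As a minimal sketch of the "text to data to eval" idea, the snippet below uses edamame's `parse-string` to read a string of code into Clojure data, then passes the result to `eval`. It is independent of prose itself and assumes only that edamame is on the classpath; the names `code-text` and `form` are illustrative.

```clojure
(require '[edamame.core :as e])

;; Textual code, as it might appear embedded in a document.
(def code-text "(+ 1 2)")

;; Read the text into Clojure data. Nothing is evaluated at this point.
(def form (e/parse-string code-text))

(seq? form)  ;=> true
(eval form)  ;=> 3
```

Keeping reading and evaluation separate is what lets the reader return plain data that can be inspected or rearranged before anything is evaluated.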
We use instaparse to express prose's grammar and generate its parser. Instaparse doesn't make a distinction between lexer and parser. Still, it is useful to organize our grammar in the code along the usual lexer / grammar dichotomy.
Our lexer is made of regular expressions constructed with the fr.jeremyschoffen.prose.alpha.reader.grammar.utils/def-regex macro. It uses the Regal library under the covers.
To assemble these regexes into a lexer we use the fr.jeremyschoffen.prose.alpha.reader.grammar.utils/make-lexer macro.
Using instaparse and some helpers
(require '[instaparse.core :as insta])
(require '[instaparse.combinators :as instac])
(require '[fr.jeremyschoffen.prose.alpha.reader.grammar.utils :as gu])
(require '[clojure.pprint :as pp])
(defn p [s] (-> s pp/pprint with-out-str))
we can construct the following two-rule lexer:
(gu/def-regex number [:* :digit])
(gu/def-regex word [:* [:class ["a" "z"]]])
(def lexer (gu/make-lexer number word))
(p lexer)
;=>
{:number {:tag :regexp, :regexp #"\d*"},
 :word {:tag :regexp, :regexp #"[a-z]*"}}
Note that our lexer is actually a (partial) instaparse grammar.
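Since the lexer is just a map from rule names to instaparse combinator data, a comparable map can be written by hand. The sketch below does so with instaparse's own `instac/regexp` combinator. It only illustrates the data shape and makes no claim about how `def-regex` or `make-lexer` are implemented; the names `hand-lexer` and `word-only` are hypothetical.

```clojure
(require '[instaparse.core :as insta])
(require '[instaparse.combinators :as instac])

;; A hand-written grammar map with the same two rule names as the lexer above.
(def hand-lexer
  {:number (instac/regexp #"\d*")
   :word   (instac/regexp #"[a-z]*")})

;; A grammar map can be handed to insta/parser; :start names the entry rule.
(def word-only (insta/parser hand-lexer :start :word))

(word-only "abc") ;=> [:word "abc"]
```

The map is called partial because it has no rule tying the tokens together; that is the job of the grammatical rules.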
Most of the grammatical rules are created using the EBNF notation:
(def rules
  (instac/ebnf
    "
    doc = (token <':'>)*
    token = (number | word)
    "))
(p rules)
;=>
{:doc
 {:tag :star,
  :parser
  {:tag :cat,
   :parsers
   ({:tag :nt, :keyword :token}
    {:tag :string, :string ":", :hide true})}},
 :token
 {:tag :alt,
  :parsers ({:tag :nt, :keyword :number} {:tag :nt, :keyword :word})}}
Now that we have both a lexer and a grammar, we can simply merge them to make our parser.
(def parser
  (insta/parser (merge lexer rules)
                :start :doc))
(p (parser "abc:1:def:2:3:"))
;=>
[:doc
 [:token [:word "abc"]]
 [:token [:number "1"]]
 [:token [:word "def"]]
 [:token [:number "2"]]
 [:token [:number "3"]]]
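The tree above is in instaparse's default hiccup-style format (nested vectors). Instaparse can also emit enlive-style maps of `:tag` and `:content`, which is the shape of the prose parser output shown later in this section. Whether prose's parser is configured this way is an assumption based on that output shape. The sketch below restates the toy grammar as a single string so that it stands alone; `grammar`, `hiccup-parser` and `enlive-parser` are hypothetical names.

```clojure
(require '[instaparse.core :as insta])

;; The toy grammar written entirely in instaparse's string notation.
(def grammar
  "doc = (token <':'>)*
   token = (number | word)
   number = #'[0-9]*'
   word = #'[a-z]*'")

;; Same grammar, two output formats.
(def hiccup-parser (insta/parser grammar))
(def enlive-parser (insta/parser grammar :output-format :enlive))

(hiccup-parser "abc:1:")
;=> [:doc [:token [:word "abc"]] [:token [:number "1"]]]

(enlive-parser "abc:1:")
;=> {:tag :doc,
;    :content
;    ({:tag :token, :content ({:tag :word, :content ("abc")})}
;     {:tag :token, :content ({:tag :number, :content ("1")})})}
```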
Prose's instaparse-generated parser is not sufficient to constitute a reader by itself: it produces a parse tree, not data that can be evaluated.
We can compare the results of the parser and the reader on an example.
(require '[fr.jeremyschoffen.prose.alpha.reader.core :as reader])
(require '[fr.jeremyschoffen.prose.alpha.reader.grammar :as g])
(require '[fr.jeremyschoffen.prose.alpha.document.lib :as lib])
(def example (lib/slurp-doc "reader/example.prose"))
example
;=>
◊(def a 3)
Some ◊em{example} text: ◊|a◊"."
The parser by itself distinguishes between text and code:
(-> example g/parser p)
;=>
{:tag :doc,
 :content
 ({:tag :clojure-call, :content ("(" "def a 3" ")")}
  "\n\nSome "
  {:tag :tag,
   :content
   ({:tag :tag-name, :content ("em")}
    {:tag :tag-text-arg, :content ("{" "example" "}")})}
  " text: "
  {:tag :symbol-use, :content ("a")}
  "."
  "\n")}
The fr.jeremyschoffen.prose.alpha.reader.core/clojurize function takes the result of the parser and re-arranges it into evaluable data:
(-> example g/parser reader/clojurize p)
;=>
[(def a 3) "\n\nSome " (em "example") " text: " a "." "\n"]
This two-phase reading (parsing, then clojurizing) is provided by the fr.jeremyschoffen.prose.alpha.reader.core/read-from-string function.
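To make the second phase more concrete, here is a rough sketch of how a parse tree like the one above could be rearranged into evaluable data. It is a hypothetical, simplified stand-in and not prose's actual `clojurize`: it handles only the node types present in the example, works on a hand-copied tree, and assumes edamame is available to read the code fragments. The names `tree`, `node-text` and `clojurize-sketch` are illustrative.

```clojure
(require '[clojure.string :as string])
(require '[edamame.core :as e])

;; Parse tree copied by hand from the parser output shown above.
(def tree
  {:tag :doc
   :content
   [{:tag :clojure-call :content ["(" "def a 3" ")"]}
    "\n\nSome "
    {:tag :tag
     :content [{:tag :tag-name :content ["em"]}
               {:tag :tag-text-arg :content ["{" "example" "}"]}]}
    " text: "
    {:tag :symbol-use :content ["a"]}
    "."
    "\n"]})

(defn node-text
  "Joins the string children of a node back into source text."
  [node]
  (string/join (:content node)))

(defn clojurize-sketch
  "Hypothetical, simplified stand-in for clojurize.
  Handles only the node types that appear in `tree`."
  [node]
  (if (string? node)
    node
    (case (:tag node)
      :doc          (mapv clojurize-sketch (:content node))
      ;; Code nodes: let edamame read the text into Clojure data.
      :clojure-call (e/parse-string (node-text node))
      :symbol-use   (e/parse-string (node-text node))
      ;; Tag nodes become a call form: (tag-name arg ...).
      :tag          (let [[name-node & args] (:content node)]
                      (apply list
                             (symbol (node-text name-node))
                             (mapcat clojurize-sketch args)))
      ;; Text arguments: drop the surrounding braces, keep the inner content.
      :tag-text-arg (map clojurize-sketch (-> node :content rest butlast)))))

(clojurize-sketch tree)
;=> [(def a 3) "\n\nSome " (em "example") " text: " a "." "\n"]
```

Plain strings pass through untouched, while code fragments are read into forms; this is why the result can be evaluated element by element.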