fr.jeremyschoffen.textp.alpha.reader.grammar

# Textp's grammar.

Textp's grammar is built with instaparse, in two parts:
- a lexical part, or lexer, made of regular expressions.
- a set of grammatical rules tying the lexer together into the grammar.

## The lexer.
Our lexer is made of regular expressions constructed with the
[[fr.jeremyschoffen.textp.alpha.reader.grammar/def-regex]] macro, which uses the Regal library under the covers.
We then assemble a lexer from these regular expressions with the
[[fr.jeremyschoffen.textp.alpha.reader.grammar/make-lexer]] macro.

For instance, we could construct the following two-rule lexer:

```clojure
(def-regex number [:* :digit])

(def-regex word [:* [:class ["a" "z"]]])

(def lexer (make-lexer number word))

lexer
;=> {:number {:tag :regexp
              :regexp #"\d*"}
     :word {:tag :regexp
            :regexp #"[a-z]*"}}
```

## The grammatical rules
We use the [[instaparse.combinators/ebnf]] function to produce grammatical rules. This allows us
to write these rules in the EBNF format.

For instance, we could write the following:
```clojure
(def rules
  (instac/ebnf
     "
     doc = (token <':'>)*
     token = (number | word)
     "))

rules
;=>{:doc {:tag :star
          :parser {:tag :cat
                   :parsers ({:tag :nt :keyword :token}
                            {:tag :string :string ":" :hide true})}}
    :token {:tag :alt
            :parsers ({:tag :nt :keyword :number}
                      {:tag :nt :keyword :word})}}
```

Writing the grammatical rules this way is much easier than using function combinators, and it still gives us
these rules in map form.

## The combining trick
Now that we have both a lexer and grammatical rules, we can simply merge them to get the full grammar.

We actually get an instaparse parser this way:

```clojure
(def parser
  (insta/parser (merge lexer rules)
                :start :doc))

(parser "abc:1:def:2:3:")
;=> [:doc
     [:token [:word "abc"]]
     [:token [:number "1"]]
     [:token [:word "def"]]
     [:token [:number "2"]]
     [:token [:number "3"]]]
```

With the exception of some details, this is how this namespace is made.

all-delimitors clj/s

all-grammatical-rules clj/s

Merging of the lexer rules and the grammatical rules.

any-char clj/s

Regex that recognizes any character.

anything clj/s

braces clj/s

brackets clj/s

comment-g clj/s

Grammatical rule for commented text:
```text
This text is normal text.
◊/The text here is kept commented out/◊
```

def-regex clj/s macro

(def-regex n xeger-expr)
(def-regex n doc xeger-expr)

Macro used as a shorthand, turning:
```clojure
(def a-regex (make-regex "a regal expression"))
```
into
```clojure
(def-regex a-regex "a regal expression")
```

diamond clj/s

double-quote clj/s

embedded-g clj/s

Grammatical rules describing Clojure code embedded in text.
```text
We can embed clojure calls: ◊(def ex 1)◊ and clojure values ◊|x|◊

Note that the embedded call syntax is mutually recursive with the tag syntax.
We can have:
◊(def home ◊a[:href "www.home.com"]{Home})◊

and use it here: ◊|home|◊
```

embedded-g-masked clj/s

end-comment clj/s

end-embedded-value clj/s

end-embeded-code clj/s

end-verbatim clj/s

escaper clj/s

escaping-char clj/s

The backslash used to escape characters in plain text.

general-g clj/s

general-g-masked clj/s

grammar clj/s

Final grammar, in which every rule that needs to be hidden is marked as such.

grammar-masked clj/s

The set of the rule names that need to be hidden. These rules won't
produce nodes in the parse tree. In compiler parlance, these are the nodes you'd find in the syntax tree but not
in the abstract syntax tree.

hide-all clj/s

(hide-all g)

Hide all rules in an instaparse grammar in its data (map) form by applying
[[instaparse.combinators/hide-tag]] to all values of the map.

hide-rules clj/s

(hide-rules g rule-names)

Selectively hide rules in an instaparse grammar in its data (map) form. It
applies [[instaparse.combinators/hide-tag]] to the rules whose names are in `rule-names`.
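
Both hiding helpers amount to mapping a function over the values of a grammar map. The following is a minimal sketch of that idea, not the namespace's actual code. The names `hide-all-sketch`, `hide-rules-sketch` and `mark-hidden` are hypothetical, and `mark-hidden` is only a stand-in that tags a rule so the sketch runs without instaparse on the classpath. In real use one would presumably pass `instaparse.combinators/hide-tag` in its place.

```clojure
;; Stand-in for instaparse.combinators/hide-tag (assumption: any
;; function from rule map to rule map can be plugged in here).
(defn mark-hidden [rule]
  (assoc rule :hidden? true))

(defn hide-all-sketch
  "Applies `hide-fn` to every rule of the grammar map `g`."
  [hide-fn g]
  (into {} (map (fn [[rule-name rule]] [rule-name (hide-fn rule)])) g))

(defn hide-rules-sketch
  "Applies `hide-fn` only to the rules of `g` named in `rule-names`."
  [hide-fn g rule-names]
  (reduce (fn [acc rule-name]
            ;; Skip unknown names so no bogus rule gets added.
            (if (contains? acc rule-name)
              (update acc rule-name hide-fn)
              acc))
          g
          rule-names))

;; A tiny grammar in data (map) form, shaped like the lexer example
;; from the namespace docstring.
(def example-grammar
  {:number {:tag :regexp :regexp #"\d+"}
   :word   {:tag :regexp :regexp #"[a-z]+"}})

(def all-hidden  (hide-all-sketch mark-hidden example-grammar))
(def some-hidden (hide-rules-sketch mark-hidden example-grammar #{:number}))
```

Here `all-hidden` has both rules tagged, while `some-hidden` only tags `:number` and leaves `:word` untouched.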

lexer clj/s

Lexer of our grammar. It's the raw lexer with all rules hidden by default
(they won't materialize as nodes of a parse tree).

lexer* clj/s

Raw lexer of our grammar. It's an instaparse grammar in data (map) form containing all the
regular expressions used in the final parser.

macro-reader-char clj/s

make-complex-symbol-regex clj/s

(make-complex-symbol-regex rep)

Regex for a full symbol name with namespace. Parses an optional ns name followed by
the character `/`, then a simple symbol. The repetition for the characters of the symbol name
is parameterized to allow for reluctant repetition.

make-lexer clj/s macro

(make-lexer & regexes)

Make a sequence of named regular expressions into an instaparse map of named regex rules.
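
To illustrate what such a macro could look like, here is a minimal sketch. It is not the namespace's implementation: `make-lexer-sketch` is a hypothetical name, and the `{:tag :regexp :regexp ...}` rule shape is simply copied from the lexer example in the namespace docstring. The real macro presumably builds its rules with instaparse's own combinators.

```clojure
(defmacro make-lexer-sketch
  "Turns symbols naming regexes into a map of regexp rules in
  instaparse's data form, keyed by the keywordized symbol names."
  [& regex-syms]
  (into {}
        (map (fn [sym]
               ;; The symbol is left unevaluated here; it resolves to the
               ;; regex var at the macro's call site.
               [(keyword (name sym)) {:tag :regexp :regexp sym}]))
        regex-syms))

(def number #"\d+")
(def word #"[a-z]+")

(def lexer-sketch (make-lexer-sketch number word))
;; lexer-sketch has the keys :number and :word, each holding
;; a {:tag :regexp, :regexp <pattern>} rule.
```

A macro is used rather than a function so that the rule names can be derived from the symbols themselves.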

make-simple-symbol-regex clj/s

(make-simple-symbol-regex rep)

Regex for simple symbols without namespaces.
The character repetition is parameterized to allow for reluctant repetition.

non-special clj/s

normal-text clj/s

ns-end clj/s

parens clj/s

parser clj/s

Our parser with the starting rule specified as the `:doc` rule and the output format
set to `:enlive`.

plain-text clj/s

Text to be interpreted as plain text, neither Clojure code nor special blocks of text.
Basically any character excluding diamond and backslash, which have special meaning.

symbol-first-char clj/s

In the case of the first character of a symbol name, there are more forbidden chars:
- digits aren't allowed as first character
- the macro reader char `#` isn't allowed either.

symbol-ns-part clj/s

Regex for the ns name of a symbol. Parses dot-separated names until
a final name.

symbol-regular-char clj/s

Characters that are always forbidden in symbol names:
- spaces
- diamond char since it starts another grammatical rule
- delimiters: parens, brackets, braces and double quotes.
- `/` since it has the special meaning of separating the namespace from the symbol name.
- `.` since it has the special meaning of separating symbol names.
- `\` since it is reserved by Clojure to identify a literal character.
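
The character rules above, together with the parameterized repetition of `make-simple-symbol-regex` and `make-complex-symbol-regex`, can be approximated with plain Java regexes. This is only a sketch under assumptions. The library builds its regexes with Regal, whereas the version below concatenates pattern strings, and every name in it (`forbidden-chars`, `simple-symbol-sketch`, `complex-symbol-sketch`, and so on) is hypothetical.

```clojure
;; Characters forbidden anywhere in a symbol name, written as the inside
;; of a regex character class: whitespace, diamond, delimiters, / . and \.
(def forbidden-chars "\\s◊()\\[\\]{}\"/.\\\\")

(def regular-char (str "[^" forbidden-chars "]"))

;; The first character additionally excludes digits and #.
(def first-char (str "[^" forbidden-chars "\\d#]"))

(defn simple-symbol-sketch
  "Pattern string for a symbol without namespace. `rep` is the repetition
  operator, e.g. \"*\" (greedy) or \"*?\" (reluctant)."
  [rep]
  (str first-char regular-char rep))

(defn complex-symbol-sketch
  "Pattern for an optional dot-separated namespace followed by `/`,
  then a simple symbol whose repetition is given by `rep`."
  [rep]
  (let [nm (simple-symbol-sketch "*")]
    (re-pattern (str "(?:" nm "(?:\\." nm ")*/)?"
                     (simple-symbol-sketch rep)))))

(def symbol-pattern (complex-symbol-sketch "*"))

(re-matches symbol-pattern "my.ns/my-tag") ;=> "my.ns/my-tag"
(re-matches symbol-pattern "1tag")         ;=> nil
```

Passing `"*?"` instead of `"*"` gives the reluctant variant, which only makes sense when something after the symbol (such as a lookahead) decides where the match ends.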

symbol-regular-char-set clj/s

tag-g clj/s

Grammatical rules for tag syntax.

A tag is meant to ultimately be a Clojure call.
It starts with the character ◊ followed by a symbol, then followed by arguments.
Arguments can be Clojure arguments enclosed in brackets or text arguments enclosed in braces.

Clojure arguments allow Clojure code to be passed as arguments, written as embedded code which can contain other tags.
Text arguments are blocks of text which can recursively contain tags and embedded code.

tag-g-masked clj/s

tag-plain-text clj/s

The text found inside curly braces in tags. Can be anything but the chars:
- `◊`: diamond will start a new grammatical rule
- `}`: right curly brace closes the text arg to the tag
- `\`: backslash will start an escaped char grammatical rule

Allows for the forbidden chars to appear when escaped with a backslash.
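
One way to read the "anything but these chars, unless escaped" description is a regex that alternates between an ordinary character and a backslash escape. The sketch below uses a plain Java regex and is an assumption about the general technique, not the regex this namespace actually defines (which may well leave escapes to a separate grammatical rule). `tag-text-sketch` is a hypothetical name.

```clojure
;; Either one character that is not ◊, } or \, or a backslash followed by
;; any single character (the escaped form).
(def tag-text-sketch #"(?:[^◊}\\]|\\.)+")

;; The input below reads: a \} brace} rest
(re-find tag-text-sketch "a \\} brace} rest")
;; The match runs up to, but does not include, the unescaped }.
```

The escaped `\}` is consumed as part of the text, while the first unescaped `}` ends the match.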

text-comment clj/s

Regex used to parse text inside a comment block.
All characters are allowed; the block is terminated by "/◊".

text-e-code clj/s

Regex used when parsing text in embedded code.

text-e-value clj/s

Regex used when parsing a symbol in the case of embedded values. It is basically
the same as `text-symbol`, except that it uses reluctant repetition for the
symbol name and a lookahead at the end to find the end of an embedded
value block.
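
The difference between greedy and reluctant repetition in front of a lookahead can be seen with two plain Java regexes. This is an illustrative sketch only: the patterns use `.` instead of the symbol character classes, and the names `greedy-value` and `reluctant-value` are hypothetical.

```clojure
;; Greedy: runs to the last position that is still followed by |◊.
(def greedy-value #".+(?=\|◊)")

;; Reluctant: stops at the first position that is followed by |◊.
(def reluctant-value #".+?(?=\|◊)")

(def input "home|◊ and ◊|other|◊")

(re-find greedy-value input)    ;=> "home|◊ and ◊|other"
(re-find reluctant-value input) ;=> "home"
```

In both cases the lookahead checks for the closing `|◊` without consuming it, so the terminator stays available for the rule that closes the embedded value.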

text-escaped-char clj/s

text-g clj/s

Grammatical rules for top level text. Basically any character except "◊" or any escaped character.

text-g-masked clj/s

text-spaces clj/s

Spaces found in between tag args.

text-symbol clj/s

Regex used when parsing a symbol in the case of tag names.

text-t-clj clj/s

Regex used when parsing the text inside a Clojure argument to a tag.
Can be anything but the chars:
- `◊`: diamond will start a new grammatical rule
- `[]`, brackets: these characters will start a new grammatical rule
- `"`: double quote will start a new grammatical rule

Allows for the forbidden chars to appear when escaped with a backslash.

text-t-clj-non-special clj/s

text-t-clj-str clj/s

The text inside a Clojure string. Can be anything but the char:
- `"`: double quote will close the string

Allows for the forbidden char to appear when escaped with a backslash.

text-t-clj-str-non-special clj/s

text-t-txt-non-special clj/s

text-verbatim clj/s

Regex used to parse text inside a verbatim block.
All characters are allowed; the block is terminated by "!◊".

verbatim-g clj/s

Grammatical rule for verbatim text:
```text
This text is normal text.
◊!The text here is kept ◊verbatim!◊
```
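
The behaviour shown in the example above can be imitated with a single plain regex, which makes it easy to see that a diamond inside a verbatim block is kept as ordinary text. This is a sketch of the idea rather than the grammar rule itself, and `verbatim-sketch` is a hypothetical name.

```clojure
;; (?s) lets . match newlines; the reluctant .*? stops at the first !◊.
(def verbatim-sketch #"(?s)◊!(.*?)!◊")

;; re-find returns [whole-match group-1]; group 1 is the verbatim content.
(second (re-find verbatim-sketch
                 "This text is normal text.\n◊!The text here is kept ◊verbatim!◊"))
;=> "The text here is kept ◊verbatim"
```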
