Structural SQL classifier — routes statements to the right handler
before JSqlParser sees them.
Kills the regex sprawl across `system-query?`, `parse-sql`'s CT6
guard, and the arg-extraction regexes (savepoint names, advisory-
lock keys, temporal SET values) by feeding those sites a real
tokenizer.
The tokenizer handles enough of PG lexical syntax to classify
correctly in the face of keyword-inside-a-string, keyword-inside-
a-comment, dollar-quoted strings, and case mix. It does NOT parse
expressions — it yields a flat token stream for a keyword-dispatch
classifier.
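The lexical handling above can be illustrated with a minimal sketch. This is not the library's implementation — `tokenize-sketch`, `scan-dollar-quote`, and the `{:type ... :text ...}` token shape are invented here for illustration — but it shows the core idea: keywords inside dollar-quoted strings and line comments never surface as word tokens, so a downstream keyword dispatcher cannot be fooled by them.

```clojure
;; Sketch only: a scanner that skips line comments and treats dollar-quoted
;; bodies as single string tokens. Token shape is an assumption.
(defn scan-dollar-quote [^String s i]
  ;; `i` points at the opening $; read the $tag$, then find the matching close.
  (let [end-tag (inc (.indexOf s "$" (inc i)))        ; index just past the tag's closing $
        tag     (subs s i end-tag)
        close   (.indexOf s tag end-tag)]
    [{:type :string :text (subs s i (+ close (count tag)))}
     (+ close (count tag))]))

(defn tokenize-sketch [^String s]
  (loop [i 0 out []]
    (if (>= i (count s))
      out
      (let [c (.charAt s i)]
        (cond
          (Character/isWhitespace c) (recur (inc i) out)
          ;; line comment: skip to end of line
          (and (= c \-) (< (inc i) (count s)) (= (.charAt s (inc i)) \-))
          (let [nl (.indexOf s "\n" (int i))]
            (recur (if (neg? nl) (count s) (inc nl)) out))
          ;; dollar-quoted string: one opaque :string token
          (= c \$)
          (let [[tok j] (scan-dollar-quote s i)]
            (recur j (conj out tok)))
          ;; bare word: lowercase it, as PG treats unquoted identifiers
          (Character/isLetter c)
          (let [j (loop [j i]
                    (if (and (< j (count s))
                             (or (Character/isLetterOrDigit (.charAt s j))
                                 (= (.charAt s j) \_)))
                      (recur (inc j)) j))]
            (recur j (conj out {:type :word :text (.toLowerCase (subs s i j))})))
          :else (recur (inc i) (conj out {:type :punct :text (str c)})))))))

;; The SAVEPOINT inside the dollar quote and the one in the comment
;; never appear as :word tokens:
(map :text (filter #(= :word (:type %))
                   (tokenize-sketch
                    "SELECT $tag$ SAVEPOINT inside $tag$ -- SAVEPOINT x\nFROM t")))
;; => ("select" "from" "t")
```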
Classifier output:
{:kind <statement-kind keyword>
:name <savepoint/variable name if relevant>
:args <vector of literal args — advisory keys, pg_sleep dur>
:var <SET/RESET/SHOW variable name>
:value <SET value (string) when captured>
:reject-kind <opt-out knob key>
:tag <synthetic command tag on silent-accept>}
:kind :generic-sql means 'pass to JSqlParser unchanged'.
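A hedged sketch of the keyword-dispatch idea (the real classifier's internals are not shown on this page; `classify-sketch` and the `{:type :word :text ...}` token shape are assumptions): inspect the leading word tokens of the flat stream, emit a map matching the contract above, and fall through to `:kind :generic-sql` for anything unrecognized.

```clojure
;; Sketch only: dispatch on the leading keyword of a flat token stream.
(defn classify-sketch [tokens]
  (let [words (into [] (comp (filter #(= :word (:type %))) (map :text)) tokens)]
    (case (first words)
      "savepoint" {:kind :savepoint :name (second words)}
      "show"      {:kind :show :var (second words)}
      ;; anything unrecognized passes through to JSqlParser unchanged
      {:kind :generic-sql})))

(classify-sketch [{:type :word :text "savepoint"} {:type :word :text "sp1"}])
;; => {:kind :savepoint :name "sp1"}
```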
Non-goals: full PG lexer, expression parsing. The token-driven
preprocess-sql rewriter lives in datahike.pg.rewrite and consumes
this tokenizer's output.

(classify sql)

Classify a SQL string. See namespace docstring for the output contract.
(tokenize sql)

Lazy seq of non-comment tokens. Skips whitespace and comments. Stops at end-of-input; does not attempt to recover from mid-string EOF (caller will see :generic-sql and JSqlParser will report the real syntax error).
(tokenize-all sql)

Lazy seq of tokens INCLUDING :comment tokens (line and block).
Useful for source-rewriting callers that need to know comment
spans so they don't excise into or across a commented region.
Classification itself uses tokenize, which filters comments
out for speed.
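Why comment spans matter to a rewriter can be sketched as follows, assuming tokens carry `:start`/`:end` character offsets (an assumption for illustration; the real token shape may differ):

```clojure
;; Assumed token shape: {:type :comment :start 10 :end 25}, half-open span.
(defn inside-comment?
  "True when character `offset` falls within any comment token's span."
  [comment-tokens offset]
  (boolean (some (fn [{:keys [start end]}]
                   (and (<= start offset) (< offset end)))
                 comment-tokens)))

;; A source rewriter would refuse to excise text at an offset that lands
;; inside a commented region:
(inside-comment? [{:type :comment :start 10 :end 25}] 12)  ; => true
(inside-comment? [{:type :comment :start 10 :end 25}] 30)  ; => false
```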