Structural shape matching for long-form SELECT statements.
A companion to datahike.pg.classify: classify routes a statement by its first few tokens, while shape answers 'what does the rest of this SELECT look like?'. Used by sql/system-query?* to identify pgjdbc and Odoo catalog probes that can't be recognized by a leading keyword, because their identifying signal lives deep in the projection list, the FROM clause, or buried qualified identifiers.
This replaces the substring-match branches the classifier previously fell back to (str/includes? on lowercased SQL). Substring matching is safe when the needle is a distinctive SQL-only string like 'fk.conname as name', but it follows a different model from the rest of the pipeline: a string literal containing that phrase would false-match. Tokenizing and matching on structural features (qualified references, AS aliases, function names) is immune to hostile inputs that hide a keyword inside a string literal or a comment.
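To make the difference concrete, here is a minimal sketch. The lexer in it is a hypothetical toy (one regex recognizing quoted strings, `--` comments, dotted identifiers and single-character punctuation), not datahike's tokenizer, and `tokenize`, `substring-match?` and `structural-match?` are illustrative names. The substring check fires on a string literal or comment that merely contains the phrase; the token-level check only fires on an actual `fk.conname AS name` sequence.

```clojure
(ns shape-substring-sketch
  (:require [clojure.string :as str]))

;; Hypothetical toy lexer, NOT datahike's tokenizer. Alternatives are tried in
;; order: quoted string (with '' escapes), -- line comment, identifier with
;; optional .segments, then any single non-space character.
(def token-re
  #"'(?:[^']|'')*'|--[^\n]*|[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*|\S")

(defn tokenize
  "Split sql into [type text] tokens; identifiers are lowercased.
  This toy has no keyword class, so AS comes out as an :ident."
  [sql]
  (for [t (re-seq token-re sql)]
    (cond
      (str/starts-with? t "'")      [:string t]
      (str/starts-with? t "--")     [:comment t]
      (re-matches #"[A-Za-z_].*" t) [:ident (str/lower-case t)]
      :else                         [:punct t])))

(defn substring-match?
  "The old approach: look for the needle anywhere in the lowercased text."
  [sql]
  (str/includes? (str/lower-case sql) "fk.conname as name"))

(defn structural-match?
  "Look for the token sequence fk.conname AS name, skipping comments.
  A string literal stays one :string token, so its contents never match."
  [sql]
  (let [toks (remove #(= :comment (first %)) (tokenize sql))]
    (boolean
     (some #(= [[:ident "fk.conname"] [:ident "as"] [:ident "name"]] %)
           (partition 3 1 toks)))))
```

Both checks accept a genuine probe such as `SELECT fk.conname AS name FROM pg_catalog.pg_constraint fk`, but only the substring check also accepts `SELECT 'fk.conname as name' AS label` or a trailing `-- fk.conname as name` comment.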
API: (catalog-probe sql) → :get-fk-conname | :get-primary-keys | :get-field-metadata | :empty-catalog | nil
(catalog-probe sql)
If sql is a SELECT that matches a known catalog-probe shape, return the :kind keyword the dispatch in server.clj expects; else nil.
Order of checks (most-specific first) matters: every named probe would also match :empty-catalog, since they all reference catalog tables, and the richer classification should win.
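A minimal sketch of why the ordering matters, assuming the input is a summary map shaped like the one summarize returns. Everything here is hypothetical: `catalog-probe-sketch`, `references-catalog?` and the `catalog-tables` set are illustrative names, keying :get-fk-conname on the 'fk.conname as name' alias follows the needle mentioned in the namespace docs, keying :get-field-metadata on a format_type call is an assumption, and the :get-primary-keys check is left out because its shape is not documented here.

```clojure
(ns catalog-probe-sketch
  (:require [clojure.string :as str]))

;; Hypothetical set of catalog tables; the real list is not documented here.
(def catalog-tables
  #{"pg_class" "pg_constraint" "pg_attribute" "pg_namespace" "pg_index" "pg_type"})

(defn references-catalog?
  "True when the summary mentions a catalog table, bare or pg_catalog-qualified."
  [{:keys [idents qrefs]}]
  (boolean
   (or (some catalog-tables idents)
       (some #(str/starts-with? % "pg_catalog.") qrefs))))

(defn catalog-probe-sketch
  "Illustrative only: most-specific shapes first, generic fallback last."
  [{:keys [select? as-aliases fn-names] :as summary}]
  (when select?
    (cond
      ;; Named probes first: each of them also references catalog tables,
      ;; so testing :empty-catalog first would shadow them.
      (contains? as-aliases ["fk.conname" "name"]) :get-fk-conname
      (contains? fn-names "format_type")           :get-field-metadata
      ;; Generic fallback: any other SELECT that touches the catalog.
      (references-catalog? summary)                :empty-catalog)))
```

A summary for the foreign-key probe satisfies both the alias test and `references-catalog?`; because the `cond` reaches the alias test first, it is classified as :get-fk-conname rather than :empty-catalog.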
(summarize toks)
Walk a token stream once and extract the structural features our catalog probes key off. Comments are ignored: they carry no structural signal and already arrive as separate :comment tokens.
Returns:
{:select? boolean — the stream starts with SELECT
:qrefs #{"fk.conname" "pg_catalog.pg_class" …}
— every dotted name referenced anywhere.
:idents #{"pg_constraint" "pg_class" …}
— every bare ident, lowercased.
:fn-names #{"format_type" …}
— every ident (bare or dotted) directly followed
by `(`.
:as-aliases #{["fk.conname" "name"] …}
— every `<qname> AS <ident>` pair.}
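The single pass described above can be sketched as follows. The token shape is an assumption made for this example, not datahike's lexer output: vectors like `[:keyword "SELECT"]`, `[:ident "fk.conname"]` (a dotted name arrives as one :ident token), `[:punct "("]` and `[:comment "-- ..."]`. `summarize-sketch` is a hypothetical name, and it lowercases dotted names as well as bare ones.

```clojure
(ns summarize-sketch
  (:require [clojure.string :as str]))

(defn summarize-sketch
  "One left-to-right pass over hypothetical [type text] tokens."
  [toks]
  (let [toks (vec (remove #(= :comment (first %)) toks)) ; comments carry no signal
        n    (count toks)
        kind (fn [i] (first (get toks i)))               ; nil past the end
        text (fn [i] (some-> (get toks i) second str/lower-case))]
    (loop [i   0
           acc {:select?    (and (= :keyword (kind 0)) (= "select" (text 0)))
                :qrefs      #{}
                :idents     #{}
                :fn-names   #{}
                :as-aliases #{}}]
      (if (>= i n)
        acc
        (recur (inc i)
               (if (= :ident (kind i))
                 (let [t (text i)]
                   (cond-> (update acc (if (str/includes? t ".") :qrefs :idents) conj t)
                     ;; a name directly followed by "(" is a function call
                     (= [:punct "("] (get toks (inc i)))
                     (update :fn-names conj t)
                     ;; <name> AS <ident> alias pair
                     (and (= :keyword (kind (inc i)))
                          (= "as" (text (inc i)))
                          (= :ident (kind (+ i 2))))
                     (update :as-aliases conj [t (text (+ i 2))])))
                 acc))))))
```

Keeping SQL keywords in their own token class is what stops SELECT, AS and FROM from landing in :idents, and the lookahead is bounded by `get` returning nil past the end of the vector.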