The public API of [`wreck`](https://github.com/pmonks/wreck).

Notes:

* Apart from passing through `nil`, this library does minimal argument checking, since the rules for regexes vary from platform to platform, and it is a first-class requirement that callers be allowed to construct platform-specific regexes if they wish.
* As a result, all functions have the potential to throw platform-specific exceptions if the resulting regex is syntactically invalid.
  * On the JVM, these will typically be instances of the `java.util.regex.PatternSyntaxException` class.
  * On JavaScript, these will typically be a `js/SyntaxError`.
* Platform-specific behaviour is particularly notable for short / empty regexes, such as `#"{}"` (an error on the JVM, fine but nonsensical on JS) and `#"{1}"` (ironically, fine but nonsensical on the JVM, but an error on JS). 🤡
* Furthermore, JavaScript fundamentally doesn't support lossless round-tripping of `RegExp` objects to `String`s and back, something this library relies upon and does extensively. The library makes a best effort to correct JavaScript's problematic implementation, but because it's fundamentally lossy there are some cases that (on ClojureScript only) may change your regexes in unexpected (though _probably_ not semantically significant) ways.
* Regex flags are supported to the best ability of the library, but please carefully review the [usage notes in README.md](https://github.com/pmonks/wreck?tab=readme-ov-file#regex-flags) for various caveats when flags are used.
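As a rough illustration of the platform-specific exceptions described above, here is a JVM-only sketch that uses plain `clojure.core/re-pattern` rather than any `wreck` function. The helper name `try-compile` is hypothetical and not part of the library.

```clojure
(import 'java.util.regex.PatternSyntaxException)

(defn try-compile
  "Hypothetical helper: returns the compiled regex, or :invalid if the JVM
  regex engine rejects the pattern string s."
  [s]
  (try
    (re-pattern s)
    (catch PatternSyntaxException _
      :invalid)))

(try-compile "[abc]+") ; a java.util.regex.Pattern
(try-compile "{}")     ; :invalid on the JVM, as noted above
```

On ClojureScript the equivalent failure would surface as a `js/SyntaxError`, so portable code would need a reader conditional in the `catch` clause.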
(=' _)
(=' re1 re2)
(=' re1 re2 & more)
Equality for regexes, defined by having equal string representations and flags (including flags that cannot be embedded).

Notes:

* Functionally equivalent regexes (e.g. `#"..."` and `#".{3}"`) are _not_ considered `='`.
* Some regexes may not be `='` initially due to differing flag sets, but after being run through [[embed-flags]] may become `='`, due to non-embeddable flags being silently dropped (see [[embed-flags]] for details).
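To show why a dedicated equality function is useful, the following JVM-only sketch compares regexes by pattern text and flag bits. `same-regex?` is a hypothetical helper written for this illustration; it is an assumption about the general approach, not the library's implementation.

```clojure
(defn same-regex?
  "Hypothetical helper (JVM only): true when two regexes have the same
  pattern string and the same flag bits."
  [^java.util.regex.Pattern a ^java.util.regex.Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(= #"..." #"...")            ; false: Pattern uses identity equality
(same-regex? #"..." #"...")  ; true
(same-regex? #"..." #".{3}") ; false: equivalent, but different text
```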
(alt & res)
Returns a regex that will match any one of `res`, via alternation.

Notes:

* Duplicate elements in `res` will only appear once in the result.
* Does _not_ wrap the result in a group, which, [because alternation has the lowest precedence in regexes](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_08), runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should _almost always_ be preferred.
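The precedence risk mentioned above can be seen with plain Clojure regexes, without calling `wreck` at all. This sketch builds the patterns by string concatenation purely for illustration; the names `ungrouped` and `grouped` are made up here.

```clojure
;; Naive composition: the trailing "baz" binds only to the second branch.
(def ungrouped (re-pattern (str "foo|bar" "baz")))     ; foo|barbaz

;; Grouping first keeps the alternation self-contained.
(def grouped   (re-pattern (str "(?:foo|bar)" "baz"))) ; (?:foo|bar)baz

(re-matches ungrouped "foobaz") ; nil
(re-matches ungrouped "foo")    ; "foo"
(re-matches grouped "foobaz")   ; "foobaz"
```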
(and' a b)
(and' a b s)
Returns an 'and' regex that will match `a` and `b` in any order, and with the `s`eparator regex (if provided) between them. This is implemented as `ASB|BSA`, which means that A and B must be distinct (must not match the same text).

Notes:

* May optimise the expression (via de-duplication in [[alt]]).
* Does _not_ wrap the result in a group, which, [because alternation has the lowest precedence in regexes](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_08), runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should _almost always_ be preferred.
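The `ASB|BSA` shape described above can be written by hand to see what it accepts. In this sketch A is `foo`, B is `bar` and S is `\s+`; these values are illustrative, and the literal below is hand-written rather than output captured from `and'`.

```clojure
;; Hand-written ASB|BSA with A = foo, B = bar, S = \s+.
(def foo-and-bar #"foo\s+bar|bar\s+foo")

(re-matches foo-and-bar "foo bar")  ; "foo bar"
(re-matches foo-and-bar "bar  foo") ; "bar  foo"
(re-matches foo-and-bar "foo foo")  ; nil
```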
(and-cg a b)
(and-cg a b s)
[[and']] then [[cg]].

Notes:

* Unlike most other `-cg` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(and-grp a b)
(and-grp a b s)
[[and']] then [[grp]].

Notes:

* Unlike most other `-grp` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(and-ncg nm a b)
(and-ncg nm a b s)
[[and']] then [[ncg]].

Notes:

* Unlike most other `-ncg` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(cg & res)
As for [[grp]], but uses a capturing group.
(embed-flags re)
Embeds any flags found in `re` at the start of `re` in a non-capturing group (to ensure scoping), returning a new regex. Returns `re` if `re` contains no flags or is `nil`.

For example, on the JVM `#"(?i)[abc]+"` would become `#"(?i:[abc]+)"`. Similarly, on ClojureScript `(doto (js/RegExp.) (.compile "[abc]+" "i"))` would also become `#"(?i:[abc]+)"`.

Note:

* **[[flags-grp]] is almost always a better choice than this function!** `embed-flags` is primarily intended for internal use by `wreck`, but may be useful in those rare cases where Clojure(Script) code receives a 3rd party regex, wishes to use it as part of composing a larger regex, doesn't know if it contains flags or not, and doesn't care that non-embeddable flags will be silently dropped.
* ⚠️ On the JVM, ungrouped embedded flags in the middle of `re` will be moved to the beginning of the regex. This may alter the semantics of the regex - for example `a(?i)b` will become `(?i:ab)`, which means that `a` will be matched case-insensitively by the result, which is _not_ the same as the original (which matches lower-case `a` only). This is an unavoidable consequence of how the JVM regex engine reports flags. If you really need to use embedded flag(s) midway through a regex, use [[flags-grp]] to ensure proper scoping of the flag(s).
* ⚠️ On the JVM, the programmatic flags `LITERAL` and `CANON_EQ` have no embeddable equivalent, and will be silently dropped by this function. Use [[has-non-embeddable-flags?]] if you need to check for the presence of these flags (e.g. in a 3rd party regex).
* ⚠️ On JavaScript, only the flags `ims` can be embedded. All other flags will be silently dropped by this function. Use [[has-non-embeddable-flags?]] if you need to check for the presence of these flags (e.g. in a 3rd party regex).
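The semantic change described in the `a(?i)b` warning above can be checked directly on the JVM with plain regex literals. This sketch does not call `embed-flags`; it simply compares the two patterns named in the note, using the made-up var names `mid-flag` and `hoisted`.

```clojure
(def mid-flag #"a(?i)b")  ; (?i) applies from its position onwards
(def hoisted  #"(?i:ab)") ; (?i) applies to the whole group

(re-matches mid-flag "aB") ; "aB"
(re-matches mid-flag "Ab") ; nil
(re-matches hoisted "Ab")  ; "Ab"
```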
(empty?' re)
Is `re` `nil` or `(=' #"")`?

Notes:

* Takes flags (if any) into account.
(esc s)
Escapes `s` (a `String`) for use in a regex, returning a `String`.

Notes:

* Unlike most other fns in this namespace, this one does _not_ support a regex as an input, nor return a regex as an output.
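To show why escaping matters when a `String` is spliced into a regex, here is a JVM-only sketch using the standard `java.util.regex.Pattern/quote` method. This is a stand-in chosen for illustration; it is an assumption that the effect is comparable, and `esc` may well produce a different escaped form.

```clojure
(import 'java.util.regex.Pattern)

(def raw     (re-pattern "1.5"))                 ; "." matches any character
(def literal (re-pattern (Pattern/quote "1.5"))) ; matches the text "1.5" only

(re-matches raw "1x5")     ; "1x5"
(re-matches literal "1x5") ; nil
(re-matches literal "1.5") ; "1.5"
```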
(exn n re)
Returns a regex where `re` will match exactly `n` times.
(flags-grp flgs & res)
As for [[grp]], but prefixes the group with `flgs` (a `String`). Returns `nil` if `flgs` is `nil` or empty. Throws if `flgs` contains an invalid flag character, including those that (ClojureScript only) cannot be embedded.

Notes:

* If you must use regex flags, **it is STRONGLY RECOMMENDED that you use this function!** Programmatically set flags and ungrouped embedded flags (e.g. `(?i)`) have no explicit scope and so cannot be reliably used to compose larger regexes. `wreck` makes a best effort to always convert such 'unscoped' flags into their embedded equivalents when composing larger regexes (via [[embed-flags]]), but using flag groups explicitly in the first place is easier to reason about and avoids potential footguns.
* Removes any ungrouped embedded flags in `re` (e.g. `(?i)ab`), but unlike [[embed-flags]] does _not_ check that they appear in `flgs`.
* ⚠️ On the JVM, ungrouped embedded flags _in the middle of `re`_ will also be removed, which may alter the semantics of the regex.
* ⚠️ On JavaScript, only the flags `ims` can be embedded (this is a limitation of the JavaScript regex engine). Other flags will result in a `js/SyntaxError` being thrown.
* For the JVM, see the ['special constructs' section of the `java.util.regex.Pattern` JavaDoc](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/regex/Pattern.html#special) for the set of valid flag characters.
* For JavaScript, see the [`RegExp` flags reference](https://www.w3schools.com/js/js_regexp_flags.asp) for the set of valid flag characters.
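The scoping problem that motivates flag groups can be reproduced on the JVM with plain string concatenation. This sketch does not call `flags-grp`; it writes the scoped form `(?i:...)` by hand, and the names `leaky` and `scoped` are made up for the example.

```clojure
;; An unscoped flag leaks into everything concatenated after it.
(def leaky  (re-pattern (str "(?i)foo" "bar")))  ; (?i)foobar

;; A flag group confines the flag to the group's contents.
(def scoped (re-pattern (str "(?i:foo)" "bar"))) ; (?i:foo)bar

(re-matches leaky "FOOBAR")  ; "FOOBAR"
(re-matches scoped "FOOBAR") ; nil
(re-matches scoped "FOObar") ; "FOObar"
```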
(grp & res)
As for [[join]], but encloses the joined `res` into a single non-capturing group.
(has-non-embeddable-flags? re)
Does `re` have non-embeddable flags?

Notes:

* On the JVM, the only non-embeddable flags are the programmatic flags `LITERAL` and `CANON_EQ`.
* On JavaScript, this is every flag _except_ `i`, `m`, and `s`.
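For context on what a programmatic, non-embeddable flag looks like on the JVM, this sketch sets `LITERAL` through `java.util.regex.Pattern/compile` and inspects the flag bits. `literal-flag?` is a hypothetical helper for this illustration and is not how `has-non-embeddable-flags?` is necessarily implemented.

```clojure
(import 'java.util.regex.Pattern)

;; LITERAL can only be set programmatically; it has no (?...) form.
(def literal-re (Pattern/compile "a.c" Pattern/LITERAL))

(defn literal-flag?
  "Hypothetical helper: is the LITERAL bit set on this Pattern?"
  [^Pattern p]
  (not (zero? (bit-and (.flags p) Pattern/LITERAL))))

(literal-flag? literal-re)    ; true
(literal-flag? #"a.c")        ; false
(re-matches literal-re "abc") ; nil, since "." is treated literally
```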
(join & res)
Returns a regex that is all of the `res` joined together. Each element in `res` can be a regex, a `String` or something that can be turned into a `String` (including numbers, etc.). Returns `nil` when no `res` are provided, or they're all `nil`.

Notes:

* ⚠️ In ClojureScript be cautious about using numbers in these calls, since JavaScript's number handling is a 🤡show. See the unit tests for examples.
(n2m n m re)
Returns a regex where `re` will match from `n` to `m` times.
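The "from `n` to `m` times" semantics correspond to the standard bounded quantifier `{n,m}` in regex syntax. The sketch below uses a hand-written literal to show that behaviour; it is an assumption that `n2m` emits this exact form, and the same caveat applies to `{n}` and `{n,}` as the counterparts of `exn` and `nom`.

```clojure
;; Standard bounded quantifier: "ab" repeated between 2 and 3 times.
(def two-to-three #"(?:ab){2,3}")

(re-matches two-to-three "ab")       ; nil
(re-matches two-to-three "abab")     ; "abab"
(re-matches two-to-three "abababab") ; nil
```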
(ncg nm & res)
As for [[grp]], but uses a named capturing group named `nm`. Returns `nil` if `nm` is `nil` or blank. Throws if `nm` is an invalid name for a named capturing group (alphanumeric only, must start with an alphabetical character, must be unique within the regex).
(nom n re)
Returns a regex where `re` will match `n` or more times.
(oom re)
Returns a regex where `re` will match one or more times.
(opt re)
Returns a regex where `re` is optional.
(or' a b)
(or' a b s)
Returns an 'inclusive or' regex that will match `a` or `b`, or both, in any order, and with the `s`eparator regex (if provided) between them. This is implemented as `ASB|BSA|A|B`, which means that A and B must be distinct (must not match the same text).

Notes:

* May optimise the expression (via de-duplication in [[alt]]).
* Does _not_ wrap the result in a group, which, [because alternation has the lowest precedence in regexes](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_08), runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should _almost always_ be preferred.
(or-cg a b)
(or-cg a b s)
[[or']] then [[cg]].

Notes:

* Unlike most other `-cg` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(or-grp a b)
(or-grp a b s)
[[or']] then [[grp]].

Notes:

* Unlike most other `-grp` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(or-ncg nm a b)
(or-ncg nm a b s)
[[or']] then [[ncg]].

Notes:

* Unlike most other `-ncg` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(qot re)
Quotes `re` (anything that can be accepted by [[join]]), returning a regex.
(str' o)
Returns the `String` representation of `o`, with special handling for `RegExp` objects on ClojureScript in an attempt to correct JavaScript's **APPALLING** default stringification.

Notes:

* Embeds flags (as per [[embed-flags]]).
(xor' a b)
Returns an 'exclusive or' regex that will match `a` or `b`, but _not_ both. This is identical to [[alt]] called with 2 arguments, and is provided as a convenience for those who might be building up large logic-based regexes and would prefer to use more easily understood logical operator names throughout.

Notes:

* May optimise the expression (via de-duplication in [[alt]]).
* Does _not_ wrap the result in a group, which, [because alternation has the lowest precedence in regexes](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_08), runs the risk of behaving unexpectedly if the result is then combined with further regexes. tl;dr - one of the grouping variants should _almost always_ be preferred.
(xor-cg a b)
[[xor']] then [[cg]].

Notes:

* Unlike most other `-cg` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(xor-grp a b)
[[xor']] then [[grp]].

Notes:

* Unlike most other `-grp` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(xor-ncg nm a b)
[[xor']] then [[ncg]].

Notes:

* Unlike most other `-ncg` fns, this one does _not_ accept any number of res.
* May optimise the expression (via de-duplication in [[alt]]).
(zom re)
Returns a regex where `re` will match zero or more times.