wreck - the "Whacky Regular Expression Construction Kit"

A micro-library for Clojure(Script) that provides a selection of regular expression construction functions. It has no dependencies other than Clojure itself, and it emits standard Clojure regular expression objects, so it is fully compatible with Clojure's built-in regular expression functions. It does not use any JVM-specific or JavaScript-specific regex syntax itself, though it can be combined with platform-specific regular expression fragments to produce platform-specific regular expressions, if that's what you want.

The library is not intended to provide a comprehensive functional alternative for constructing regular expressions - knowledge of regular expression syntax and literals remains necessary. The library is instead intended to assist in constructing syntactically valid regular expressions by combining smaller regular expression fragments.

wreck also pairs very nicely with rencg, a library that adds first-class support for named capturing groups to Clojure (albeit the JVM flavour only).
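The idea of building a regex from smaller fragments can be sketched in plain Clojure, without wreck on the classpath. This is only an illustration of the concept, assuming Clojure on the JVM (where `str` on a regex returns its pattern text); `join-fragments` is a hypothetical helper, not wreck's implementation or API.

```clojure
;; Hypothetical helper (NOT part of wreck): concatenate regexes and/or strings
;; into a single regex. Assumes the JVM, where (str #"...") yields the pattern text.
(defn join-fragments
  [& fragments]
  (re-pattern (apply str fragments)))

;; "[" and "]+" are not valid regexes on their own, but the combined result is.
(def word-gap (join-fragments "[" #"\p{Punct}" #"\p{Space}" "]+"))

;; The result is an ordinary regex, so the built-in regex fns work on it.
(re-seq word-gap "a, b;  c")
;=> (", " ";  ")
```

Because each fragment is a value, it can be named, tested on its own, and reused, which is the property the library's functions build on.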

Why?

I have other projects that perform complex text processing, and in some cases I have ended up writing very large regular expressions (as large as ~10KB). Maintaining huge regular expressions while keeping them syntactically and functionally correct, using nothing but regular expression literals, is... "challenging". As a result I started using some helper functions so that I could modularise those regular expressions, and test and construct them in pieces. Before long I realised that these functions were independently useful, despite being neither complex nor novel.

Installation

wreck is available as a Maven artifact from Clojars.

Usage

API documentation is available here, or here on cljdoc, and the unit tests are also worth perusing to see worked examples.

Trying it Out

Clojure CLI

$ clj -Sdeps '{:deps {com.github.pmonks/wreck {:mvn/version "RELEASE"}}}'

Leiningen

$ lein try com.github.pmonks/wreck

deps-try

$ deps-try com.github.pmonks/wreck

Demo

(require '[wreck.api :as re])

;; Basics

(re/esc ".*")
;=> "\\.\\*"  ; Note: a String - most other fns return regexes

(re/qot ".*")
;=> #"\Q.*\E"

(re/join #"a" #"b")
;=> #"ab"

(re/join "[" #"\p{Punct}" #"\p{Space}" "]+")  ; join also supports strings, allowing
                                              ; syntactically invalid fragments to be used to
                                              ; build up a valid expression
;=> #"[\p{Punct}\p{Space}]+"

(re/grp #"a" #"b")
;=> #"(?:ab)"  ; Default group is non-capturing

(re/cg #"a" #"b")
;=> #"(ab)"  ; But we can also do capturing groups

(re/ncg "ab" #"a" #"b")
;=> #"(?<ab>ab)"  ; And named capturing groups (much more useful, especially with rencg!)

; Because ClojureJVM doesn't implement equality for regexes, even though
; ClojureScript does...  🙄
(re/=' #"ab" (re/join #"a" #"b"))
;=> true


;; Cardinality

(re/zom #"foo")  ; zom = zero or more
;=> #"foo*"  ; Probably not what we want, so...

(re/zom-grp #"foo")
;=> #"(?:foo)*"  ; That's more like it!

(re/zom-grp #"foo" #"bar")  ; Can pass in as many regexes as you like to most -grp fns
;=> #"(?:foobar)*"

(re/oom-grp #"foo")  ; oom = one or more
;=> #"(?:foo)+"

(re/exn-grp 2 #"foo")  ; exn = exactly n
;=> #"(?:foo){2}"

(re/nom-grp 4 #"foo")  ; nom = n or more
;=> #"(?:foo){4,}"

(re/n2m-grp 12 17 #"foo")  ; n2m = n to m
;=> #"(?:foo){12,17}"

; There are -cg and -ncg versions of all of these fns as well


;; Alternation

(re/alt #"foo" #"bar")
;=> #"foo|bar"

(re/alt-grp #"foo" #"bar")  ; In case the alternates are themselves complex regexes that might
                            ; cause precedence order problems
;=> #"(?:foo)|(?:bar)"


;; Logical operators

(re/and' #"foo" #"bar")
;=> #"foobar|barfoo"

(re/and-grp #"foo" #"bar")
;=> #"(?:foobar)|(?:barfoo)"

(re/or' #"foo" #"bar")
;=> #"foobar|barfoo|foo|bar"

(re/or-grp #"foo" #"bar")
;=> #"(?:foobar)|(?:barfoo)|(?:foo)|(?:bar)"

(re/or-grp #"foo" #"bar" #"\s+")  ; Logical operators also support separators
;=> #"(?:foo\s+bar)|(?:bar\s+foo)|(?:foo)|(?:bar)"

(re/xor' #"foo" #"bar")  ; The same as alt, but provided for ease of comprehension in lengthy
                         ; regex composition expressions that use the logical operators
;=> #"foo|bar"

(re/xor-grp #"foo" #"bar")
;=> #"(?:foo)|(?:bar)"



;; A more complex example that composes a longer regex from just a few easy-to-read statements
;; (from the unit tests)

(def lorl-re (re/grp (re/or' #"Lesser" #"Library" #"\s+or\s+")))  ; "Lesser" and "Library" in
                                                                  ; either order, separated by
                                                                  ; the word "or", or either
                                                                  ; word by itself
;=> #"(?:Lesser\s+or\s+Library|Library\s+or\s+Lesser|Lesser|Library)"

(def lgpl-re (re/join #"(?iuU)(?<!\w)"                   ; Prefix fragment
                      (re/ncg "lgpl"                     ; Define a named capturing group
                        (re/alt-grp                      ; Outer 'alt' (with elements grouped)
                          (re/join #"GNU\s+" lorl-re)    ; GNU <lesser or library regex>
                          (re/join lorl-re #"\s+GPL")))  ; <lesser or library regex> GPL
                      #"(?!\w)"))                        ; Suffix fragment
;=> #"(?iuU)(?<!\w)(?<lgpl>(?:GNU\s+(?:Lesser\s+or\s+Library|Library\s+or\s+Lesser|Lesser|
;=> Library))|(?:(?:Lesser\s+or\s+Library|Library\s+or\s+Lesser|Lesser|Library)\s+GPL))(?!\w)"

; Which would you rather maintain?  😉
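On the JVM, the named capturing group in the composed regex can be read back with plain java.util.regex interop. The following is a minimal sketch under that assumption: `named-group` is a hypothetical helper (not part of wreck or rencg), and the regex literal is copied from the output above so that the snippet runs without wreck on the classpath.

```clojure
;; The composed regex from the demo above, pasted as a literal so this snippet
;; is self-contained (JVM only: it relies on java.util.regex named groups).
(def lgpl-re
  #"(?iuU)(?<!\w)(?<lgpl>(?:GNU\s+(?:Lesser\s+or\s+Library|Library\s+or\s+Lesser|Lesser|Library))|(?:(?:Lesser\s+or\s+Library|Library\s+or\s+Lesser|Lesser|Library)\s+GPL))(?!\w)")

;; Hypothetical helper: return the text captured by the named group in the
;; first match of re within s, or nil if there is no match.
(defn named-group
  [re ^String group-name s]
  (let [m (re-matcher re s)]
    (when (.find m)
      (.group m group-name))))

(named-group lgpl-re "lgpl" "the gnu lesser general public license")
;=> "gnu lesser"

(named-group lgpl-re "lgpl" "MIT license")
;=> nil
```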

Contributor Information

Contributing Guidelines

Bug Tracker

Code of Conduct

Developer Workflow

This project uses the git-flow branching strategy, and the permanent branches are called release and dev. Any changes to the release branch are considered a release and auto-deployed (JARs to Clojars, API docs to GitHub Pages, etc.).

For this reason, all development must occur either in branch dev, or (preferably) in temporary branches off of dev. All PRs from forked repos must also be submitted against dev; the release branch is only updated from dev via PRs created by the core development team. All other changes submitted to release will be rejected.

Build Tasks

wreck uses tools.build. You can get a list of available tasks by running:

clojure -A:deps -T:build help/doc

Of particular interest are:

clojure -T:build test - run the unit tests
clojure -T:build lint - run the linters (clj-kondo and eastwood)
clojure -T:build ci - run the full CI suite (check for outdated dependencies, run the unit tests, run the linters)
clojure -T:build install - build the JAR and install it locally (e.g. so you can test it with downstream code)

Please note that the release and deploy tasks are restricted to the core development team (and will not function if you run them yourself).

License

Distributed under the Mozilla Public License, version 2.0.

SPDX-License-Identifier: MPL-2.0
