Speculoos

An experiment with Clojure specification literals

Setup

Leiningen/Boot

[com.sagevisuals/speculoos "5"]

Clojure CLI/deps.edn

com.sagevisuals/speculoos {:mvn/version "5"}

Require

(require '[speculoos.core :refer [valid-scalars? valid-collections?]])

Introduction

Imagine we'd like to know if our Clojure vector contains an integer, then a string, and finally a ratio. One example of that data vector might look like this.

[42 "abc" 22/7]

It would be nice if we could write a specification that is shaped like that data.

[int? string? ratio?]

Speculoos can validate our data vector with that specification vector.

(valid-scalars? [42 "abc" 22/7]
                [int? string? ratio?])
;; => true

Notice how the specification's predicate functions in the lower row line up with the data's values in the upper row. Integer 42 pairs with predicate int?, string "abc" pairs with predicate string?, and ratio 22/7 pairs with predicate ratio?. All three scalar values satisfy their respective predicates, so the validation returns true.

Now imagine we'd like to ensure our Clojure hash-map contains an integer at key :x and a ratio at key :y. Something like this.

{:x 42 :y 22/7}

We could write a specification map that's shaped like that data map.

{:x int? :y ratio?}

Speculoos can validate our data map with that specification map.

(valid-scalars? {:x 42, :y 22/7}
                {:x int?, :y ratio?})
;; => true

Again, the specification's predicate functions in the lower row correspond to the data's values in the upper row. Integer 42 at key :x pairs with predicate int? also at key :x, while ratio 22/7 at key :y pairs with predicate ratio? also at key :y. Both scalar values satisfy their respective predicates, so the validation returns true.

Notice how in both cases, the specifications mimic the shape of the data. The vector's specification is itself a vector. The map's specification is itself a map.

Speculoos can validate any heterogeneous, arbitrarily-nested data collection using specifications composed of plain Clojure collections and functions. In short, Speculoos is an experimental library that aims to perform the same tasks as clojure.spec.alpha with an intuitive interface that employs flexible and powerful specification literals.

★ Three Mottos

When using Speculoos, remember these three Mottos:

  1. Validate scalars separately from validating collections.
  2. Shape the specification to mimic the data.
  3. Ignore un-paired predicates and un-paired datums.

Speculoos provides functions for validating scalars (integers, strings, booleans, etc.) contained within a heterogeneous, arbitrarily-nested data structure, and another, distinct group of functions for validating properties of those nested collections (vectors, maps, sets, etc.). Validating scalars separately from validating collections carries several advantages. First, it's simpler. Libraries that validate scalars and collections with one specification tend to require a mini-language that mixes identities and quantities (e.g., regular expression-like syntax). Modifying, combining, and subsetting those specifications might be non-intuitive. In contrast, by validating scalars separately from collections, Speculoos can consume much simpler specifications composed of regular Clojure data structures containing regular Clojure predicate functions. We can inspect and manipulate those specifications with any familiar Clojure collection-handling function, such as assoc-in. No macros necessary. Second, specifying scalars separately from specifying collections offers mental clarity about what's going on. Our predicates will only ever apply to a scalar, or to a collection, never both. And our scalar predicate doesn't have to know anything about the quantity or location of the element. Third, we only need to specify as much, or as little, as necessary. If we only want to validate a few scalars, we aren't forced to specify a property concerning a collection.

Speculoos aims to make composing specifications straightforward. To that end, specifications mimic the shape of the data they describe. A Speculoos specification is merely an arrangement of nested vectors, lists, maps, sequences, and sets that contain predicate functions. Those predicates are arranged in a pattern that instructs the validation functions where to apply the predicates. The specification for a vector is a vector. Predicates are applied to the scalars in-order. The specification for a map is itself a map. Predicates are applied to the scalars at the same key. There's a nearly one-to-one correspondence between the shape of the data and the shape of the specification. In fact, a solid strategy for creating a specification is to copy-paste the data, delete the contents, and then, using that as a template, replace the elements with predicates. Such specifications are straightforward to inspect by eye (merely evaluate them and they'll display themselves at our REPL), but they're also amenable to alteration. We can use our favorite Clojure data wrangling functions to tighten, relax, or remove portions of a Speculoos specification. assoc-in, update-in, and dissoc are our friends.
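As a minimal sketch of that last point, the following treats a specification as ordinary data and edits it with clojure.core functions only. No Speculoos function is involved, and the names base-spec, tightened-spec, relaxed-spec, and trimmed-spec are made up for illustration.

```clojure
;; A specification is plain data, so ordinary collection functions can edit it.
;; All names below are hypothetical, chosen only for this illustration.
(def base-spec {:x int?, :y {:z ratio?}})

;; Tighten: demand a positive integer at :x.
(def tightened-spec (assoc-in base-spec [:x] pos-int?))

;; Relax: accept any number at [:y :z].
(def relaxed-spec (assoc-in base-spec [:y :z] number?))

;; Remove: drop the predicate at [:y :z] entirely.
(def trimmed-spec (update-in base-spec [:y] dissoc :z))

(= pos-int? (get-in tightened-spec [:x])) ;; => true
(= trimmed-spec {:x int?, :y {}})         ;; => true
```

Each result is itself a regular map of predicates, so it can be inspected at the REPL or edited again in the same way.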

Speculoos provides flexibility, power, optionality, and re-usability of specifications by ignoring datums that do not have a corresponding predicate in the specification and ignoring predicates that do not have a corresponding datum in the data. Maybe at our job in an assembly line, we only care about some slice of a large chunk of data. Supplying predicates for only a subset of datums allows us to only validate those specified datums while being agnostic towards the other datums. Going in the other direction, maybe somebody shared a giant, sprawling specification that describes a myriad of data about a person, their postal address, their contact info, etc. Because a Speculoos specification is just a data structure with regular predicates, we can, on-the-fly, get-in the portion relevant to postal addresses and apply that to our particular instances of address data. Speculoos lets us specify exactly what elements we'd like to validate. No more, no less.
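To make the postal-address scenario concrete, here is a small sketch that uses only clojure.core. The specification person-spec and its keys are invented for this example; they do not come from Speculoos.

```clojure
;; A hypothetical, sprawling specification describing a person.
(def person-spec
  {:name    {:first string?, :last string?}
   :address {:street string?, :city string?, :zip string?}
   :phone   string?})

;; Pull out only the portion relevant to postal addresses.
(def address-spec (get-in person-spec [:address]))

(contains? address-spec :zip)  ;; => true
((:zip address-spec) "90210")  ;; => true
((:zip address-spec) 90210)    ;; => false
```

Because address-spec is itself a map of predicates shaped like an address, it could then serve as the specification argument when validating a bare address map.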

Mechanics

Knowing a little bit about how Speculoos does its job will greatly help us understand how to use it. First, we need to know how to address elements contained within a heterogeneous, arbitrarily-nested data structure. Speculoos follows the conventions set by clojure.core/get-in, and extends those conventions where necessary.

Vectors are addressed by zero-indexed integers.

           [100 101 102 103]
indexes --> 0   1   2   3

Same for lists…

          '(97 98 99 100)
indexes --> 0  1  2  3

…and same for sequences, like range.

(range 29 33) ;; => (29 30 31 32)
indexes -----------> 0  1  2  3

Maps are addressed by their keys, which are often keywords, like this.

        {:a 1 :foo "bar" :hello 'world}
keys --> :a   :foo       :hello

But maps may be keyed by any value, including integers…

        {0 "zero" 1 "one" 99 "ninety-nine"}
keys --> 0        1       99

…or some other scalars…

        {"a" :value-at-str-key-a 'b :value-at-sym-key-b \c :value-at-char-key-c}
keys --> "a"                     'b                     \c

…even composite values.

        {[0] :val-at-vec-0 [1 2 3] :val-at-vec-1-2-3 {} :val-at-empty-map}
keys --> [0]               [1 2 3]                   {}

Set elements are addressed by their identities, so they are located at themselves.

             #{42 :foo true 22/7}
identities --> 42 :foo true 22/7

A path is a sequence of indexes, keys, or identities that allows us to refer to a single element buried within a nested data structure. For each level of nesting, we add an element to the path sequence. clojure.core/get-in illustrates how this works.

(get-in [100 101 102 103] [2]) ;; => 102

For a vector containing only integers, each element is addressed by a path of length one. To locate integer 102 in the vector above, the path is [2].

If we consider a vector nested within a vector…

(get-in [100 101 [102 103]] [2]) ;; => [102 103]

…that same path [2] now locates the nested vector. To navigate to an integer contained within the nested vector…

(get-in [100 101 [102 103]] [2 0]) ;; => 102

…requires a path of length two: [2 0] where the 2 addresses the nested vector [102 103] and the 0 addresses the 102 within the nested vector. If we have an integer contained within a vector, contained within a vector, contained within a vector, we'd use a path of length three to get that integer.

(get-in [100 [101 [102]]] [1]) ;; => [101 [102]]
(get-in [100 [101 [102]]] [1 1]) ;; => [102]
(get-in [100 [101 [102]]] [1 1 0]) ;; => 102

The 102 is buried three levels deep, so we use a path with three entries.

This system works similarly for maps. Elements contained in un-nested collections are located with a path of length one.

(get-in {:x 100, :y 101, :z 102} [:z]) ;; => 102

In this example, 102 is located with a path composed of a single key, keyword :z. If we now consider a map nested within another map…

(get-in {:x 100, :y 101, :z {:w 102}} [:z :w]) ;; => 102

…we need a path with two elements. Key :z navigates us to the nested {:w 102} map, and then key :w navigates us to the 102 within that nested map.

There's no restriction on what may be nested in what, so we can nest a map within a vector…

(get-in [100 101 {:x 102}] [2 :x]) ;; => 102

…or nest a vector within a map…

(get-in {:x 100, :y {:z [101 102]}} [:y :z 1]) ;; => 102

…or, if we use a modified version of clojure.core/get-in, nest a vector within a map within a list.

(require '[fn-in.core :refer [get-in*]])

(get-in* '(100 101 {:x [102]}) [2 :x 0]) ;; => 102

102 is contained in three levels of nesting, so its path is composed of three pieces.
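A modified get-in is needed for that last example because clojure.core/get does not index into lists, so clojure.core/get-in returns nil as soon as a step lands on one. The sketch below shows one way a list-aware variant might be written. The name get-in-any is hypothetical, and this is not the implementation of fn-in's get-in*; it only illustrates the idea.

```clojure
;; Hypothetical helper: follow a path through vectors, lists, seqs, maps, and sets.
;; Sequential collections are indexed with nth; everything else falls back to get.
;; Assumes that steps landing on a sequential collection are integers.
(defn get-in-any
  [coll path]
  (reduce (fn [c k]
            (if (sequential? c)
              (nth c k nil)
              (get c k)))
          coll
          path))

(get-in     '(100 101 {:x [102]}) [2 :x 0]) ;; => nil
(get-in-any '(100 101 {:x [102]}) [2 :x 0]) ;; => 102
(get-in-any (range 29 33) [3])              ;; => 32
```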

Speculoos provides a little machine to enumerate paths for us. When supplied with a heterogeneous, arbitrarily-nested data structure, speculoos.core/all-paths returns a sequence of {:path … :value …} for every element, both scalars and collections.

(require '[speculoos.core :refer [all-paths]])

(all-paths [100 101 102])
;; => [{:path [], :value [100 101 102]}
;;     {:path [0], :value 100}
;;     {:path [1], :value 101}
;;     {:path [2], :value 102}]

Note: We receive paths for four items: one for each of the three integers, plus one for the outer container itself. The root collection always has a path []. The integer elements each have a path of a single, zero-indexed integer that locates them within the parent vector. Here's how it works with a map.

(all-paths {:x 100, :y 101, :z 102})
;; => [{:path [], :value {:x 100, :y 101, :z 102}}
;;     {:path [:x], :value 100}
;;     {:path [:y], :value 101}
;;     {:path [:z], :value 102}]

Each of the three integers has a path with a key that locates them within the parent map, and the parent map has a path of [] because it's the root collection.

If we supply a nested data structure, the paths reflect that nesting.

(all-paths [100 101 [102 103]])
;; => [{:path [], :value [100 101 [102 103]]}
;;     {:path [0], :value 100}
;;     {:path [1], :value 101}
;;     {:path [2], :value [102 103]}
;;     {:path [2 0], :value 102}
;;     {:path [2 1], :value 103}]

Now, we have six elements to consider: each of the four integers has a path, and both of the collections have a path. The outer parent vector has path [] because it's the root, and the nested collection is located at path [2], the third element of the root vector. Let's look at all the paths of nested maps.

(all-paths {:x 100, :y 101, :z {:w 102}})
;; => [{:path [], :value {:x 100, :y 101, :z {:w 102}}}
;;     {:path [:x], :value 100}
;;     {:path [:y], :value 101}
;;     {:path [:z], :value {:w 102}}
;;     {:path [:z :w], :value 102}]

Again, each of the three integers has a path, and both of the maps have a path, for a total of five paths.

There is nothing special about integers. all-paths will treat any element, scalar or collection, the same way. Every element has a path. We could replace those integers with functions, un-nested in a vector…

(all-paths [int? string? ratio?])
;; => [{:path [], :value [int? string? ratio?]}
;;     {:path [0], :value int?}
;;     {:path [1], :value string?}
;;     {:path [2], :value ratio?}]

…or nested in a map…

(all-paths {:x int?, :y string?, :z {:w ratio?}})
;; => [{:path [],
;;      :value {:x int?,
;;              :y string?,
;;              :z {:w ratio?}}}
;;     {:path [:x], :value int?}
;;     {:path [:y], :value string?}
;;     {:path [:z], :value {:w ratio?}}
;;     {:path [:z :w], :value ratio?}]

The important principle to remember is this: Every element, scalar and collection, of a heterogeneous, arbitrarily-nested data structure, can be assigned an unambiguous path, regardless of its container type.

If we ever find ourselves with a nested list on our hands, all-paths has got us covered.

(all-paths [42 (list 'foo 'bar 'baz)])
;; => [{:path [], :value [42 (foo bar baz)]}
;;     {:path [0], :value 42}
;;     {:path [1], :value (foo bar baz)}
;;     {:path [1 0], :value foo}
;;     {:path [1 1], :value bar}
;;     {:path [1 2], :value baz}]

Likewise, sets are indispensable in some situations, so all-paths can handle them.

(all-paths {:a 42, :b #{:chocolate :vanilla :strawberry}})
;; => [{:path [], :value {:a 42, :b #{:chocolate :strawberry :vanilla}}}
;;     {:path [:a], :value 42}
;;     {:path [:b], :value #{:chocolate :strawberry :vanilla}}
;;     {:path [:b :strawberry], :value :strawberry}
;;     {:path [:b :chocolate], :value :chocolate}
;;     {:path [:b :vanilla], :value :vanilla}]

Admittedly, addressing elements in a set can be a little like herding cats, but it's still useful to have the capability. Wrangling sets merits its own dedicated section.
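To see how such an enumeration might work, here is a minimal sketch in plain Clojure. It is not Speculoos' implementation of all-paths; the name enumerate-paths is hypothetical, and the sketch assumes finite collections (a non-terminating sequence would never finish).

```clojure
;; Hypothetical sketch: walk a nested structure depth-first, recording the
;; path to every element. Maps are addressed by key, sets by the element
;; itself, and sequential collections by zero-based index.
(defn enumerate-paths
  ([x] (enumerate-paths [] x))
  ([path x]
   (let [children (cond
                    (map? x)        (seq x)
                    (set? x)        (map (fn [e] [e e]) x)
                    (sequential? x) (map-indexed vector x)
                    :else           nil)]
     (into [{:path path, :value x}]
           (mapcat (fn [[k v]] (enumerate-paths (conj path k) v)))
           children))))

(enumerate-paths [100 101 [102 103]])
;; => [{:path [], :value [100 101 [102 103]]}
;;     {:path [0], :value 100}
;;     {:path [1], :value 101}
;;     {:path [2], :value [102 103]}
;;     {:path [2 0], :value 102}
;;     {:path [2 1], :value 103}]
```

The root always receives the empty path, and each level of nesting appends one step, which matches the behavior described above.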

So what does all this paths business have to do with validation? Speculoos inspects the path of a predicate within a specification in an attempt to pair it with an element in the data. If it can pair a predicate with a datum, it applies the predicate to that datum.

Scalar Validation

Let's return to the English-language specification we saw in the introduction: A vector containing an integer, then a string, then a ratio. Consider the paths of this vector…

(all-paths [42 "abc" 22/7])
;; => [{:path [], :value [42 "abc" 22/7]}
;;     {:path [0], :value 42}
;;     {:path [1], :value "abc"}
;;     {:path [2], :value 22/7}]

…and the paths of this vector…

(all-paths [int? string? ratio?])
;; => [{:path [], :value [int? string? ratio?]}
;;     {:path [0], :value int?}
;;     {:path [1], :value string?}
;;     {:path [2], :value ratio?}]

We see that elements of both share paths. If we keep only the paths to scalars, i.e., we discard the root collections at path [], each has three elements remaining.

  • 42 and int? both at path [0], in their respective vectors,
  • "abc" and string? both at path [1], and
  • 22/7 and ratio? both at path [2].

Those pairs of scalars and predicates line up nicely, and we could evaluate each pair, in turn.

(int? 42) ;; => true
(string? "abc") ;; => true
(ratio? 22/7) ;; => true

All three scalars satisfy the predicates they're paired with. Speculoos provides a function, validate-scalars, that does substantially all of that work for us. Given data and a specification that shares the data's shape (Motto #2), validate-scalars:

  1. Runs all-paths on the data, then the specification.
  2. Removes the collection elements from each, keeping only the scalars in each.
  3. Removes the scalars in data that lack a predicate at the same path in the specification, and removes the predicates in the specification that lack datums at the same path in the data.
  4. For each remaining pair of scalar+predicate, applies the predicate to the scalar.
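The four steps above can be sketched in plain Clojure. This is an illustration under stated assumptions, not the library's code: paths-of stands in for all-paths, sketch-validate-scalars stands in for validate-scalars, both names are hypothetical, and the sketch makes no attempt to reproduce any special cases the real function may handle.

```clojure
;; Hypothetical stand-in for all-paths: every element paired with its path.
(defn paths-of
  ([x] (paths-of [] x))
  ([path x]
   (let [children (cond
                    (map? x)        (seq x)
                    (set? x)        (map (fn [e] [e e]) x)
                    (sequential? x) (map-indexed vector x))]
     (into [{:path path, :value x}]
           (mapcat (fn [[k v]] (paths-of (conj path k) v)))
           children))))

(defn scalar-entry? [{:keys [value]}] (not (coll? value)))

(defn sketch-validate-scalars
  [data spec]
  (let [;; Steps 1 and 2: enumerate paths, then keep only the scalars.
        datums     (into {}
                         (comp (filter scalar-entry?)
                               (map (juxt :path :value)))
                         (paths-of data))
        predicates (filter scalar-entry? (paths-of spec))]
    ;; Step 3: keep only predicates whose path also locates a datum.
    ;; Step 4: apply each remaining predicate to its datum.
    (vec (for [{path :path, pred :value} predicates
               :when (contains? datums path)
               :let [datum (get datums path)]]
           {:datum datum
            :path path
            :predicate pred
            :valid? (boolean (pred datum))}))))

(map (juxt :path :valid?)
     (sketch-validate-scalars [42 ["abc" [22/7]]]
                              [int? [string? [char?]]]))
;; => ([[0] true] [[1 0] true] [[1 1 0] false])

(sketch-validate-scalars {:x 42} {:y int?}) ;; => []
```

Un-paired datums never enter the result because only predicate paths are iterated, and un-paired predicates are dropped by the :when clause.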

Let's see that in action. We invoke validate-scalars with the data vector as the first argument and the specification vector as the second argument.

(require '[speculoos.core :refer [validate-scalars]])

(validate-scalars [42 "abc" 22/7]
                  [int? string? ratio?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum "abc",
;;      :path [1],
;;      :predicate string?,
;;      :valid? true}
;;     {:datum 22/7,
;;      :path [2],
;;      :predicate ratio?,
;;      :valid? true}]

Let's apply the Mottos to what we just did.

  • Motto #1: The -scalars suffix reminds us that validate-scalars ignores collections. If we examine the report (we'll look in detail in the next paragraph), the validation yielded only predicates applied to scalars.
  • Motto #2: The shape of our specification mimics the data. Because both are vectors whose contents are addressed by integer indexes, validate-scalars was able to make three pairs. Each of the three scalars in the data vector shares a path with a corresponding predicate in the specification vector.
  • Motto #3: Every predicate was paired with a datum and vice versa, so validation did not ignore anything.

validate-scalars returns a sequence of all the scalars in data that share a path with a predicate in the specification. For each of those pairs, we receive a map containing the :datum scalar element of the data, the :predicate test function element of the specification, the :path addressing each in their respective structures, and the :valid? result of applying the predicate function to the datum. From top to bottom:

  • Scalar 42 at path [0] in the data vector satisfied predicate int? at path [0] in the specification vector,
  • scalar "abc" at path [1] in the data vector satisfied predicate string? at path [1] in the specification vector, and
  • scalar 22/7 at path [2] in the data vector satisfied predicate ratio? at path [2] in the specification vector.

What if there's a length mis-match between the data and the specification? Motto #3 tells us that validation ignores un-paired datums. Let's look at the all-paths for that situation.

;; data vector containing an integer, a string, and a ratio
(all-paths [42 "abc" 22/7])
;; => [{:path [], :value [42 "abc" 22/7]}
;;     {:path [0], :value 42}
;;     {:path [1], :value "abc"}
;;     {:path [2], :value 22/7}]

;; specification vector containing one predicate
(all-paths [int?])
;; => [{:path [], :value [int?]}
;;     {:path [0], :value int?}]

After discarding the root collections at path [] we find the only scalar+predicate pair at path [0], and that's the only pair that validate-scalars looks at.

(validate-scalars [42 "abc" 22/7]
                  [int?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

Only scalar 42 in the data vector has a corresponding predicate int? in the specification vector, so the validation report contains only one entry. The second and third scalars, "abc" and 22/7, are ignored.

What about the other way around? More predicates in the specification than scalars in the data?

;; data vector containing one scalar, an integer
(all-paths [42])
;; => [{:path [], :value [42]}
;;     {:path [0], :value 42}]

;; specification vector containing three predicates
(all-paths [int? string? ratio?])
;; => [{:path [], :value [int? string? ratio?]}
;;     {:path [0], :value int?}
;;     {:path [1], :value string?}
;;     {:path [2], :value ratio?}]

Motto #3 reminds us that validation ignores un-paired predicates. Only the predicate int? at path [0] in the specification vector shares its path with a scalar in the data vector, so that's the only scalar+predicate pair that validate-scalars processes.

(validate-scalars [42]
                  [int? string? ratio?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

validate-scalars ignores both string? and ratio? within the specification vector because the data vector does not contain scalars at their respective paths.

This principle of ignoring un-paired scalars and un-paired predicates provides some useful features. If we only care about validating the first datum, we could insert only one predicate into the specification and rely on the fact that the remaining un-paired datums are ignored. That pattern offers permissiveness. On the other hand, we could compose a lengthy specification and validate a steadily accreting vector with that single specification. That pattern promotes re-use. See Perhaps So for more discussion.
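For flat vectors, this pairing behaves much like mapping over two sequences in parallel: clojure.core/mapv stops as soon as the shorter input runs out. The sketch below is limited to un-nested vectors, and pair-up is a made-up helper, not a Speculoos function.

```clojure
;; Hypothetical helper: pair datums with predicates by position.
;; mapv stops at the shorter of the two inputs, so extras on either
;; side are silently left out.
(defn pair-up
  [data spec]
  (mapv (fn [datum pred]
          {:datum datum, :valid? (boolean (pred datum))})
        data
        spec))

(pair-up [42 "abc" 22/7] [int?])
;; => [{:datum 42, :valid? true}]

(pair-up [42] [int? string? ratio?])
;; => [{:datum 42, :valid? true}]
```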

Validating scalars contained within a map proceeds similarly. Let's send this map, our data, to all-paths.

(all-paths {:x 42, :y "abc", :z 22/7})
;; => [{:path [], :value {:x 42, :y "abc", :z 22/7}}
;;     {:path [:x], :value 42}
;;     {:path [:y], :value "abc"}
;;     {:path [:z], :value 22/7}]

Four elements: the root collection (a map), and three scalars. Then we'll do the same for this map, our specification, which mimics the shape of the data (Motto #2), by also being a map with the same keys.

(all-paths {:x int?, :y string?, :z ratio?})
;; => [{:path [], :value {:x int?, :y string?, :z ratio?}}
;;     {:path [:x], :value int?}
;;     {:path [:y], :value string?}
;;     {:path [:z], :value ratio?}]

Again four elements: the root collection (a map), and three predicates. Note that each predicate shares a path with one of the scalars in the data map. Invoking validate-scalars with the data map followed by the specification map…

(validate-scalars {:x 42, :y "abc", :z 22/7}
                  {:x int?, :y string?, :z ratio?})
;; => [{:datum 42,
;;      :path [:x],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum "abc",
;;      :path [:y],
;;      :predicate string?,
;;      :valid? true}
;;     {:datum 22/7,
;;      :path [:z],
;;      :predicate ratio?,
;;      :valid? true}]

…we can see that

  • Scalar 42 at path [:x] in the data map satisfies predicate int? at path [:x] in the specification map,
  • scalar "abc" at path [:y] in the data map satisfies predicate string? at path [:y] in the specification map, and
  • scalar 22/7 at path [:z] in the data map satisfies predicate ratio? at path [:z] in the specification map.

Let's apply the Three Mottos.

  • Motto #1: validate-scalars ignores every element that is not a scalar, and we wrote our predicates to test only scalars.
  • Motto #2: We shaped our specification to mimic the data: we composed our specification to be a map with keys :x, :y, and :z, same as the data map. Because of this mimicry, validate-scalars is able to infer how to apply each predicate to the intended datum.
  • Motto #3: Because each predicate in the specification shared a path with each scalar in the data, and because each scalar in the data shared a path with a predicate in the specification, no scalars or predicates were ignored.

validate-scalars can only operate with complete scalar+predicate pairs. It ignores un-paired scalars and it ignores un-paired predicates. Since maps are not sequential, we can illustrate both scenarios with one example.

;; data with keys :x and :q
(all-paths {:x 42, :q "foo"})
;; => [{:path [], :value {:q "foo", :x 42}}
;;     {:path [:x], :value 42}
;;     {:path [:q], :value "foo"}]

;; specification with keys :x and :s
(all-paths {:x int?, :s decimal?})
;; => [{:path [], :value {:s decimal?, :x int?}}
;;     {:path [:x], :value int?}
;;     {:path [:s], :value decimal?}]

Notice that the two maps contain only a single scalar/predicate that share a path, [:x]. The other two elements, scalar "foo" at path [:q] in the data map and predicate decimal? at path [:s] in the specification map, do not share a path with an element of the other. "foo" and decimal? will be ignored.

(validate-scalars {:x 42, :q "foo"}
                  {:x int?, :s decimal?})
;; => [{:datum 42,
;;      :path [:x],
;;      :predicate int?,
;;      :valid? true}]

validate-scalars found only a single complete scalar+predicate pair located at path [:x], so it applied int? to 42, which is satisfied.

Again, the principle of ignoring un-paired scalars and ignoring un-paired predicates provides quite a bit of utility. If we are handed a large data map, but we are only interested in the scalar at :x, we are free to validate only that value by putting only one predicate at :x in the specification map. Validation ignores all the other stuff we don't care about. Similarly, perhaps we've built a comprehensive specification map that contains keys :a through :z, but for one particular scenario, our data only contains a value at key :y. We can directly use that comprehensive specification un-modified, and validate-scalars will consider only the one paired datum+predicate and ignore the rest.
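For un-nested maps, which keys will pair up can be worked out with ordinary set operations on the keys. The following sketch uses only clojure.set; the var names are hypothetical and the approach applies to flat maps only, since nested maps would need full paths rather than single keys.

```clojure
(require '[clojure.set :as set])

;; Hypothetical flat data and specification maps.
(def data {:x 42, :q "foo"})
(def spec {:x int?, :s decimal?})

(def data-keys (set (keys data)))
(def spec-keys (set (keys spec)))

(set/intersection data-keys spec-keys) ;; => #{:x}  paired, will be validated
(set/difference data-keys spec-keys)   ;; => #{:q}  un-paired datum, ignored
(set/difference spec-keys data-keys)   ;; => #{:s}  un-paired predicate, ignored
```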

Scalars contained in nested collections are treated accordingly: predicates from the specification are only applied to scalars in the data which share their path. The paths are merely longer than one element. Non-scalars are ignored. Here are the paths for a simple nested data vector containing scalars.

(all-paths [42 ["abc" [22/7]]])
;; => [{:path [], :value [42 ["abc" [22/7]]]}
;;     {:path [0], :value 42}
;;     {:path [1], :value ["abc" [22/7]]}
;;     {:path [1 0], :value "abc"}
;;     {:path [1 1], :value [22/7]}
;;     {:path [1 1 0], :value 22/7}]

Six total elements: three vectors, which validate-scalars will ignore, and three scalars.

And here are the paths for a similarly-shaped nested specification.

;;                         v --- char? predicate will be notable during validation in a moment
(all-paths [int? [string? [char?]]])
;; => [{:path [], :value [int? [string? [char?]]]}
;;     {:path [0], :value int?}
;;     {:path [1], :value [string? [char?]]}
;;     {:path [1 0], :value string?}
;;     {:path [1 1], :value [char?]}
;;     {:path [1 1 0], :value char?}]

Again, six total elements: three vectors that will be ignored, plus three predicates. When we validate…

(validate-scalars [42 ["abc" [22/7]]]
                  [int? [string? [char?]]])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum "abc",
;;      :path [1 0],
;;      :predicate string?,
;;      :valid? true}
;;     {:datum 22/7,
;;      :path [1 1 0],
;;      :predicate char?,
;;      :valid? false}]

Three complete pairs of scalars and predicates.

  • Scalar 42 at path [0] in the data satisfies predicate int? at path [0] in the specification,
  • scalar "abc" at path [1 0] in the data satisfies predicate string? at path [1 0] in the specification,
  • scalar 22/7 at path [1 1 0] in the data does not satisfy predicate char? at path [1 1 0] in the specification.

Later, we'll see that the lone, unsatisfied char? predicate would cause an entire valid? operation to return false.

When the data contains scalars that are not paired with predicates in the specification, they are not validated.

(validate-scalars [42 ["abc" [22/7]]]
                  [int? [string?]])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum "abc",
;;      :path [1 0],
;;      :predicate string?,
;;      :valid? true}]

Only the 42 and "abc" are paired with predicates, so validate-scalars only validated those two scalars. 22/7 is unpaired, and therefore ignored. Likewise…

(validate-scalars [42]
                  [int? [string? [char?]]])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

string? and char? are not paired, and therefore ignored. When the data contains only one scalar, but the specification contains more predicates, validate-scalars only validates the complete scalar+predicate pairs.

Mis-matched, nested maps sing the same song. Here are the paths for all elements in a nested data map and a nested specification map.

;; data
(all-paths {:x 42, :y {:z 22/7}})
;; => [{:path [], :value {:x 42, :y {:z 22/7}}}
;;     {:path [:x], :value 42}
;;     {:path [:y], :value {:z 22/7}}
;;     {:path [:y :z], :value 22/7}]

;; specification
(all-paths {:x int?, :y {:q string?}})
;; => [{:path [], :value {:x int?, :y {:q string?}}}
;;     {:path [:x], :value int?}
;;     {:path [:y], :value {:q string?}}
;;     {:path [:y :q], :value string?}]

Notice that only the scalar 42 in the data and the predicate int? in the specification share a path [:x]. Scalar 22/7 at path [:y :z] in the data and predicate string? at path [:y :q] in the specification are un-paired because they do not share paths.

(validate-scalars {:x 42, :y {:z 22/7}}
                  {:x int?, :y {:q string?}})
;; => [{:datum 42,
;;      :path [:x],
;;      :predicate int?,
;;      :valid? true}]

validate-scalars dutifully evaluates the single scalar+predicate pair, and tells us that 42 is indeed an integer.

One final illustration: what happens if there are zero scalar+predicate pairs?

(validate-scalars {:x 42} {:y int?}) ;; => []

The only scalar, at the path [:x] in the data, does not share a path with the only predicate, at path [:y] in the specification. No validations were performed.

A Speculoos scalar specification says "This data element may or may not exist, but if it does, it must satisfy this predicate." See the later section on validation summaries and thorough validations for functions that return high-level true/false validation summaries and for functions that ensure validation of every scalar element.

Collection Validation

You may have been uncomfortably shifting in your chair while reading through the examples above. Every example we've seen so far shows Speculoos validating individual scalars, such as integers, strings, booleans, etc.

(valid-scalars? [99 "qwz" -88]
                [int? string? neg-int?])
;; => true

However, we might need to specify some property of a collection itself, such as a vector's length, the presence of a key in a map, relationships between datums, etc. That is collection validation.

One way to visualize the difference is this. Scalar validation targets…

 v----v-------v------v---- scalar validation targets these things
[42   \z {:x 'foo :y 9.87} ]

…integers, characters, symbols, etc.

In contrast, collection validation targets…

v--------v---------------v-v---- collection validation targets these things
[42   \z {:x 'foo :y 9.87} ]

…vectors, maps, sequences, lists, and sets.

One of Speculoos' main concepts is that scalars are specified and validated separately from collections. You perhaps noticed that the function name we have been using wasn't validate, but instead validate-scalars. Speculoos provides a distinct group of functions to validate the properties of collections, independent of the scalar values contained within the collection. The collection validation functions are distinguished by a -collections suffix.

Let's examine why and how they're distinct.

When to validate collections versus validating scalars

So when do we use collection validation instead of scalar validation? Basically, any time we want to verify a property that's beyond a single scalar.

Validate a property of the collection itself. In this section, we'll often validate the type of the collection.

(vector? []) ;; => true
(vector? {}) ;; => false

Those collection type predicates are short, mnemonic, and built-in, but knowing the mere type of a collection perhaps isn't broadly useful. But maybe we'd like to know how many items a vector contains…

(>= 3 (count [1 2 3])) ;; => true

…or if it contains an even number of elements…

(even? (count [1 2 3])) ;; => false

…or if a map contains a particular key…

(contains? {:x 42} :y) ;; => false

…or maybe if a set contains anything at all.

(empty? #{}) ;; => true

None of those tests are available without access to the whole collection. A scalar validation wouldn't suffice. We'd need a collection validation.

Take particular notice: testing the presence or absence of a datum falls under collection validation.
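Each of the tests above can be packaged as a named, one-argument predicate that receives the whole collection. A minimal sketch, with predicate names invented for this example:

```clojure
;; Hypothetical collection predicates. Each takes the entire collection,
;; not an individual scalar.
(defn at-most-3? [c] (>= 3 (count c)))
(defn even-count? [c] (even? (count c)))
(defn has-key-y? [m] (contains? m :y))

(at-most-3? [1 2 3])  ;; => true
(even-count? [1 2 3]) ;; => false
(has-key-y? {:x 42})  ;; => false
(has-key-y? {:y nil}) ;; => true
```

The last call shows why presence is a collection property: the key :y exists even though its value is nil, which no predicate applied to the value alone could detect.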

Validate a relationship between multiple scalars. Here's where the lines get blurry. If we'd like to know whether the second element of a vector is greater than the first…

(< (get [42 43] 0)
   (get [42 43] 1)) ;; => true

…or whether each successive value is double the previous value…

(def doubles [2 4 8 16 32 64 128 256 512])

(every? #(= 2 %) (map #(/ %2 %1) doubles (next doubles))) ;; => true

…it certainly looks at first glance as though we're only interested in the values of the scalars. Where does the concept of a collection come into play? When validating the relationships between scalars, I imagine a double-ended arrow connecting the two scalars with a question floating above the arrow.

;;    greater-than?
[42 <---------------> 43]

Validating a relationship is validating the concept that arrow represents. The relationship arrow is not a fundamental property of a single, underlying scalar, so a scalar validation won't work. The relationship arrow 'lives' in the collection, so validating the relationship arrow requires a collection validation.
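
As a rough sketch of that idea, the two relationship tests from above can be packaged as predicates that accept the entire collection. The function names are hypothetical, chosen only for this illustration, and the code uses nothing beyond clojure.core.

```clojure
;; Hypothetical relationship predicates; each receives the entire collection.
(defn second-greater-than-first?
  [v]
  (< (get v 0) (get v 1)))

(defn each-double-the-previous?
  [v]
  ;; (partition 2 1 v) yields overlapping neighbors: (2 4) (4 8) (8 16) ...
  (every? (fn [[a b]] (= b (* 2 a)))
          (partition 2 1 v)))

(second-greater-than-first? [42 43])      ;; => true
(each-double-the-previous? [2 4 8 16 32]) ;; => true
(each-double-the-previous? [2 4 9])       ;; => false
```

Because each predicate needs at least two scalars at once, it can only be handed the collection that contains them.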

Don't feel forced to choose between scalar or collection validation. We could do both, reserving each kind of validation for when it's best suited.

Quick examples

The upcoming discussion is long and detailed, so before we dive in, let's look at a few examples of collection validation to give a little motivation to working through the concepts.

We can validate that a vector contains three elements. We compose that predicate like this.

(defn length-3? [v] (= 3 (count v)))

Then, we pull in one of Speculoos' collection validation functions, and validate. The data is the first argument, in the upper row, the specification is the second argument, appearing in the lower row.

(require '[speculoos.core :refer [valid-collections?]])


(valid-collections? [42 "abc" 22/7]
                    [length-3?])
;; => true

That example reminds us to consider the Three Mottos. We're validating strictly only collections, as the valid-collections? function name indicates. The shape of the specification (lower row) mimics the shape of the data (upper row). One predicate paired with one collection, zero ignored.

We'll go into much more detail soon, but be aware that during collection validation, predicates apply to their immediate parent collection. So the length-3? predicate applies to the vector, not the scalar 42. The vector contains three elements, so the validation returns true.

We can validate whether a map contains a key :y. Here's a predicate for that.

(defn map-contains-keyword-y? [m] (contains? m :y))

Then we validate, data in the upper row, specification in the lower row. For the moment, don't worry about that :foo key in the specification.

(valid-collections? {:x 42}
                    {:foo map-contains-keyword-y?})
;; => false

Data {:x 42} does not contain a key :y, so the validation returns false.

We can validate whether a list contains an even number of elements. Here's that predicate.

(defn even-elements? [f] (even? (count f)))

Then the validation.

(valid-collections? (list < 1 2 3)
                    (list even-elements?))
;; => true

Yes, the list contains an even number of elements.

We could determine whether every number in a set is odd. Here's a predicate to test that.

(defn all-odd? [s] (every? odd? s))

Collection validation looks like this.

(valid-collections? #{1 2 3}
                    #{all-odd?}) ;; => false

Our set, #{1 2 3}, contains one element that is not odd, so the set fails to satisfy the collection predicate.

None of those four examples could be accomplished with a scalar validation. They all require access to the collection itself.

Where collection predicates apply

The principle to keep in mind is: Any collection predicate applies to its immediate parent collection. Let's break that down into parts.

  • Predicates apply to their parent collection.

    (valid-collections? [42 "abc" 22/7]
                        [length-3?]) ;; => true

    In contrast with scalar validation, which would have paired the predicate with integer 42, collection validation pairs predicate length-3? with the parent vector [42 "abc" 22/7]. In the next section, we'll discuss the mechanics of how and why it's that way.

  • Predicates apply to their immediate parent collection.

    (valid-collections? [[42 "abc" 22/7]]
                        [[length-3?]])
    ;; => true

    The length-3? predicate applies to the nested vector that contains three elements. The outer, root collection that contains only one element was not paired with a predicate, so it was ignored. Each predicate is paired with at most one collection, the collection closest to it.

  • Any collection predicates apply to their immediate parent collection.

    (valid-collections? [42 "abc" 22/7]
                        [vector? coll? sequential?])
    ;; => true

    Unlike scalar validation, which maintains an absolute one-to-one correspondence between predicates and datums, collection validation may include more than one predicate per collection. Collections may pair with more than one predicate, but each predicate pairs with at most one collection.

The fact that predicates apply to their immediate parent collections is what allows us to write specifications whose shapes mimic the shapes of the data.
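
The "immediate parent" idea can be sketched with plain clojure.core. The helper parent-collection below is hypothetical and is not how Speculoos is implemented. It also deliberately ignores an adjustment Speculoos makes for sequentials that mix scalars and nested collections, so treat it as an approximation that works for the small examples in this list.

```clojure
;; Hypothetical helper: find the collection a predicate would target by
;; trimming the last element from the predicate's path.
;; Simplification: no adjustment for scalars interleaved with nested collections.
(defn parent-collection
  [data predicate-path]
  (get-in data (drop-last predicate-path)))

(def data [[42 "abc" 22/7]])

;; A predicate at specification path [0 0] targets the nested vector...
(parent-collection data [0 0]) ;; => [42 "abc" 22/7]

;; ...while a predicate at specification path [1] would target the root vector.
(parent-collection data [1])   ;; => [[42 "abc" 22/7]]
```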

How collection validation works

The Three Mottos and the principle of applying predicates to their containing collections are emergent properties of Speculoos' collection validation algorithm. If we understand the algorithm, we will know when a collection validation is the best tool for the task, and be able to write clear, correct, and expressive collection specifications.

Imagine we wanted to specify that our data vector was exactly three elements long. We might reasonably write this predicate, whose argument is a collection.

;; a predicate that returns true if the collection has three elements

(defn len-3? [c] (= 3 (count c)))

Notice that this predicate tests a property of the collection itself: the number of elements it contains. validate-scalars has no way to do this kind of test because it deliberately only considers scalars.

Now, we invent some example data.

[42 "abc" 22/7]

The paths of that data look like this.

(all-paths [42 "abc" 22/7])
;; => [{:path [], :value [42 "abc" 22/7]}
;;     {:path [0], :value 42}
;;     {:path [1], :value "abc"}
;;     {:path [2], :value 22/7}]

We're validating collections (Motto #1), so we're only interested in the root collection at path [] in the data. Let's apply Motto #2 and shape our specification to mimic the shape of the data. We'll copy-paste the data…

[42 "abc" 22/7]

…delete the contents…

[             ]

…and replace the contents with our len-3? predicate.

[len-3?       ]

That will be our specification. Notice: during collection validation, we insert predicates inside the collection that they target.

Validating collections uses a slightly adjusted version of the scalar validation algorithm. (If you are curious why the collection algorithm is different, see this later subsection.) The algorithm for validating collections is as follows:

  1. Run all-paths on the data, then the specification.
  2. Remove scalar elements from the data, keeping only the collection elements.
  3. Remove non-predicate elements from the collection specification.
  4. Pair predicates at path pth in the specification with collections at path (drop-last pth) in the data. Discard all other un-paired collections and un-paired predicates.
  5. For each remaining collection+predicate pair, apply the predicate to the collection.
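
The five steps above can be sketched in a few lines of plain clojure.core. This is an illustration under stated assumptions, not Speculoos' implementation: paths is a hypothetical stand-in for all-paths that only understands vectors and maps, naive-validate-collections is an invented name, and the sketch skips the ordinal-path handling for sequentials that mix scalars and nested collections.

```clojure
;; Hypothetical stand-in for all-paths; handles only vectors and maps.
(defn paths
  ([x] (paths x []))
  ([x path]
   (let [children (cond
                    (vector? x) (map-indexed vector x)
                    (map? x)    (seq x)
                    :else       nil)]
     (into [{:path path :value x}]
           (mapcat (fn [[k v]] (paths v (conj path k))) children)))))

;; Hypothetical, simplified validator; not the library's algorithm verbatim.
(defn naive-validate-collections
  [data spec]
  (let [;; steps 1 and 2: enumerate the data, keep only collections, index by path
        collections (into {}
                          (for [{:keys [path value]} (paths data)
                                :when (coll? value)]
                            [path value]))
        ;; steps 1 and 3: enumerate the specification, keep only predicates
        predicates (filter (comp fn? :value) (paths spec))]
    ;; steps 4 and 5: pair via drop-last, then apply each predicate
    (for [{:keys [path value]} predicates
          :let [target (vec (drop-last path))]
          :when (contains? collections target)]
      {:path-predicate path
       :path-datum     target
       :datum          (get collections target)
       :valid?         (boolean (value (get collections target)))})))

(defn len-3? [c] (= 3 (count c)))

(naive-validate-collections [42 "abc" 22/7] [len-3?])
;; => ({:path-predicate [0], :path-datum [], :datum [42 "abc" 22/7], :valid? true})
```

Predicates whose trimmed path does not land on a collection simply produce no entry, which mirrors the "discard un-paired" part of step 4.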

Let's perform that algorithm manually. We run all-paths on both the data…

(all-paths [42 "abc" 22/7])
;; => [{:path [], :value [42 "abc" 22/7]}
;;     {:path [0], :value 42}
;;     {:path [1], :value "abc"}
;;     {:path [2], :value 22/7}]

…and all-paths on our collection specification.

(all-paths [len-3?])
;; => [{:path [], :value [len-3?]}
;;     {:path [0], :value len-3?}]

We discard all scalar elements of the data, keeping only the collection elements.

[{:path [], :value [42 "abc" 22/7]}]

And we keep only the predicate elements of the specification.

[{:path [0], :value len-3?}]

The next step, pairing predicates to a target collection, is where it gets interesting. During scalar validation, we paired a predicate with a scalar when they shared the exact same path. That doesn't work for collection validation. Instead, we pair a collection and a predicate when the collection's path in the data is equivalent to (drop-last pth), where pth is the predicate's path in the specification.

Looking at the previous two results, we see the root collection is path [], while the len-3? predicate's path is [0]. (drop-last [0]) evaluates to (), which is equivalent to the root path. So the predicate and the collection are paired. We then apply the predicate.

(len-3? [42 "abc" 22/7]) ;; => true

The root collection [42 "abc" 22/7] satisfies the len-3? predicate because it contains three elements, so the validation returns true.

Speculoos provides a function, validate-collections, that does all that for us. The function signature is similar to what we saw earlier while validating scalars: data on the upper row, and the specification mimicking the shape of the data on the lower row.

(require '[speculoos.core :refer [validate-collections]])

(validate-collections [42 "abc" 22/7]
                      [len-3?])
;; => ({:datum [42 "abc" 22/7],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate len-3?,
;;      :valid? true})

Much of that looks familiar. validate-collections returns a validation entry for every collection+predicate pair. In this case, the data's root vector was paired with the single len-3? predicate. The :datum represents the thing being tested, the :predicate indicates the predicate function, and :valid? reports whether that predicate was satisfied. The root vector contains three elements, so len-3? was satisfied.

There are now three things that involve some notion of a path. The predicate was found at :path-predicate in the specification. The datum was found at :ordinal-path-datum in the data, which is also presented in a more friendly format as the literal path :path-datum. (We'll explain the terms embodied by these keywords as the discussion progresses.) Notice that the path of the root vector [] is equivalent to running drop-last on the path of the len-3? predicate: (drop-last [0]) evaluates to ().
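
The equivalence between () and [] is easy to check at the REPL. The following sketch uses only clojure.core; the var names are arbitrary.

```clojure
;; drop-last returns a sequence, which prints as () rather than [].
(def predicate-path [0])
(def parent-path (drop-last predicate-path))

parent-path        ;; => ()
(= parent-path []) ;; => true, sequentials with equal elements are equal

;; get-in accepts either form; an empty path returns the root collection.
(get-in [42 "abc" 22/7] parent-path) ;; => [42 "abc" 22/7]
(get-in [42 "abc" 22/7] [])          ;; => [42 "abc" 22/7]
```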

Let's explore validating a two-element vector nested within a two-element vector. To test whether each of those two vectors contain two elements, we could write this collection predicate.

(defn len-2? [c] (= 2 (count c)))

Remember Motto #1: This predicate accepts a collection, c, not a scalar.

We'll invent some data, a two-element vector nested within a two-element vector by wrapping the final two elements inside an additional pair of brackets.

[42 ["abc" 22/7]]

Note that the outer root vector contains exactly two elements: one scalar 42 and one descendant collection, the nested vector ["abc" 22/7].

Following Motto #2, we'll compose a collection specification whose shape mimics the shape of the data. We copy-paste the data, delete the scalars, and insert our predicates.

[42     ["abc" 22/7]] ;; copy-paste data
[       [          ]] ;; delete scalars
[len-3? [len-2?    ]] ;; insert predicates

(I've re-used the len-3? predicate so that in the following examples, it'll be easier to keep track of which predicate goes where when we have multiple predicates.)

Let's take a look at the data's paths.

(all-paths [42 ["abc" 22/7]])
;; => [{:path [], :value [42 ["abc" 22/7]]}
;;     {:path [0], :value 42}
;;     {:path [1], :value ["abc" 22/7]}
;;     {:path [1 0], :value "abc"}
;;     {:path [1 1], :value 22/7}]

Five elements: three scalars, which we ignore during collection validation, and two collections, the root collection and the nested vector.

Here are the specification's paths.

(all-paths [len-3? [len-2?]])
;; => [{:path [], :value [len-3? [len-2?]]}
;;     {:path [0], :value len-3?}
;;     {:path [1], :value [len-2?]}
;;     {:path [1 0], :value len-2?}]

Four total elements: two collections, the root vector at path [] and a nested vector at path [1], and two functions, predicate len-3? in the top-level at path [0] and predicate len-2? in the lower-level at path [1 0].

Next, we remove all scalar elements from the data, keeping only the elements that are collections.

;; non-scalar elements of data

[{:path [], :value [42 ["abc" 22/7]]}
 {:path [1], :value ["abc" 22/7]}]

We kept two such collections: the root collection at path [] and the nested vector at path [1].

Next, we remove all non-predicate elements from the specification, keeping only the predicates.

;; predicate elements of specification

[{:path [0], :value len-3?}
 {:path [1 0], :value len-2?}]

There are two such collection predicates: len-3? at path [0] and len-2? at path [1 0]. Let's notice that if we apply drop-last to those paths, we get the paths of the two vectors in the data:

  • (drop-last [0]) yields (), which pairs with the data's root collection [42 ["abc" 22/7]] at path [].
  • (drop-last [1 0]) yields (1), which pairs with the nested vector ["abc" 22/7] at path [1] of the data.

In the previous section when we were validating scalars, we followed the principle that validation only proceeds when a predicate in the specification shares the exact path as the scalar in the data. However, we can now see an issue if we try to apply that principle here. The nested vector of the data is located at path [1]. The nested len-2? predicate in the specification is located at path [1 0], nearly the same except for the trailing 0. The root vector of the data is located at path [] while the len-3? predicate is located at path [0] of the specification, again, nearly the same except for the trailing 0. Clojure has a nice core function, drop-last, that performs that transformation.

The slightly modified rule for validating collections is: Collection predicates in the specification are applied to the collection in the data that corresponds to their parent. In other words, the predicate at path pth in the collection specification is applied to the collection at path (drop-last pth) in the data. So we pair predicate len-3? with the root collection [42 ["abc" 22/7]] and we pair predicate len-2? with the nested vector ["abc" 22/7].

We can now perform the validation by hand. There are two vectors to validate, each with its own predicate.

(len-3? [42 ["abc" 22/7]]) ;; => false
(len-2? ["abc" 22/7]) ;; => true

The root vector [42 ["abc" 22/7]] does not satisfy the len-3? predicate it was paired with because it only contains two elements (one integer plus one nested vector). The nested vector ["abc" 22/7] contains two elements, so it satisfies the len-2? collection predicate that it was paired with.

validate-collections does that entire algorithm for us with one invocation. Data on the upper row, collection specification on the lower row.

(validate-collections [42 ["abc" 22/7]]
                      [len-3? [len-2?]])
;; => ({:datum [42 ["abc" 22/7]],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate len-3?,
;;      :valid? false}
;;     {:datum ["abc" 22/7],
;;      :ordinal-path-datum [0],
;;      :path-datum [1],
;;      :path-predicate [1 0],
;;      :predicate len-2?,
;;      :valid? true})

One invocation performs the entire algorithm, which found two pairs of predicates+collections. Predicate len-3? at path [0] in the specification was paired with root collection at path [] in the data. The root collection contains only two elements, so len-3? returns false. Predicate len-2? at path [1 0] in the specification was paired with the nested vector at path [1] in the data. The nested vector contains two elements, so len-2? returns true.
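
Since the return value is an ordinary sequence of maps, ordinary sequence functions can pick it apart. The sketch below copies the two entries above by hand, with the predicates written as quoted symbols so that it runs without Speculoos loaded; the var names are arbitrary.

```clojure
;; The two entries above, copied by hand; predicates are quoted symbols here
;; so the snippet runs without Speculoos loaded.
(def results
  [{:datum [42 ["abc" 22/7]]
    :ordinal-path-datum []
    :path-datum []
    :path-predicate [0]
    :predicate 'len-3?
    :valid? false}
   {:datum ["abc" 22/7]
    :ordinal-path-datum [0]
    :path-datum [1]
    :path-predicate [1 0]
    :predicate 'len-2?
    :valid? true}])

;; Keep only the entries whose predicate was not satisfied.
(def failures (remove :valid? results))

(map :predicate failures)      ;; => (len-3?)
(map :path-predicate failures) ;; => ([0])
(every? :valid? results)       ;; => false
```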

To solidify our knowledge, let's do one more example with an additional nested vector and a third predicate. I'll be terse because this is just a review of the concepts from before.

The nested data, similar to the previous data, but with an additional vector wrapping 22/7.

[42 ["abc" [22/7]]]

A new predicate testing for a length of one.

(defn len-1? [c] (= 1 (count c)))

Motto #2: Shape the specification to mimic the data. Copy-paste the data, then delete the scalars.

[   [      [    ]]]

Insert collection predicates.

[len-3? [len-2? [len-1?]]]

Now that we have the data and specification in hand, we perform the collection validation algorithm.

  1. Run all-paths on the data.

    (all-paths [42 ["abc" [22/7]]])
    ;; => [{:path [], :value [42 ["abc" [22/7]]]}
    ;;     {:path [0], :value 42}
    ;;     {:path [1], :value ["abc" [22/7]]}
    ;;     {:path [1 0], :value "abc"}
    ;;     {:path [1 1], :value [22/7]}
    ;;     {:path [1 1 0], :value 22/7}]

    Six elements: three collections, three scalars (will be ignored).

    Run all-paths on the specification.

    (all-paths [len-3? [len-2? [len-1?]]])
    ;; => [{:path [], :value [len-3? [len-2? [len-1?]]]}
    ;;     {:path [0], :value len-3?}
    ;;     {:path [1], :value [len-2? [len-1?]]}
    ;;     {:path [1 0], :value len-2?}
    ;;     {:path [1 1], :value [len-1?]}
    ;;     {:path [1 1 0], :value len-1?}]

    Six elements: three collections, three predicates.

  2. Remove scalar elements from the data, keeping only the collection elements.

    [{:path [], :value [42 ["abc" [22/7]]]}
     {:path [1], :value ["abc" [22/7]]}
     {:path [1 1], :value [22/7]}]

    Remove non-predicate elements from the specification.

    [{:path [0], :value len-3?}
     {:path [1 0], :value len-2?}
     {:path [1 1 0], :value len-1?}]

  3. Pair predicates at path pth in the specification with collections at path (drop-last pth) in the data.

    ;; paths of predicates  => paths of collections in data

    (drop-last [0]) ;; => ()
    (drop-last [1 0]) ;; => (1)
    (drop-last [1 1 0]) ;; => (1 1)

    () is equivalent to [], (1) is equivalent to [1], etc. Therefore,

    • len-3? pairs with [42 ["abc" [22/7]]]
    • len-2? pairs with ["abc" [22/7]]
    • len-1? pairs with [22/7]

    All predicates pair with a collection, and all collections pair with a predicate. There are zero un-paired predicates, and zero un-paired collections.

  4. For each collection+predicate pair, apply the predicate.

    (len-3? [42 ["abc" [22/7]]]) ;; => false
    (len-2? ["abc" [22/7]]) ;; => true
    (len-1? [22/7]) ;; => true

    The root collection fails to satisfy its predicate, but the two nested vectors do satisfy their respective predicates.

Now, we lean on validate-collections to do all four steps of that algorithm with one invocation. Data on the upper row, specification on the lower row.

(validate-collections [42 ["abc" [22/7]]]
                      [len-3? [len-2? [len-1?]]])
;; => ({:datum [42 ["abc" [22/7]]],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate len-3?,
;;      :valid? false}
;;     {:datum ["abc" [22/7]],
;;      :ordinal-path-datum [0],
;;      :path-datum [1],
;;      :path-predicate [1 0],
;;      :predicate len-2?,
;;      :valid? true}
;;     {:datum [22/7],
;;      :ordinal-path-datum [0 0],
;;      :path-datum [1 1],
;;      :path-predicate [1 1 0],
;;      :predicate len-1?,
;;      :valid? true})

validate-collections discovered the same three collection+predicate pairs and helpfully reports their paths, alongside the results of applying each of the three predicates to their respective collections. As we saw when we ran the manual validation, the root collection failed to satisfy its len-3? predicate, but the two nested vectors did satisfy their predicates, len-2? and len-1?, respectively.

Next we'll tackle validating the collection properties of maps. The same principle governs: predicates apply to their parent container. Let's assume this data.

{:x 42}

A hash-map containing one key-value. Here are the paths of that example data.

(all-paths {:x 42})
;; => [{:path [], :value {:x 42}}
;;     {:path [:x], :value 42}]

One scalar, which validate-collections ignores, and one collection. Let's apply our rule: the predicate in the specification applies to the collection in the data whose path is one element shorter. The root collection is located at path []. To write a collection specification, we'd mimic the shape of the data, inserting predicates that apply to the parent. We can't simply write…

{map?} ;; => java.lang.RuntimeException...

…because maps must contain an even number of forms. So we're going to need to add a key in there. Let me propose this as a specification.

{:foo map?}

:foo doesn't have any particular meaning, and it won't affect the validation. Let's examine the paths of that proposed specification and apply the Mottos.

(all-paths {:foo map?})
;; => [{:path [], :value {:foo map?}}
;;     {:path [:foo], :value map?}]

Two elements: the root collection at path [] and a predicate at path [:foo]. Since this will be the collection validation, Speculoos only considers the elements of the specification which are predicates, so non-predicate elements of the specification (i.e., the root collection) will be ignored, and only the map? predicate will participate, if it can be paired with a collection in the data.

Let's explore the drop-last business. There's only one element in the collection specification that's a predicate. Predicate map? is located at path [:foo].

(drop-last [:foo]) ;; => ()

Fortunately, that evaluates to a path, (), which in the data, corresponds to a collection. Because the (drop-last [:foo]) path of the predicate in the specification corresponds to the path of a collection in the data, we can form a validation pair.

(map? {:x 42}) ;; => true

The root collection satisfies the map? predicate it is paired with.

Let's do that sequence automatically with validate-collections, data on the upper row, specification on the lower row.

(validate-collections {:x 42}
                      {:foo map?})
;; => ({:datum {:x 42},
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [:foo],
;;      :predicate map?,
;;      :valid? true})

validate-collections was, in fact, able to pair one predicate to one collection. Predicate map? at path [:foo] in the specification was paired with the root collection at path []. Unlike scalar validation which pairs predicates to scalars with their exact paths, collection validation pairs are formed when the target path is equivalent to the predicate's path right-trimmed. In this example, predicate map?'s path is [:foo]. (drop-last [:foo]) evaluates to (). A path () corresponds to the root collection, so the predicate map? was applied to the root collection. {:x 42} satisfies the predicate.

Because of the drop-last behavior, it mostly doesn't matter which key we associate with our collection predicate. The key will merely get trimmed when searching for a target. In the example above, :foo was trimmed, but the key could be anything. Observe.

(drop-last [:foo]) ;; => ()
(drop-last [:bar]) ;; => ()
(drop-last [:baz]) ;; => ()

Any single key would get trimmed off, resulting in a path of [], which would always point to the root collection.
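
To see the same thing end to end, here is a small sketch that follows each trimmed path back into the data. The helper target-of is hypothetical, written with plain clojure.core, and only mirrors the drop-last pairing rule described above.

```clojure
(def data {:x 42})

;; Hypothetical helper mirroring the drop-last pairing rule.
(defn target-of
  [data predicate-path]
  (get-in data (drop-last predicate-path)))

;; Whatever single key holds the predicate, the target is the root map.
(map #(target-of data [%]) [:foo :bar :baz])
;; => ({:x 42} {:x 42} {:x 42})

(map? (target-of data [:foo])) ;; => true
```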

Technically, we could key our collection predicates however we want, but I strongly recommend choosing a key that doesn't appear in the data. This next example shows why.

Let's explore a map nested within a map. This will be our example data.

{:x 42 :y {:z "abc"}}

Let's put a collection predicate at key :y of the specification.

{:y map?}

Notice that :y also appears in the data.

Now we run all-paths on both the data…

(all-paths {:x 42, :y {:z "abc"}})
;; => [{:path [], :value {:x 42, :y {:z "abc"}}}
;;     {:path [:x], :value 42}
;;     {:path [:y], :value {:z "abc"}}
;;     {:path [:y :z], :value "abc"}]

…and all-paths on the specification.

(all-paths {:y map?})
;; => [{:path [], :value {:y map?}}
;;     {:path [:y], :value map?}]

Discard all non-collection elements in the data…

[{:path [],:value {:x 42, :y {:z "abc"}}}
 {:path [:y], :value {:z "abc"}}]

…and discard all non-predicate elements in the specification.

[{:path [:y], :value map?}]

This is something we see for the first time while discussing collection validation: Fewer predicates than collections. Since there is only one predicate, at least one collection will be un-paired, and ignored (Motto #3). In this example, the predicate's path is [:y]. We trim it with drop-last.

(drop-last [:y]) ;; => ()

The resulting () corresponds to path [] in the data. So we can now apply the collection predicate to the collection in the data.

(map? {:x 42, :y {:z "abc"}}) ;; => true

Let's confirm that we produced the same answer as validate-collections would give us.

(validate-collections {:x 42, :y {:z "abc"}}
                      {:y map?})
;; => ({:datum {:x 42, :y {:z "abc"}},
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [:y],
;;      :predicate map?,
;;      :valid? true})

We see a map? predicate at key :y of the specification, and validate-collections merrily chugged along without a peep about masking the nested map {:z "abc"}.

We can see that the singular map? predicate located at specification path [:y] was indeed applied to the root container at data path (drop-last [:y]) which evaluates to path []. But now we've consumed that key, and it cannot be used to target the nested map {:z "abc"} at path [:y] in the data. We would not be able to validate any aspect of the nested collection {:z "abc"}.
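
One way to spot which collections were left un-paired is to compare the collection paths in the data against the trimmed predicate paths. The sketch below does that by hand with clojure.set. The two path listings are copied from the all-paths output above, and the var names are arbitrary.

```clojure
(require '[clojure.set :as set])

;; Paths of the collections in {:x 42, :y {:z "abc"}}
(def collection-paths #{[] [:y]})

;; Paths of the predicates in {:y map?}
(def predicate-paths [[:y]])

;; Each predicate targets the path produced by drop-last.
(def targeted-paths
  (set (map (comp vec drop-last) predicate-paths)))

targeted-paths                                   ;; => #{[]}
(set/difference collection-paths targeted-paths) ;; => #{[:y]}
```

The nested map at path [:y] is the one collection that no predicate reaches.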

Instead, if we had invented a wholly fictitious key, drop-last would trim that sham key off the right end of the path and the predicate would still be applied to the root container, while key :y remains available to target the nested map. :foo/:bar/:baz-style keywords are nice because humans understand that they don't carry any particular meaning. In practice, I like to invent keys that are descriptive of the predicate so the validation results are easier to scan by eye.

For instance, if we're validating that a collection's type is a map, we could use sham key :is-a-map?. We could also verify that the nested map is not a set by associating predicate set? to :is-a-set?.

(validate-collections {:x 42, :y {:z "abc"}}
                      {:is-a-map? map?, :y {:is-a-set? set?}})
;; => ({:datum {:x 42, :y {:z "abc"}},
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [:is-a-map?],
;;      :predicate map?,
;;      :valid? true}
;;     {:datum {:z "abc"},
;;      :ordinal-path-datum [:y],
;;      :path-datum [:y],
;;      :path-predicate [:y :is-a-set?],
;;      :predicate set?,
;;      :valid? false})

Notice that validate-collections completely ignored the scalars 42 and "abc" at data keys :x and :z. It only applied predicate map? to the root of data and predicate set? to the nested map at key :y, which failed to satisfy. Any possible meaning suggested by keys :is-a-map? and :is-a-set? did not affect the actual validation; they are merely convenient markers that we chose to make the results easier to read.

Let me emphasize: when we're talking about a nested map's collection specification, the predicate's key has absolutely no bearing on the operation of the validation. The key, at the tail position of the path, gets trimmed by the drop-last operation. That's why :foo in the earlier examples doesn't need to convey any meaning. We could have made the key misleading like this.

;;             this keyword… ---v         v--- …gives the wrong impression about this predicate
(validate-collections {:x 11} {:is-a-map? vector?})
;; => ({:datum {:x 11},
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [:is-a-map?],
;;      :predicate vector?,
;;      :valid? false})

Despite the :is-a-map? key suggesting that we're testing for a map, the predicate itself determines the outcome of the validation. The vector? predicate tests for a vector, and returns false.

It's our job to make sure we write the predicates correctly.

Here's something interesting.

(validate-collections [42]
                      [vector? map?])
;; => ({:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [1],
;;      :predicate map?,
;;      :valid? false})

If we focus on the paths of the two predicates in the specification, we see that both vector? at path [0] and map? at path [1] target the root container because…

(drop-last [0]) ;; => ()

…and…

(drop-last [1]) ;; => ()

…and in fact…

(drop-last [99999]) ;; => ()

…all evaluate to the same equivalent path [] in the data. So we have another consideration: Every predicate in a specification's collection applies to the parent collection in the data. This means that we can apply an unlimited number of predicates to each collection.

(validate-collections [42]
                      [vector? map? list? set? coll?])
;; => ({:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [1],
;;      :predicate map?,
;;      :valid? false}
;;     {:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [2],
;;      :predicate list?,
;;      :valid? false}
;;     {:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [3],
;;      :predicate set?,
;;      :valid? false}
;;     {:datum [42],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [4],
;;      :predicate coll?,
;;      :valid? true})

All five collection predicates were located at a single-element path, so for each of those five cases, (drop-last [0 through 4]) evaluated to (), which is the path to the data's root collection. validate-collections was therefore able to make five pairs, and we see five validation results.
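
Stripped of the path bookkeeping, those five results are what we get by applying each predicate to the same vector. This sketch uses only clojure.core and is an illustration, not a description of Speculoos internals.

```clojure
(def predicates [vector? map? list? set? coll?])

;; Apply every predicate to the same collection, as the five pairs above do.
(map (fn [pred] (pred [42])) predicates)
;; => (true false false false true)

;; juxt builds one function that returns all five results as a vector.
((apply juxt predicates) [42])
;; => [true false false false true]
```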

That feature can be useful, but it raises an issue. How would we specify the collections of this data?

[42 {:y "abc"}]

A map nested within a vector. And its paths.

(all-paths [42 {:y "abc"}])
;; => [{:path [], :value [42 {:y "abc"}]}
;;     {:path [0], :value 42}
;;     {:path [1], :value {:y "abc"}}
;;     {:path [1 :y], :value "abc"}]

We may want to specify two facets of the root collection, that it's both a collection and a vector (that's redundant, I know). Furthermore, we want to specify that the data's second element is a map. That collection specification might look something like this.

[coll? vector? {:foo map?}]

And its paths.

(all-paths [coll? vector? {:foo map?}])
;; => [{:path [],
;;      :value [coll? vector? {:foo map?}]}
;;     {:path [0], :value coll?}
;;     {:path [1], :value vector?}
;;     {:path [2], :value {:foo map?}}
;;     {:path [2 :foo], :value map?}]

Two predicates, coll? and vector?, apply to the root collection, because (drop-last [0]) and (drop-last [1]) both resolve to the root collection's path. But somehow, we have to tell validate-collections how to target that map? predicate towards the nested map. We can see that map? is located at path [2 :foo], and (drop-last [2 :foo]) evaluates to (2). The data's nested map {:y "abc"} is located at path [1], which doesn't 'match'.

If any number of predicates apply to the parent collection, there might be zero to infinity predicates before we encounter a nested collection in that sequence. How, then, does validate-collections determine where to apply the predicate inside a nested collection?

The rule validate-collections follows is: Within a sequential collection, apply nested collection predicates in the order in which they appear, ignoring scalars. Let's see that in action. Here is the data, with the scalars removed from the root level.

[{:y "abc"}]

Here is the specification with the scalars (i.e., functions) removed from its root level.

[{:foo map?}]

Now we generate the paths for both of those.

;; pruned data

(all-paths [{:y "abc"}])
;; => [{:path [], :value [{:y "abc"}]}
;;     {:path [0], :value {:y "abc"}}
;;     {:path [0 :y], :value "abc"}]


;; pruned specification

(all-paths [{:foo map?}])
;; => [{:path [], :value [{:foo map?}]}
;;     {:path [0], :value {:foo map?}}
;;     {:path [0 :foo], :value map?}]

Next remove all non-collection elements from the data.

[{:path [], :value [{:y "abc"}]}
 {:path [0], :value {:y "abc"}}]

And remove all non-predicate elements of the specification.

[{:path [0 :foo], :value map?}]

There are two remaining collections in the data, but only one predicate. Motto #3 reminds us that at least one of the collections will be ignored. Can we make at least one collection+predicate pair? Let's perform the drop-last maneuver on the predicate's path.

(drop-last [0 :foo]) ;; => (0)

Well, how about that? That resolves to (0), which is equivalent to the path of the nested map {:y "abc"} in the pruned data. We can apply that predicate to the collection we paired it with.

(map? {:y "abc"}) ;; => true

So the nested map is indeed a map. Let's see what validate-collections has to say.

(validate-collections [42 {:y "abc"}]
                      [coll? vector? {:foo map?}])
;; => ({:datum [42 {:y "abc"}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate coll?,
;;      :valid? true}
;;     {:datum [42 {:y "abc"}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [1],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum {:y "abc"},
;;      :ordinal-path-datum [0],
;;      :path-datum [1],
;;      :path-predicate [2 :foo],
;;      :predicate map?,
;;      :valid? true})

validate-collections found three predicates in the specification on the lower row that it could pair with a collection in the data in the upper row. Both coll? and vector? predicates pair with the root collection because their paths, when right-trimmed with drop-last, correspond to [], which targets the root collection. Predicate map? was paired with the nested map {:y "abc"} in the data because map? was located in the first nested collection of the specification, and {:y "abc"} is the first (and only) nested collection in the data. We can see how validate-collections calculated the nested map's path because :ordinal-path-datum is [0]. The ordinal path reports the path into the 'pruned' collections, as if the sequentials in the data and the sequentials in the specification contained zero scalars.
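The pruning step can be imitated with plain clojure.core functions. This is only a sketch of the idea, not how Speculoos is implemented; prune-scalars is a hypothetical helper name invented for this illustration.

```clojure
;; Hypothetical helper (not part of Speculoos): keeps only the nested
;; collections of sequential `s`, preserving their relative order.
(defn prune-scalars
  [s]
  (filterv coll? s))

;; pruning the data drops the scalar 42
(prune-scalars [42 {:y "abc"}])             ;; => [{:y "abc"}]

;; pruning the specification drops the bare predicates coll? and vector?,
;; because functions are not collections
(count (prune-scalars [coll? vector? {:foo map?}])) ;; => 1

;; in the pruned data, the right-trimmed predicate path [0 :foo]
;; addresses the nested map
(get-in (prune-scalars [42 {:y "abc"}]) (drop-last [0 :foo])) ;; => {:y "abc"}
```

The last expression shows why the ordinal path is [0]: once the scalars are gone, the nested map is the first element of the pruned data.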

Let's do another example that really exercises this principle. First, we'll make some example data composed of a parent vector, containing a nested map, a nested list, and a nested set, with a couple of interleaved integers.

[{:a 11} 22 (list 33) 44 #{55}]

Let's examine the paths of that data.

(all-paths [{:a 11} 22 (list 33) 44 #{55}])
;; => [{:path [], :value [{:a 11} 22 (33) 44 #{55}]}
;;     {:path [0], :value {:a 11}}
;;     {:path [0 :a], :value 11}
;;     {:path [1], :value 22}
;;     {:path [2], :value (33)}
;;     {:path [2 0], :value 33}
;;     {:path [3], :value 44}
;;     {:path [4], :value #{55}}
;;     {:path [4 55], :value 55}]

We got path elements for five scalars, and path elements for four collections: the root collection (a vector), and three nested collections (one each of map, list, and set).

We're in collection validation mindset (Motto #1), so we ought to be considering the order of the nested collections. Let's eliminate the five scalars and enumerate the paths of the pruned data.

(all-paths [{} (list) #{}])
;; => [{:path [], :value [{} () #{}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{}}]

Let's make note of a few facts. The nested map, nested list, and nested set remain in the same relative order as in the full data. The root collection is, as always, at path []. The nested collections are zero-indexed: the nested map is located at index 0, the nested list is at index 1, and the nested set is at index 2. These indexes are what validate-collections reports as :ordinal-path-datum, the prefix ordinal indicating a position within a sequence, 'first', 'second', 'third', etc.
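To make the relationship between a literal index and an ordinal index concrete, here is a small sketch in plain Clojure. The name ordinal-index is hypothetical and exists only for this illustration; it counts how many nested collections precede a given literal index.

```clojure
;; Hypothetical helper (not part of Speculoos): the ordinal position of
;; the element at literal index `idx`, counting only the nested
;; collections that come before it in sequential `s`.
(defn ordinal-index
  [s idx]
  (count (filter coll? (take idx s))))

(def data [{:a 11} 22 (list 33) 44 #{55}])

(ordinal-index data 0) ;; => 0, the nested map
(ordinal-index data 2) ;; => 1, the nested list
(ordinal-index data 4) ;; => 2, the nested set
```

Those three results line up with the indexes of the pruned data shown above.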

Now we need to compose a collection specification. Motto #2 reminds us to make the specification mimic the shape of the data. I'm going to copy-paste the data and mash the delete key to remove the scalar datums.

[{     }    (       )    #{  }]

Just to emphasize how they align, here are the data (upper row) and the collection specification (lower row) with some space for visual formatting.

[{:a 11} 22 (list 33) 44 #{55}] ;; <--- data
[{     }    (       )    #{  }] ;; <--- collection specification
 ^--- 1st   ^--- 2nd     ^--- 3rd nested collection

The first thing to note is that our collection specification looks a lot like our data with all the scalars removed. The second thing to notice is that even though it contains zero predicates, that empty structure in the lower row is a legitimate collection specification which validate-collections can consume. Check this out.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [{} () #{}])
;; => ()

Motto #3: Validation ignores collections in the data that are not paired with a predicate in the specification. Zero predicates, zero pairs.
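One quick way to estimate how many validation results to expect is to count the predicates in the specification. The sketch below uses only clojure.core; count-predicates is a hypothetical name, and the count is merely an upper bound, since a predicate whose container has no counterpart in the data stays unpaired.

```clojure
;; Hypothetical helper (not part of Speculoos): walks every nested
;; collection of `spec` and counts the function objects it finds.
(defn count-predicates
  [spec]
  (count (filter fn? (tree-seq coll? seq spec))))

(count-predicates [{} () #{}])           ;; => 0
(count-predicates [{} (list list?) #{}]) ;; => 1
```

Keywords and sets can be invoked like functions, but fn? returns false for them, so map keys and empty sets are not miscounted.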

Okay, let's add one predicate. Let's specify that the second nested collection is a list. Predicates apply to their parent container, so we'll insert list? into the list of the specification (lower row).

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [{} (list list?) #{}])
;; => ({:datum (33),
;;      :ordinal-path-datum [1],
;;      :path-datum [2],
;;      :path-predicate [1 0],
;;      :predicate list?,
;;      :valid? true})

One predicate in the specification pairs with one collection in the data, so we receive one validation result. That nested collection is indeed a list, so :valid? is true. The list? predicate at path [1 0] in the specification was applied to the collection located at path [2] in the data.

Notice how validate-collections did some tedious and boring calculations to achieve the general effect of: The predicate in the second nested collection of the specification applies to the second nested collection of the data. It kinda skipped over that 22 because it ignores scalars, and we're validating collections. Basically, validate-collections performed that 'skip' by pruning the scalars from the data…

[{} (list) #{}]

…and pruning all non-collections from the parent level above the predicate. In other words, validate-collections pruned from the specification any scalars with a path length exactly one element shorter than the path of the predicate.

[{} (list list?) #{}] ;; no pruning because zero scalars within the parent's level

Then, enumerating the paths for both pruned data and pruned specification.

;; data, all scalars pruned

(all-paths [{} (list) #{}])
;; => [{:path [], :value [{} () #{}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{}}]


;; specification, parent-level pruned of non-collections

(all-paths [{} (list list?) #{}])
;; => [{:path [], :value [{} (list?) #{}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value (list?)}
;;     {:path [1 0], :value list?}
;;     {:path [2], :value #{}}]

There's only one predicate, list?, which is located at path [1 0] in the pruned specification. Right-trimming the predicate's path gives us this.

(drop-last [1 0]) ;; => (1)

That right-trimmed result is equivalent to ordinal path [1] which is the second element of the pruned data. There is indeed a collection at ordinal path [1] of the pruned data, so we have successfully formed a pair. We can apply the predicate to the thing at that path.

(list? (list 33)) ;; => true

The list in the data indeed satisfies the list? predicate. validate-collections does all that with one invocation: pairs up predicates in the specification with nested collections in the data, and applies all predicates to their paired targets.

Let's see, again, how validate-collections handles this validation.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [{} (list list?) #{}])
;; => ({:datum (33),
;;      :ordinal-path-datum [1],
;;      :path-datum [2],
;;      :path-predicate [1 0],
;;      :predicate list?,
;;      :valid? true})

We inserted only a single list? predicate into the specification, so, at most, we could receive only one collection+predicate pair. The data's nested list, (list 33), is the second nested collection within the sequential, so its ordinal path is [1]. The list? predicate is contained in the specification's second nested collection, so its ordinal path is also [1]. Since the list? predicate's container and the thing in the data share an ordinal path, validate-collections formed a collection+predicate pair. The list? predicate was satisfied because (list 33) is indeed a list.

Let's clear the slate and specify that nested set at the end. We start with the full data…

[{:a 11} 22 (list 33) 44 #{55}]

…and prune all scalars from the data to serve as a template for the specification…

[{     }    (list   )    #{  }]

…and insert a set? predicate for the set. Collection predicates apply to their parent containers, so we'll insert it inside the set we want to validate.

[{} (list) #{set?}]

Usually, we wouldn't include non-predicates into the specification, but for demonstration purposes, I'm going to insert a couple of scalars, keywords :skip-1 and :skip-2, that will ultimately get skipped because validation ignores non-predicates in the specification.

[{} :skip-1 (list) :skip-2 #{set?}]

First, we prune the non-collections from the data…

[{} (list) #{}]

…then prune from the specification the non-predicates from the parent-level.

[{} (list) #{set?}]

That rids us of :skip-1 and :skip-2, so now the nested collections in the specification align with the nested collections in the data.

We enumerate the paths of the pruned data…

(all-paths [{} (list) #{}])
;; => [{:path [], :value [{} () #{}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{}}]

…and enumerate the paths of the pruned specification.

(all-paths [{} (list) #{set?}])
;; => [{:path [], :value [{} () #{set?}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{set?}}
;;     {:path [2 set?], :value set?}]

There is only one predicate, specification element {:path [2 set?], :value set?}. When we right-trim that path, which we calculated with respect to the pruned specification, we get…

(drop-last [2 set?]) ;; => (2)

…which is equivalent to ordinal path [2] with respect to the pruned data. The element at that path in the data is indeed a collection, so we successfully paired the predicate with a collection. Validation proceeds by applying the predicate to the element.

(set? #{55}) ;; => true

The element is indeed a set, so the predicate is satisfied.

Here's how we validate that nested set using validate-collections, data upper row, specification lower row.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [{} :skip-1 () :skip-2 #{set?}])
;; => ({:datum #{55},
;;      :ordinal-path-datum [2],
;;      :path-datum [4],
;;      :path-predicate [4 set?],
;;      :predicate set?,
;;      :valid? true})

One predicate applied to one collection, one validation result. And again, collection validation skipped right over the intervening scalars, 22 and 44, in the data, and over the intervening non-predicates, :skip-1 and :skip-2, in the specification. validate-collections applied the set? predicate in the specification's third nested collection to the data's third nested collection #{55}, both at ordinal path [2] (i.e., the third non-scalar elements).

We might as well specify and validate that nested map now. Here's our data again.

[{:a 11} 22 (list 33) 44 #{55}]

We remove all scalars to create a template for the specification.

[{     }    (list   )    #{  }]

Recall that collection predicates targeting a map require a sham key. We'll insert into the specification a map? predicate associated to a sham key, :is-map?, that doesn't appear in the data's corresponding nested map.

[{:is-map? map?}    (list   )    #{  }]

And again, just to demonstrate how the skipping works, I'll insert a couple of non-predicates in front of the nested map.

[:skip-3 :skip-4 {:is-map? map?}    (list   )    #{  }]

Note that the data's nested map is located at path [0], the first element, while, because of those two non-predicates, the specification's corresponding nested map is located at path [2], the third element. In a moment, matching the ordinal paths of each (by 'pruning') will cause them to be paired.

Now, we prune the scalars from the data…

[{} () #{}]

…and prune the non-predicates from the specification's parent level.

[{:is-map? map?} () #{}]

We enumerate the paths of the pruned data…

(all-paths [{} () #{}])
;; => [{:path [], :value [{} () #{}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{}}]

…and enumerate the paths of the pruned specification.

(all-paths [{:is-map? map?} () #{}])
;; => [{:path [], :value [{:is-map? map?} () #{}]}
;;     {:path [0], :value {:is-map? map?}}
;;     {:path [0 :is-map?], :value map?}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{}}]

There is only the one predicate, map?, which is located at path [0 :is-map?] in the pruned specification. We right-trim that path.

(drop-last [0 :is-map?]) ;; => (0)

It turns out that there is, in fact, a collection at that ordinal path of the pruned data, so we've made a collection+predicate pairing. We apply the predicate to that collection element.

(map? {:a 11}) ;; => true

The nested collection at ordinal path [0], the first nested collection, in the pruned data satisfies the predicate map? located at ordinal path [0] in the pruned specification.

validate-collections does all that work for us. Upper row, data; lower row, specification.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [:skip-3 :skip-4 {:is-map? map?} () #{}])
;; => ({:datum {:a 11},
;;      :ordinal-path-datum [0],
;;      :path-datum [0],
;;      :path-predicate [2 :is-map?],
;;      :predicate map?,
;;      :valid? true})

Unlike the previous two validations, validate-collections didn't have to skip over any scalars in the data because the nested map is the first element. It did, however, have to skip over two non-predicates, :skip-3 and :skip-4, in the specification. It applied the predicate in the specification's first nested collection to the data's first nested collection (both at ordinal path [0], i.e., the first non-scalar element), which is indeed a map.
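The way the two sides line up can be sketched with plain Clojure. The helper nested-colls is a hypothetical name used only here; it tags each nested collection with its literal index so we can see which literal positions end up sharing an ordinal position.

```clojure
;; Hypothetical helper (not part of Speculoos): the nested collections
;; of sequential `s`, in order, each tagged with its literal index.
(defn nested-colls
  [s]
  (keep-indexed (fn [i x] (when (coll? x) {:literal i :value x})) s))

(def data [{:a 11} 22 (list 33) 44 #{55}])
(def spec [:skip-3 :skip-4 {:is-map? map?} () #{}])

(map :literal (nested-colls data)) ;; => (0 2 4)
(map :literal (nested-colls spec)) ;; => (2 3 4)

;; zipping the two sequences pairs elements by ordinal position:
;; [literal-index-in-data literal-index-in-spec]
(map (fn [d s] [(:literal d) (:literal s)])
     (nested-colls data)
     (nested-colls spec))
;; => ([0 2] [2 3] [4 4])
```

The first pair, [0 2], matches the :path-datum [0] and the leading element of :path-predicate [2 :is-map?] reported above.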

We've now seen how to specify and validate each of those three nested collections, so for completeness' sake, let's specify the root. Predicates apply to their container, so for clarity, we'll insert it at the beginning.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [vector? {} () #{}])
;; => ({:datum [{:a 11} 22 (33) 44 #{55}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate vector?,
;;      :valid? true})

Technically, we could put that particular predicate anywhere in the top-level vector as long as (drop-last path) evaluates to []. All the following yield substantially the same results.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}] [vector? {} () #{}])
(validate-collections [{:a 11} 22 (list 33) 44 #{55}] [{} vector? () #{}])
(validate-collections [{:a 11} 22 (list 33) 44 #{55}] [{} () vector? #{}])
(validate-collections [{:a 11} 22 (list 33) 44 #{55}] [{} () #{} vector?])

In practice, I find it visually clearer to insert the predicates at the head of a sequential.

Let's do one final, all-up demonstration where we validate all four collections, the root collection containing three nested collections. Once again, here's the data.

[{:a 11} 22 (list 33) 44 #{55}]

We copy-paste the data and delete all scalars to create a template for the specification.

[{     }    (list   )    #{  }]

Now we insert the predicates. The rule is: Predicates apply to the collection that contains the predicate. So we insert a set? predicate into the set…

[{     }    (list   )    #{set?}]

…insert a list? predicate into the list…

[{     }    (list list?)    #{set?}]

…insert a map? predicate into the map, associated to sham key :foo…

[{:foo map?}    (list list?)    #{set?}]

…and insert a vector? predicate, a sequential? predicate, a coll? predicate, and an any? predicate into the vector's top level.

[vector? {:foo map?} sequential? (list list?) coll? #{set?} any?]

There will be two 'phases', each phase pruning a different level. The first phase validates the root collection with the top-level predicates. To start, we enumerate the paths of the data…

(all-paths [{:a 11} 22 (list 33) 44 #{55}])
;; => [{:path [], :value [{:a 11} 22 (33) 44 #{55}]}
;;     {:path [0], :value {:a 11}}
;;     {:path [0 :a], :value 11}
;;     {:path [1], :value 22}
;;     {:path [2], :value (33)}
;;     {:path [2 0], :value 33}
;;     {:path [3], :value 44}
;;     {:path [4], :value #{55}}
;;     {:path [4 55], :value 55}]

…and enumerate the paths of our specification.

(all-paths [vector? {:foo map?} sequential? (list list?) coll? #{set?} any?])
;; => [{:path [], :value [vector? {:foo map?} sequential? (list?) coll? #{set?} any?]}
;;     {:path [0], :value vector?}
;;     {:path [1], :value {:foo map?}}
;;     {:path [1 :foo], :value map?}
;;     {:path [2], :value sequential?}
;;     {:path [3], :value (list?)}
;;     {:path [3 0], :value list?}
;;     {:path [4], :value coll?}
;;     {:path [5], :value #{set?}}
;;     {:path [5 set?], :value set?}
;;     {:path [6], :value any?}]

Then, we keep only elements that a) are predicates and b) have a single-element path.

[{:path [0], :value vector?}
 {:path [2], :value sequential?}
 {:path [4], :value coll?}
 {:path [6], :value any?}]

In this first phase, we're focusing on predicates located at single-element paths, because (drop-last [i]) will, for every i, resolve to [], which targets the root collection. We see from that last step, predicates vector?, sequential?, coll?, and any? all have single-element paths, so they will target the root collection. The conceptual linkage between a predicate's right-trimmed path and its target has the practical result that predicates apply to their parent containers. So we right-trim those paths.

(drop-last [0]) ;; => ()
(drop-last [2]) ;; => ()
(drop-last [4]) ;; => ()
(drop-last [6]) ;; => ()

They all evaluate to (), which is equivalent to [], the path to the root collection. So we may now apply all four predicates to the root collection.

(vector? [{:a 11} 22 (list 33) 44 #{55}]) ;; => true
(sequential? [{:a 11} 22 (list 33) 44 #{55}]) ;; => true
(coll? [{:a 11} 22 (list 33) 44 #{55}]) ;; => true
(any? [{:a 11} 22 (list 33) 44 #{55}]) ;; => true
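A quick check at the REPL, using only clojure.core, confirms both claims of this first phase: every single-element path right-trims to something equal to [], and the root collection satisfies all four predicates.

```clojure
(def data [{:a 11} 22 (list 33) 44 #{55}])

;; drop-last returns a sequence, and an empty sequence equals an empty vector
(= (drop-last [0]) [])            ;; => true
(map drop-last [[0] [2] [4] [6]]) ;; => (() () () ())

;; juxt applies each predicate to the same argument, the root collection
((juxt vector? sequential? coll? any?) data)
;; => [true true true true]
```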

Now that we've applied all predicates in the top level to the root collection, the first phase is complete. The second phase involves validating the nested collections. We start the second phase with the original data…

[{:a 11} 22 (list 33) 44 #{55}]

…and the original specification.

[vector? {:foo map?} sequential? (list list?) coll? #{set?} any?]

We remove the scalars from the data…

[{     }    (       )    #{  }]

…and from the specification, we keep only the second-level predicates, i.e., the predicates contained in the nested collections.

[{:foo map?} (list list?) #{set?}]

Next, we enumerate the paths of the pruned data…

(all-paths [{} () #{}])
;; => [{:path [], :value [{} () #{}]}
;;     {:path [0], :value {}}
;;     {:path [1], :value ()}
;;     {:path [2], :value #{}}]

…and enumerate the paths of the pruned specification.

(all-paths [{:foo map?} (list list?) #{set?}])
;; => [{:path [], :value [{:foo map?} (list?) #{set?}]}
;;     {:path [0], :value {:foo map?}}
;;     {:path [0 :foo], :value map?}
;;     {:path [1], :value (list?)}
;;     {:path [1 0], :value list?}
;;     {:path [2], :value #{set?}}
;;     {:path [2 set?], :value set?}]

Next, we retain the path elements of the data's second-level collections only…

[{:path [0], :value {}}
 {:path [1], :value ()}
 {:path [2], :value #{}}]

…and retain only the predicates of the pruned specification, which in this phase are only in the nested collections.

[{:path [0 :foo], :value map?}
 {:path [1 0], :value list?}
 {:path [2 set?], :value set?}]

Now we run the trim-right operation on the predicate paths.

(drop-last [0 :foo]) ;; => (0)
(drop-last [1 0]) ;; => (1)
(drop-last [2 set?]) ;; => (2)

Then we try to form predicate+collection pairs. From top to bottom:

  • Predicate map? at [0 :foo] pairs with the data element at path
    (drop-last [0 :foo]) ;; => (0)
    which resolves to the nested map {:a 11}.
  • Predicate list? at [1 0] pairs with the data element at path
    (drop-last [1 0]) ;; => (1)
    which resolves to the nested list (list 33).
  • Predicate set? at [2 set?] pairs with the data element at path
    (drop-last [2 set?]) ;; => (2)
    which resolves to the nested set #{55}.

We can finally apply each of those three predicates towards their respective target collections.

(map? {:a 11}) ;; => true
(list? (list 33)) ;; => true
(set? #{55}) ;; => true

Combining the two phases, we have seven total predicate+collection pairs: four in the top level, and one in each of the three nested collections. All predicates were satisfied.
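The two manual phases can be condensed into a short sketch. To be clear, this is not the Speculoos implementation. It is a simplified stand-in that assumes a sequential root with nesting only one level deep, and every function name in it is hypothetical.

```clojure
;; Phase one (hypothetical helper): predicates sitting directly in the
;; top level of the specification target the root collection.
(defn root-predicates
  [spec]
  (filter fn? spec))

;; Predicates directly contained in one nested specification collection.
;; For a map, the predicates are the values at the sham keys.
(defn preds-in
  [c]
  (filter fn? (if (map? c) (vals c) c)))

;; Phase two (hypothetical helper): pair the nth nested collection of
;; the data with the nth nested collection of the specification.
(defn nested-pairs
  [data spec]
  (map vector (filter coll? data) (filter coll? spec)))

;; Simplified stand-in for collection validation, one level deep only.
(defn sketch-validate
  [data spec]
  (concat
   (for [p (root-predicates spec)]
     {:datum data :valid? (boolean (p data))})
   (for [[d s] (nested-pairs data spec)
         p (preds-in s)]
     {:datum d :valid? (boolean (p d))})))

(def results
  (sketch-validate [{:a 11} 22 (list 33) 44 #{55}]
                   [vector? {:foo map?} sequential? (list list?) coll? #{set?} any?]))

(count results)           ;; => 7
(every? :valid? results)  ;; => true
```

The sketch reports the four root results first and the three nested results afterwards, so its ordering differs from what validate-collections returns, and it omits the path information entirely.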

Now that we've manually done that collection validation, let's see how validate-collections compares.

(validate-collections [{:a 11} 22 (list 33) 44 #{55}]
                      [vector? {:foo map?} sequential? (list list?) coll? #{set?} any?])
;; => ({:datum [{:a 11} 22 (33) 44 #{55}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum {:a 11},
;;      :ordinal-path-datum [0],
;;      :path-datum [0],
;;      :path-predicate [1 :foo],
;;      :predicate map?,
;;      :valid? true}
;;     {:datum [{:a 11} 22 (33) 44 #{55}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [2],
;;      :predicate sequential?,
;;      :valid? true}
;;     {:datum (33),
;;      :ordinal-path-datum [1],
;;      :path-datum [2],
;;      :path-predicate [3 0],
;;      :predicate list?,
;;      :valid? true}
;;     {:datum [{:a 11} 22 (33) 44 #{55}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [4],
;;      :predicate coll?,
;;      :valid? true}
;;     {:datum #{55},
;;      :ordinal-path-datum [2],
;;      :path-datum [4],
;;      :path-predicate [5 set?],
;;      :predicate set?,
;;      :valid? true}
;;     {:datum [{:a 11} 22 (33) 44 #{55}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [6],
;;      :predicate any?,
;;      :valid? true})

We inserted four predicates — vector?, sequential?, coll?, and any? — directly into the specification's top level, interleaved among the nested map, list, and set. Because they're in the top level, those predicates apply to the collection that contains them, the root collection. The outer, parent vector satisfies all four predicates because it is indeed a vector, is sequential, is a collection, and it trivially satisfies any?.

In addition, validate-collections validated the data's three nested collections, each with the particular predicate they contained. Map {:a 11} is the first nested collection, so its map? predicate is found at ordinal path [0]. List (list 33) is the second nested collection, so its list? predicate is found at ordinal path [1], skipping over the intervening scalar 22. Set #{55} is the third nested collection, paired with the set? predicate at ordinal path [2], skipping over the intervening scalars 22 and 44. All three nested collections satisfied their respective predicates.

Collections nested within a map do not involve that kind of skipping because they're not sequential. To demonstrate that, let's make this our example data.

{:a [99] :b (list 77)}

Now, we copy-paste the data, then delete the scalars.

{:a [  ] :b (list   )}

That becomes the template for our collection specification. Let's pretend we want to specify something about those two nested collections at keys :a and :b. We stuff the predicates directly inside those collections. During a collection validation, predicates apply to the collection that contains them.

{:a [vector?] :b (list list?)}

This becomes our collection specification. For now, we've only specified one property for each of the two nested collections. We haven't stated any requirement of the root collection, the outer map.

Let's validate with validate-collections.

(validate-collections {:a [99], :b (list 77)}
                      {:a [vector?], :b (list list?)})
;; => ({:datum [99],
;;      :ordinal-path-datum [:a],
;;      :path-datum [:a],
;;      :path-predicate [:a 0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum (77),
;;      :ordinal-path-datum [:b],
;;      :path-datum [:b],
;;      :path-predicate [:b 0],
;;      :predicate list?,
;;      :valid? true})

Checklist time.

  • Specification shape mimics data? Check.
  • Validating collections, ignoring scalars? Check.
  • Two paired predicates, two validations? Check.

There's a subtlety to pay attention to: the vector? and list? predicates are contained within a vector and list, respectively. Those two predicates apply to their immediate parent container. validate-collections needs those :a and :b keys to find that vector and that list. We only use a sham key when validating a map immediately above our heads. Let's demonstrate how a sham key works in this instance.

Let's re-use that specification and tack on a sham :howdy key with a map? predicate aimed at the root map.

{:a [vector?] :b (list list?) :howdy map?}

Now we validate with the new specification with three predicates: one predicate each for the root collection and the two nested collections.

(validate-collections {:a [99], :b (list 77)}
                      {:a [vector?], :b (list list?), :howdy map?})
;; => ({:datum [99],
;;      :ordinal-path-datum [:a],
;;      :path-datum [:a],
;;      :path-predicate [:a 0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum (77),
;;      :ordinal-path-datum [:b],
;;      :path-datum [:b],
;;      :path-predicate [:b 0],
;;      :predicate list?,
;;      :valid? true}
;;     {:datum {:a [99], :b (77)},
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [:howdy],
;;      :predicate map?,
;;      :valid? true})

We've got the vector and list validations as before, and then, at the end, we see that map? at the sham :howdy key was applied to the root. Because the parent collection is not sequential (i.e., a map), validate-collections did not have to skip over any intervening non-collections. There is no concept of order; elements are addressed by non-sequential keys. For example, predicate vector? is located at path [:a 0] within the specification. Right-trimming that path…

(drop-last [:a 0]) ;; => (:a)

…resolves directly to the path of the collection nested at path [:a] in the data. It made no difference that predicate map? was floating around there at path [:howdy] in the parent level. Likewise, predicate list?, located at path [:b 0] in the specification, was applied to the list nested at path [:b] in the data because its right-trimmed path…

(drop-last [:b 0]) ;; => (:b)

…doesn't involve :howdy at any point.
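Because map elements are addressed by key, the right-trimmed predicate path can be handed straight to clojure.core's get-in, with no pruning step. The following is a plain-Clojure illustration of that lookup, not a peek at Speculoos internals.

```clojure
(def data {:a [99] :b (list 77)})

;; right-trimmed paths of the two nested predicates
(get-in data (drop-last [:a 0])) ;; => [99]
(get-in data (drop-last [:b 0])) ;; => (77)

;; the sham key's path right-trims to the empty path, which is the root
(= data (get-in data (drop-last [:howdy]))) ;; => true
```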

One more example to illustrate how collection validation ignores un-paired elements. Again, here's our data.

{:a [99] :b (list 77)}

And again, we'll copy-paste the data, then delete the scalars. That'll be our template for our collection specification.

{:a [  ] :b (list   )}

Now, we'll go even further and delete the :b key and its associated value, the nested list.

{:a [  ]             }

;; without :b, impossible to validate the list associated to :b

Insert old reliable vector?. That predicate is paired with its immediate parent vector, so we need to keep the :a key.

{:a [vector?]        }

Finally, we'll add in a wholly different key that doesn't appear in the data, :flamingo, with a coll? predicate nested in a vector associated to that new key.

{:a [vector?] :flamingo [coll?]}

Test yourself: How many validations will occur?

(validate-collections {:a [99], :b (list 77)}
                      {:a [vector?], :flamingo [coll?]})
;; => ({:datum [99],
;;      :ordinal-path-datum [:a],
;;      :path-datum [:a],
;;      :path-predicate [:a 0],
;;      :predicate vector?,
;;      :valid? true})

Answer: one.

In this example, there is only one predicate+collection pair. vector? applies to the vector at :a. We might have expected coll? to be applied to the root collection because :flamingo doesn't appear in the map, but notice that coll? is contained in a vector. It would only ever apply to the thing that contained it. Since the data's root doesn't contain a collection at key :flamingo, the predicate is unpaired, and thus ignored.
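The pairing rule for maps can be sketched in a few lines of plain Clojure. Both helper names below are hypothetical and serve only to illustrate which specification keys can take part in a validation.

```clojure
;; Hypothetical helper: keys whose value is a nested collection in BOTH
;; the data map and the specification map. Only these nested
;; specification collections can be paired with a nested data collection.
(defn paired-keys
  [data spec]
  (filter #(and (coll? (get data %)) (coll? (get spec %))) (keys spec)))

;; Hypothetical helper: keys associated directly to a predicate.
;; Such predicates target the map that contains them.
(defn root-predicate-keys
  [spec]
  (filter #(fn? (get spec %)) (keys spec)))

(def data {:a [99] :b (list 77)})

(paired-keys data {:a [vector?] :flamingo [coll?]})         ;; => (:a)
(root-predicate-keys {:a [vector?] :flamingo [coll?]})      ;; => ()
(root-predicate-keys {:a [vector?] :some-sham-key coll?})   ;; => (:some-sham-key)
```

The :flamingo entry shows up in neither result: its predicate sits inside a vector, and the data has no collection at that key to pair it with.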

If we did want to apply coll? to the root, it needs to be contained directly in the root. We'll associate coll? to key :emu.

(validate-collections {:a [99], :b (list 77)}
                      {:a [vector?], :emu coll?})
;; => ({:datum [99],
;;      :ordinal-path-datum [:a],
;;      :path-datum [:a],
;;      :path-predicate [:a 0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum {:a [99], :b (77)},
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [:emu],
;;      :predicate coll?,
;;      :valid? true})

Now, coll?'s immediate container is the root. Since it is now properly paired with a collection, it participates in validation.

We've churned through a ton of examples to reinforce the underlying mechanics of collection validation. But don't get overwhelmed by the drudgery. The vast majority of the time, we will be well-served to remember just these ideas while validating collections.

  1. Shape the specification to mimic the data (Motto #2).
  2. Predicates apply to the collections that contain them.
  3. To validate a map, associate the predicate to a key that doesn't appear in the data.
  4. When collections are nested in a sequential collection, the predicates are applied to their immediate parent, in the order as if there were no intervening scalars in the ancestor.
  5. Collections nested in maps are not affected by order.

All the detailed mechanics we've discussed in this section have been to support those five ideas.

Two additional notes.

  • When we worked through the collection validation algorithm by hand, we discussed it in terms of 'steps' and 'phases', etc., that have a strong imperative flavor. However, the implementation is purely functional. The 'steps' and 'phases' are merely one way to understand the consequences of the way Speculoos handles pairing predicates and their targets.
  • Our examples showed validating collections nested at most one level deep, e.g., a map nested in a vector. However, the algorithm is fully general. We can validate any element of any arbitrary depth, of any mixture of Clojure collection types. Just to show off:
    (validate-collections [99 88 77 {:x (list 66 55 {:y [44 33 22 11 #{42}]})}]
                          [{:x (list {:y [#{set?}]})}])
    ;; => ({:datum #{42},
    ;;      :ordinal-path-datum [0 :x 0 :y 0],
    ;;      :path-datum [3 :x 2 :y 4],
    ;;      :path-predicate [0 :x 0 :y 0 set?],
    ;;      :predicate set?,
    ;;      :valid? true})

    From the outset, I intended Speculoos to be capable of validating any heterogeneous, arbitrarily nested data structure.

Why the collection validation algorithm is different from the scalar validation algorithm

The algorithm implemented by validate-collections is slightly different from that of validate-scalars. It has to do with the fact that a scalar in the data can occupy the exact same path as a predicate in the specification. A function, after all, is also a scalar. To be fully general (i.e., handle any pattern and depth of nesting), a collection in the data cannot share a path with a predicate in the specification.

To begin, we'll intentionally take a wrong turn to show why the collection validation algorithm is a little bit different from the scalar validation algorithm. As before, we want to specify that our data vector is exactly n elements long. Recall these predicates.

(defn len-3? [c] (= 3 (count c)))
(defn len-2? [c] (= 2 (count c)))
(defn len-1? [c] (= 1 (count c)))

We're interested in validating the root collection, at path [] in the data, so at first, we'll naively try to put our len-3? predicate at path [] in the specification.

We could then invoke some imaginary collection validation function that treats bare, free-floating predicates as being located at path [].

;; this fn doesn't actually exist

(imaginary-validate-collection [42 "abc" 22/7]   len-3?) ;; => true

Okay, that scenario maybe kinda sorta could work. By policy, imaginary-validate-collection could consider a bare predicate as being located at path [] in the specification, and therefore would apply to the root collection at path [] in the data.

But consider this scenario: A two-element vector nested within a two-element vector. One example of that data looks like this.

[42 ["abc" 22/7]]

Let's take a look at the paths.

(all-paths [42 ["abc" 22/7]])
;; => [{:path [], :value [42 ["abc" 22/7]]}
;;     {:path [0], :value 42}
;;     {:path [1], :value ["abc" 22/7]}
;;     {:path [1 0], :value "abc"}
;;     {:path [1 1], :value 22/7}]

We're validating collections, so we're only interested in the root collection at path [] and the nested vector at path [1].

[{:path [], :value [42 ["abc" 22/7]]}
 {:path [1], :value ["abc" 22/7]}]

And now we run into a problem: How do we compose a specification with two predicates, one at [] and one at [1]? The predicate aimed at the root collection has already absorbed, by policy, the root path, so there's nowhere to 'put' the second predicate.

;; this fn doesn't actually exist

(imaginary-validate-collection [42 ["abc" 22/7]]   len-3?   len-2?) ;; => true

Because the len-3? predicate absorbs the [] path to root, and because predicates are not themselves collections and cannot 'contain' something else, the second predicate, len-2?, needs to also be free-floating at the tail of the argument list. Our imaginary-validate-collection would have to somehow figure out that predicate len-3? ought to be paired with the root collection, [42 ["abc" 22/7]], and predicate len-2? ought to be paired with the nested vector ["abc" 22/7].

It gets even worse if we have another level of nesting. How about three vectors, each nested within another?

[42 ["abc" [22/7]]]

The paths for that.

(all-paths [42 ["abc" [22/7]]])
;; => [{:path [], :value [42 ["abc" [22/7]]]}
;;     {:path [0], :value 42}
;;     {:path [1], :value ["abc" [22/7]]}
;;     {:path [1 0], :value "abc"}
;;     {:path [1 1], :value [22/7]}
;;     {:path [1 1 0], :value 22/7}]

Regarding only the data's collections, we see three elements to validate, at paths [], [1], and [1 1].

[{:path [], :value [42 ["abc" [22/7]]]}
 {:path [1], :value ["abc" [22/7]]}
 {:path [1 1], :value [22/7]}]

Invoking the imaginary collection validator would have to look something like this.

;; this fn doesn't actually exist

(imaginary-validate-collection [42 ["abc" [22/7]]]   len-3?   len-2?   len-1?) ;; => true

Three free-floating predicates, with no indication of where they ought to be applied. The imaginary validator would truly need to read our minds to know which predicate pairs with which nested collection, if any.

Someone might propose that we include some paths immediately following each predicate to inform the imaginary validator where to apply those predicates.

;; this fn doesn't actually exist

(imaginary-validate-collection-2 [42 ["abc" [22/7]]]   len-3? [0]   len-2? [1 0]   len-1? [1 1 0]) ;; => true

That certainly works, but at that point, we've manually serialized a nested data structure. I wouldn't want to have to write out the explicit paths of more than a few predicates. Furthermore, writing separate, explicit paths could be error-prone, and not terribly re-usable, nor compact. One of Speculoos' goals is to make composing specifications intuitive. I find writing specifications with data structure literals expressive and straightforward to manipulate.

Here's that same specification, written as a literal data structure.

[len-3? [len-2? [len-1?]]]

Visually, that specification looks a lot like the data. If we know the rule about predicates applying to their immediate parent containers during collection validation, that specification carries meaning. And, we can slice and dice it any way we'd like with assoc-in, or any other standard tool.
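To make the 'slice and dice' remark concrete, here is a small sketch that uses only clojure.core functions (no Speculoos calls) to inspect and modify the specification literal. The name relaxed-spec is introduced purely for illustration.

```clojure
;; Collection predicates from earlier in this section.
(defn len-3? [c] (= 3 (count c)))
(defn len-2? [c] (= 2 (count c)))
(defn len-1? [c] (= 1 (count c)))

;; The specification is an ordinary nested vector.
(def spec [len-3? [len-2? [len-1?]]])

;; get-in retrieves the predicate located at a path.
(get-in spec [1 1 0]) ;; the len-1? function object

;; assoc-in returns a new specification with a different predicate at
;; that path; the original spec is untouched.
;; `relaxed-spec` is a hypothetical name used only in this sketch.
(def relaxed-spec (assoc-in spec [1 1 0] vector?))
```

Because the specification is plain data, nothing beyond the standard collection functions is needed to build, inspect, or alter it.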

Here is the collection validation, Speculoos-style, with data in the upper row, specification literal in the lower row.

(validate-collections [42 ["abc" [22/7]]]
                      [len-3? [len-2? [len-1?]]])
;; => ({:datum [42 ["abc" [22/7]]],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate len-3?,
;;      :valid? false}
;;     {:datum ["abc" [22/7]],
;;      :ordinal-path-datum [0],
;;      :path-datum [1],
;;      :path-predicate [1 0],
;;      :predicate len-2?,
;;      :valid? true}
;;     {:datum [22/7],
;;      :ordinal-path-datum [0 0],
;;      :path-datum [1 1],
;;      :path-predicate [1 1 0],
;;      :predicate len-1?,
;;      :valid? true})

Speculoos' Motto #2 is Shape the specification to mimic the data. The arrangement of our collection predicates inside a structure literal will instruct validate-collections where to apply the predicates. The advantage Speculoos offers is the fact that literals are easy for humans to inspect, understand, and manipulate.
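The 'predicates apply to their immediate parent' rule can be sketched with plain clojure.core. This is a deliberately simplified illustration, not Speculoos' implementation: it assumes the nested collections sit at the same positions in the specification as in the data (true for this example), so dropping the last element of a predicate's path yields the path of the collection it targets. The names predicate-paths and pair-with-parent are hypothetical.

```clojure
(defn len-3? [c] (= 3 (count c)))
(defn len-2? [c] (= 2 (count c)))
(defn len-1? [c] (= 1 (count c)))

(def data [42 ["abc" [22/7]]])

;; Predicate paths copied from the :path-predicate entries of the report above.
(def predicate-paths [[[0] len-3?]
                      [[1 0] len-2?]
                      [[1 1 0] len-1?]])

;; Hypothetical helper: apply a predicate to the collection that contains
;; its path. Only valid under the simplifying assumption stated above.
(defn pair-with-parent
  [data [pred-path pred]]
  (let [parent-path (vec (butlast pred-path))
        datum (get-in data parent-path)]
    {:datum datum
     :path-datum parent-path
     :path-predicate pred-path
     :valid? (pred datum)}))

(def report (mapv #(pair-with-parent data %) predicate-paths))

(mapv (juxt :path-datum :valid?) report)
;; => [[[] false] [[1] true] [[1 1] true]]
```

The three :valid? values agree with the validate-collections report: the root vector holds two elements, so len-3? is not satisfied, while the two nested vectors satisfy len-2? and len-1?.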

Validation Summaries, Combo Validations, and Thorough Validations

Up until now, we've been using validate-scalars and validate-collections, because they're verbose. For teaching and learning purposes (and for diagnosing problems), it's useful to see all the information considered by the validators. However, in many situations, once we've got our specification shape nailed down, we'll want a cleaner yes or no answer on whether the data satisfied the specification. We could certainly pull out the non-truthy, invalid results ourselves…

(filter #(not (:valid? %))
  (validate-scalars [42 "abc" 22/7]
                    [int? symbol? ratio?]))
;; => ({:datum "abc",
;;      :path [1],
;;      :predicate symbol?,
;;      :valid? false})

…and then check for invalids ourselves…

(empty? *1) ;; => false

…but Speculoos provides a function that does exactly that, both for scalars…

(require '[speculoos.core :refer [valid-scalars? valid-collections?]])

(valid-scalars? [42 "abc" 22/7]   [int? symbol? ratio?]) ;; => false

…and for collections.

(valid-collections? [42 ["abc"]]
                    [vector? [vector?]])
;; => true

Whereas the validate-… functions return a detailed validation report of every predicate+datum pair they see, the valid-…? variants provide a plain true/false.

Beware: Validation only considers paired predicates+datums (Motto #3). If our datum doesn't have a paired predicate, then it won't be validated. Observe.

(valid-scalars? {:a 42}
                {:b string?}) ;; => true

(validate-scalars {:a 42}   {:b string?}) ;; => []

42 does not share a path with string?, the lone predicate in the specification. Since there are zero invalid results, valid-scalars? returns true.
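Here is a minimal sketch of why an un-paired datum yields true. It handles only flat maps and uses only clojure.core; paired-map-report and zero-invalids? are hypothetical helpers, not Speculoos functions.

```clojure
;; Hypothetical helper: build a report entry only for keys that appear in
;; BOTH the data and the specification. Flat maps only.
(defn paired-map-report
  [data spec]
  (for [[k pred] spec
        :when (contains? data k)]
    {:datum (get data k)
     :path [k]
     :predicate pred
     :valid? (boolean (pred (get data k)))}))

;; Hypothetical helper: a report is acceptable when it holds no invalid entries.
(defn zero-invalids?
  [report]
  (empty? (remove :valid? report)))

(paired-map-report {:a 42} {:b string?})                  ;; => ()
(zero-invalids? (paired-map-report {:a 42} {:b string?})) ;; => true
(zero-invalids? (paired-map-report {:a 42} {:a string?})) ;; => false
```

An empty report contains no invalid entries, so the summary is true even though nothing was actually checked.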

» Within the Speculoos library, valid? means zero invalids. «

If you feel uneasy about this definition of 'valid' — that, somehow, you wouldn't be able to accomplish some particular validation task — rest easy. Speculoos provides us with facilities for ensuring that every datum is validated.

Thorough validation

Motto #3 reminds us that data elements not paired with a predicate are ignored. For some tasks, we may want to ensure that all elements in the data are subjected to at least one predicate. Plain valid? only reports if all datum+predicate pairs are true.

(valid-scalars? [42 "abc" 22/7]
                [int?]) ;; => true

In this example, only 42 and int? form a pair that is validated. "abc" and 22/7 are not paired with predicates, and therefore ignored. valid-scalars? returns true regardless of the ignored scalars.

Speculoos' thorough function variants require that all data elements be specified, otherwise, they return false. Thoroughly validating that same data with that same specification shows the difference.

(require '[speculoos.utility :refer [thoroughly-valid-scalars?]])

(thoroughly-valid-scalars? [42 "abc" 22/7]   [int?]) ;; => false

Whereas valid-scalars? ignored the un-paired "abc" and 22/7, thoroughly-valid-scalars? notices that neither has a predicate. Even though 42 satisfied int?, the un-paired scalars mean that this validation is not thorough, and thus thoroughly-valid-scalars? returns false.
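The difference between the plain and thorough variants can be sketched for the simplest case, a flat vector of scalars. This is an illustration under that assumption only, written with clojure.core; plain-flat-valid? and thorough-flat-valid? are hypothetical names, not part of Speculoos.

```clojure
;; Hypothetical: validate only the scalars that have a predicate at the
;; same index. `map` stops at the shorter sequence, so un-paired scalars
;; are silently skipped.
(defn plain-flat-valid?
  [data spec]
  (every? true? (map (fn [pred datum] (boolean (pred datum))) spec data)))

;; Hypothetical: additionally require that every scalar has a predicate.
(defn thorough-flat-valid?
  [data spec]
  (and (<= (count data) (count spec))
       (plain-flat-valid? data spec)))

(plain-flat-valid? [42 "abc" 22/7] [int?])                   ;; => true
(thorough-flat-valid? [42 "abc" 22/7] [int?])                ;; => false
(thorough-flat-valid? [42 "abc" 22/7] [int? string? ratio?]) ;; => true
```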

The utility namespace provides a thorough variant for collections, as well as a variant for combo validations. thoroughly-valid-collections? works analogously to what we've just seen.

Let's do a quick preview of a combo validation. A combo validation is a convenient way to validate the scalars, and then separately validate the collections, of some data with a single function invocation. First, the 'plain', non-thorough version. The data occupies the top row (i.e., first argument), the scalar specification occupies the middle row, and the collection specification occupies the lower row.

(valid? [42 "abc" 22/7]
        [int?]
        [vector?]) ;; => true

We validated the single vector, and only one out of the three scalars. valid? only considers paired elements+predicates, so it only validated 42, a scalar, and the root vector, a collection. valid? ignored scalars "abc" and 22/7.

The thorough variant, thoroughly-valid?, however, does not ignore un-paired data elements. The function signature is identical: data on the top row, scalar specification on the middle row, and the collection specification on the lower row.

(require '[speculoos.utility :refer [thoroughly-valid?]])

(thoroughly-valid? [42 "abc" 22/7]   [int?]   [vector?]) ;; => false

Even though both predicates, int? and vector?, were satisfied, thoroughly-valid? requires that all data elements be validated. Since "abc" and 22/7 are un-paired, the entire validation returns false.

Note: Thoroughly validating does not ensure any measure of correctness nor rigor. 'Thorough' merely indicates that each element was exposed to some kind of predicate. That predicate could actually be trivially permissive. In the next example, any? returns true for all values.

(thoroughly-valid? [42 "abc" 22/7]
                   [any? any? any?]
                   [any?])
;; => true

The only thing thoroughly-valid? tells us in this example is that the one vector and all three scalars were paired with a predicate, and that all four data elements satisfied a guaranteed-to-be-satisfied predicate.

Validation is only as good as the predicate. It's our responsibility to write a proper predicate.

Combo validation

Validating scalars separately from validating collections is a core principle (Motto #1) embodied by the Speculoos library. Separating the two into distinct processes carries solid advantages because the specifications are more straightforward, the mental model is clearer, the implementation code is simpler, and it makes validation à la carte. Much of the time, we can probably get away with just a scalar specification.

All that said, it is not possible to specify and validate every aspect of our data with only scalar validation or only collection validation. When we really need to be strict and validate both scalars and collections, we could manually combine the two validations like this.

(and (valid-scalars? [42] [int?])
     (valid-collections? [42] [vector?]))
;; => true

Speculoos provides a pre-made utility that does exactly that. We supply some data, then a scalar specification, then a collection specification.

(require '[speculoos.core :refer [valid? validate]])

(valid? [42]   [int?]   [vector?]) ;; => true

Let me clarify what valid? is doing here, because it is not violating the first Motto about separately validating scalars and collections. First, valid? performs a scalar validation on the data, and puts that result on the shelf. Then, in a completely distinct operation, it performs a collection validation. valid? then pulls the scalar validation results off the shelf, combines them with the collection validation results, and returns a single true/false. (Look back at the first example of this sub-section to see the separation.)

As an affirmation of how much I believe this, I reserved the shortest, most mnemonic function name, valid?, to encourage Speculoos users to validate both scalars and collections, but separately.

Speculoos also provides a variant that returns detailed validation results after performing distinct scalar validation and collection validation.

(validate [42 "abc" 22/7]
          [int? symbol? ratio?]
          [vector?])
;; => ({:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum "abc",
;;      :path [1],
;;      :predicate symbol?,
;;      :valid? false}
;;     {:datum 22/7,
;;      :path [2],
;;      :predicate ratio?,
;;      :valid? true}
;;     {:datum [42 "abc" 22/7],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate vector?,
;;      :valid? true})

validate gives us exactly the results we would get if we had run validate-scalars and then, immediately thereafter, validate-collections. validate merely provides us the convenience of quickly running both in succession without having to re-type the data. With one invocation, we can validate all aspects of our data, both scalars and collections, and we never violated Motto #1.

Function Naming Conventions

Here are the general patterns regarding the function names.

  • validate-… functions return a detailed report for every datum+predicate pair.
  • valid-…? functions return true if the predicate+datum pairs produce zero falsey results, false otherwise.
  • …-scalars functions consider only non-collection datums.
  • …-collections functions consider only non-scalar datums.
  • thoroughly-… functions return true only if every element (scalar or collection, as the case may be) is paired with a predicate, and every element satisfies its predicate.
'Plain' functions (i.e., validate, valid?, and thoroughly-valid?) perform a scalar validation, followed by a distinct collection validation, and return a single comprehensive response that merges the results of both.

Here's how those terms are put together, and what they do.

| function                      | checks…                              | returns…                   | note                                                 |
|-------------------------------|--------------------------------------|----------------------------|------------------------------------------------------|
| validate-scalars              | scalars only                         | detailed validation report |                                                      |
| valid-scalars?                | scalars only                         | true/false                 |                                                      |
| thoroughly-valid-scalars?     | scalars only                         | true/false                 | only true if all scalars paired with a predicate     |
| validate-collections          | collections only                     | detailed validation report |                                                      |
| valid-collections?            | collections only                     | true/false                 |                                                      |
| thoroughly-valid-collections? | collections only                     | true/false                 | only true if all collections paired with a predicate |
| validate                      | scalars, then collections, separately | detailed validation report |                                                      |
| valid?                        | scalars, then collections, separately | true/false                 |                                                      |
| thoroughly-valid?             | scalars, then collections, separately | true/false                 | only true if all datums paired with a predicate      |

Specifying and Validating Functions

Being able to validate Clojure data enables us to check the usage and behavior of functions.

  1. Validating arguments. Speculoos can validate any property of the arguments passed to a function when it is invoked. We can ask questions like Is the argument passed to the function a number?, a scalar validation, and Are there an even number of arguments?, a collection validation.
  2. Validating return values. Speculoos can validate any property of the value returned by a function. We can ask questions like Does the function return a four-character string?, a scalar validation, and Does the function return a map containing keys :x and :y?, a collection validation.
  3. Validating function correctness. Speculoos can validate the correctness of a function in two ways.
    • Speculoos can validate the relationships between the arguments and the function's return value. We can ask questions like Is each of the three integers in the return value larger than the three integers in the arguments?, a scalar validation, and Is the return sequence the same length as the argument sequence, and are all the elements in reverse order?, a collection validation.
    • Speculoos can exercise a function. This allows us to check If we give this function one thousand randomly-generated valid inputs, does the function always produce a valid return value? Exercising functions with randomly-generated samples is described in the next section.

None of those six checks are strictly required. Speculoos will happily validate using only the specifications we provide.

1. Validating Function Arguments

When we invoke a function with a series of arguments, that series of values forms a sequence, which Speculoos can validate like any other heterogeneous, arbitrarily-nested data structure. Speculoos offers a trio of function-validating functions with differing levels of explicitness. We'll be primarily using validate-fn-with because it is the most explicit of the trio, and we can most easily observe what's going on.

Let's pretend we want to validate the arguments to a function sum-three that expects three integers and returns their sum.

(require '[speculoos.function-specs :refer [validate-fn-with]])

(defn sum-three [x y z] (+ x y z))

(sum-three 1 20 300) ;; => 321

The argument list is a sequence of values, in this example, a sequential thing of three integers. We can imagine a scalar specification for just such a sequence.

[int? int? int?]

When using validate-fn-with, we supply the function name, a map containing zero or more specifications, and some trailing &-args as if they had been supplied directly to the function. Here's the function signature, formatted the way we'll be seeing in the upcoming discussion. The function name will appear in the top row (i.e., first argument), a specification organizing map in the second row, followed by zero or more arguments to be supplied to the function being validated.

(validate-fn-with function-name
                  specification-organizing-map
                  argument-1
                  argument-2
                  ⋮
                  argument-n)

Speculoos can validate five aspects of a function using up to five specifications, each specification associated in that map to a particular key. We'll cover each of those five aspects in turn. To start, we want to specify the argument scalars.

Instead of individually passing each of those five specifications to validate-fn-with and putting nil placeholders where we don't wish to supply a specification, we organize the specifications. To do so, we associate the arguments' scalar specification to the qualified key :speculoos/arg-scalar-spec.

{:speculoos/arg-scalar-spec [int? int? int?]}

Then, we validate the arguments to sum-three like this.

(validate-fn-with sum-three
                  {:speculoos/arg-scalar-spec [int? int? int?]}
                  1
                  20
                  300)
;; => 321

The arguments conformed to the scalar specification, so validate-fn-with returns the value produced by sum-three. Let's intentionally invoke sum-three with one invalid argument by swapping integer 1 with a floating-point 1.0.

(validate-fn-with sum-three
                  {:speculoos/arg-scalar-spec [int? int? int?]}
                  1.0
                  20
                  300)
;; => ({:datum 1.0,
;;      :fn-spec-type :speculoos/argument,
;;      :path [0],
;;      :predicate int?,
;;      :valid? false})

Hey, that kinda looks familiar. It looks a lot like something validate-scalars would emit if we filtered to keep only the invalids. We see that 1.0 at path [0] failed to satisfy its int? scalar predicate. We can also see that the function specification type is :speculoos/argument. Since Speculoos can validate scalars and collections of both arguments and return values, that key-val is a little signpost to help us pinpoint exactly what and where. Let's invoke sum-three with a second invalid argument, a ratio 22/7 instead of integer 300.

(validate-fn-with sum-three
                  {:speculoos/arg-scalar-spec [int? int? int?]}
                  1.0
                  20
                  22/7)
;; => ({:datum 1.0,
;;      :fn-spec-type :speculoos/argument,
;;      :path [0],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum 22/7,
;;      :fn-spec-type :speculoos/argument,
;;      :path [2],
;;      :predicate int?,
;;      :valid? false})

In addition to the invalid 1.0 at path [0], we see that 22/7 at path [2] also fails to satisfy its int? scalar predicate. The scalar predicate's path in the scalar specification is the same as the path of the 22/7 in the [1.0 20 22/7] sequence of arguments. Roughly, validate-fn-with is doing something like this…

(speculoos.core/only-invalid
  (validate-scalars [1.0 20 22/7]
                    [int? int? int?]))
;; => ({:datum 1.0,
;;      :path [0],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum 22/7,
;;      :path [2],
;;      :predicate int?,
;;      :valid? false})

…validating scalars with validate-scalars and keeping only the invalids.
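To make that 'roughly' concrete, here is a sketch of the general idea using only clojure.core: validate the argument sequence, return the invalid entries if there are any, and otherwise invoke the function. It handles flat argument lists only and says nothing about how validate-fn-with is actually implemented; invalid-args and call-if-valid-args are hypothetical names.

```clojure
;; Hypothetical helper: pair each argument with the predicate at the same
;; index and keep only the entries whose predicate is not satisfied.
(defn invalid-args
  [preds args]
  (->> (map (fn [i pred arg]
              {:datum arg
               :path [i]
               :predicate pred
               :valid? (boolean (pred arg))})
            (range) preds args)
       (remove :valid?)))

;; Hypothetical wrapper: return the function's value when every paired
;; argument is valid, otherwise return the invalid entries.
(defn call-if-valid-args
  [f preds & args]
  (let [invalids (invalid-args preds args)]
    (if (empty? invalids)
      (apply f args)
      invalids)))

(defn sum-three [x y z] (+ x y z))

(call-if-valid-args sum-three [int? int? int?] 1 20 300)
;; => 321

(map :path (call-if-valid-args sum-three [int? int? int?] 1.0 20 22/7))
;; => ([0] [2])
```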

Okay, we see that term scalar buzzing around, so there must be something else about validating collections. Yup. We can also validate collection properties of the argument sequence. Let's specify that the argument sequence must contain three elements, using a custom collection predicate, count-3?.

(defn count-3? [v] (= 3 (count v)))

Let's simulate the collection validation first. Remember, collection predicates are applied to their parent containers, so count-3? must appear within a collection so that it'll be paired with the data's containing collection.

(validate-collections [1 20 30]
                      [count-3?])
;; => ({:datum [1 20 30],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate count-3?,
;;      :valid? true})

That result fits with our discussion about validating collections.

Next, we'll associate that collection specification into our function specification map at :speculoos/arg-collection-spec and invoke validate-fn-with with three valid arguments.

(validate-fn-with sum-three
                  {:speculoos/arg-collection-spec [count-3?]}
                  1
                  20
                  300)
;; => 321

The argument sequence satisfies our collection specification, so sum-three returns the expected value. Now let's repeat, but with an additional argument, 4000, that causes the argument sequence to violate its collection predicate.

(validate-fn-with sum-three
                  {:speculoos/arg-collection-spec [count-3?]}
                  1 20
                  300 4000)
;; => ({:datum [1 20 300 4000],
;;      :fn-spec-type :speculoos/argument,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate count-3?,
;;      :valid? false})

This four-element argument sequence, [1 20 300 4000], failed to satisfy our count-3? collection predicate, so validate-fn-with emitted a validation report.

Note #1: Invoking sum-three with four arguments would normally trigger an arity exception. validate-fn-with catches the exception and validates as much as it can.

Note #2: Don't specify and validate the type of the arguments container, i.e., vector?. That's an implementation detail and not guaranteed.
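Regarding Note #1, the source does not show how validate-fn-with deals with the arity exception internally. The following is only a sketch of one way a wrapper could keep such an exception from propagating, using plain Clojure; call-or-arity-note and the :arity-mismatch keyword are hypothetical.

```clojure
(defn sum-three [x y z] (+ x y z))

;; Hypothetical wrapper: invoke f, but turn an arity exception into a
;; plain value so that the caller can carry on with other checks.
(defn call-or-arity-note
  [f & args]
  (try
    (apply f args)
    (catch clojure.lang.ArityException _
      :arity-mismatch)))

(call-or-arity-note sum-three 1 20 300)      ;; => 321
(call-or-arity-note sum-three 1 20 300 4000) ;; => :arity-mismatch
```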

Let's get fancy and combine an argument scalar specification and an argument collection specification. Outside of the context of checking a function, that combo validation would look like this: data is the first argument to validate, then the scalar specification on the next row, then the collection specification on the lower row.

(speculoos.core/only-invalid
  (validate [1.0 20 22/7 4000]
            [int? int? int?]
            [count-3?]))
;; => ({:datum 1.0,
;;      :path [0],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum 22/7,
;;      :path [2],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum [1.0 20 22/7 4000],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate count-3?,
;;      :valid? false})

Let's remember: scalars and collections are always validated separately. validate is merely a convenience function that does both a scalar validation, then a collection validation, in discrete processes, with a single function invocation. Each of the first three scalars that paired with a scalar predicate was validated as a scalar. The first and third scalars, 1.0 and 22/7, failed to satisfy their respective predicates. The fourth argument, 4000, was not paired with a scalar predicate and was therefore ignored (Motto #3). Then, the argument sequence as a whole was validated against the collection predicate count-3?.

validate-fn-with performs substantially that combo validation. We'll associate the argument scalar specification with :speculoos/arg-scalar-spec and the argument collection specification with :speculoos/arg-collection-spec and pass the invalid, four-element argument sequence.

(validate-fn-with sum-three
                  {:speculoos/arg-scalar-spec [int? int? int?],
                   :speculoos/arg-collection-spec [count-3?]}
                  1.0 20
                  22/7 4000)
;; => ({:datum 1.0,
;;      :fn-spec-type :speculoos/argument,
;;      :path [0],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum 22/7,
;;      :fn-spec-type :speculoos/argument,
;;      :path [2],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum [1.0 20 22/7 4000],
;;      :fn-spec-type :speculoos/argument,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate count-3?,
;;      :valid? false})

Just as in the validate simulation, we see three items fail to satisfy their predicates. Scalars 1.0 and 22/7 are not integers, and the argument sequence as a whole, [1.0 20 22/7 4000], does not contain exactly three elements, as required by its collection predicate, count-3?.

2. Validating Function Return Values

Speculoos can also validate values returned by a function. Reusing our sum-three function, and going back to valid inputs, we can associate a return scalar specification into validate-fn-with's specification map to key :speculoos/ret-scalar-spec. Let's stipulate that the function returns an integer. Here's how we pass that specification to validate-fn-with.

{:speculoos/ret-scalar-spec int?}

And now, the function return validation.

(validate-fn-with sum-three
                  {:speculoos/ret-scalar-spec int?}
                  1
                  20
                  300)
;; => 321

The return value 321 satisfies int?, so validate-fn-with returns the computed sum.

What happens when the return value is invalid? Instead of messing up sum-three's definition, we'll merely alter the scalar predicate. Instead of an integer, we'll stipulate that sum-three returns a string with scalar predicate string?.

(validate-fn-with sum-three
                  {:speculoos/ret-scalar-spec string?}
                  1
                  20
                  300)
;; => ({:datum 321,
;;      :fn-spec-type :speculoos/return,
;;      :path nil,
;;      :predicate string?,
;;      :valid? false})

Very nice. sum-three computed, quite correctly, the sum of the three arguments. But we gave it a bogus return scalar specification that claimed it ought to be a string, which integer 321 fails to satisfy.

Did you happen to notice the path? We haven't yet encountered a case where a path is nil. In this situation, the function returns a 'bare' scalar, not contained in a collection. Speculoos can validate a bare scalar when that bare scalar is a function's return value.

Let's see how to validate a function when the return value is a collection of scalars. We'll write a new function, enhanced-sum-three, that returns four scalars: the three arguments and their sum, all contained in a vector.

(defn enhanced-sum-three [x y z] [x y z (+ x y z)])

(enhanced-sum-three 1 20 300) ;; => [1 20 300 321]

Our enhanced function now returns a vector of four elements. Let's remind ourselves how we'd manually validate that return value. If we decide we want enhanced-sum-three to return four integers, the scalar specification would look like this.

[int? int? int? int?]

The scalar specification is shaped like our data (Motto #2).

And the manual validation would look like this, with the data on the upper row, the scalar specification on the lower row.

(validate-scalars [1 20 300 321]
                  [int? int? int? int?])
;; => [{:datum 1,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum 20,
;;      :path [1],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum 300,
;;      :path [2],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum 321,
;;      :path [3],
;;      :predicate int?,
;;      :valid? true}]

Four paired scalars and scalar predicates yield four validation results. Let's see what happens when we validate the function return scalars.

(validate-fn-with enhanced-sum-three
                  {:speculoos/ret-scalar-spec [int? int? int? int?]}
                  1
                  20
                  300)
;; => [1 20 300 321]

Since we fed validate-fn-with a return scalar specification that happens to agree with the returned vector, enhanced-sum-three returns its computed value, [1 20 300 321].

Let's stir things up. We'll change the return scalar specification to something we know will fail: The first scalar a character, the final scalar a boolean.

(validate-fn-with enhanced-sum-three
                  {:speculoos/ret-scalar-spec [char? int? int? boolean?]}
                  1
                  20
                  300)
;; => ({:datum 1,
;;      :fn-spec-type :speculoos/return,
;;      :path [0],
;;      :predicate char?,
;;      :valid? false}
;;     {:datum 321,
;;      :fn-spec-type :speculoos/return,
;;      :path [3],
;;      :predicate boolean?,
;;      :valid? false})

enhanced-sum-three's function body remained the same, and we fed it the same integers as before, but we fiddled with the return scalar specification so that the returned vector contained two scalars that failed to satisfy their respective predicates. 1 at path [0] does not satisfy its wonky scalar predicate char? at the same path. And 321 at path [3] does not satisfy fraudulent scalar predicate boolean? that shares its path.
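The two kinds of return value we've seen, a bare scalar reported with a nil path and a vector of scalars reported with one path per element, can be mimicked in a few lines of clojure.core. This is a sketch for flat return values only, not Speculoos' implementation; check-return and invalid-paths are hypothetical names.

```clojure
;; Hypothetical helper: when the specification is a bare predicate, apply
;; it directly to the return value and report a nil path; when it is a
;; vector, pair predicates and scalars by index.
(defn check-return
  [spec ret]
  (if (fn? spec)
    [{:datum ret :path nil :predicate spec :valid? (boolean (spec ret))}]
    (vec (map-indexed (fn [i [pred datum]]
                        {:datum datum
                         :path [i]
                         :predicate pred
                         :valid? (boolean (pred datum))})
                      (map vector spec ret)))))

;; Hypothetical helper: paths of the entries that failed their predicate.
(defn invalid-paths
  [report]
  (mapv :path (remove :valid? report)))

(invalid-paths (check-return int? 321))
;; => []

(invalid-paths (check-return string? 321))
;; => [nil]

(invalid-paths (check-return [char? int? int? boolean?] [1 20 300 321]))
;; => [[0] [3]]
```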

Let's set aside validating scalars for a moment and validate a facet of enhanced-sum-three's return collection. First, we'll do a manual demonstration with validate-collections. Remember: Collection predicates apply to their immediate parent container. We wrote enhanced-sum-three to return a vector, but to make the validation produce something interesting to look at, we'll pretend we're expecting a list.

(validate-collections [1 20 300 321]
                      [list?])
;; => ({:datum [1 20 300 321],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate list?,
;;      :valid? false})

That collection validation aligns with our understanding. [1 20 300 321] is not a list. The list? collection predicate at path [0] in the specification was paired with the thing found at path (drop-last [0]) in the data, which in this example is the root collection. We designed enhanced-sum-three to yield a vector, which failed to satisfy predicate list?.

Let's toss that collection specification at validate-fn-with and have it apply to enhanced-sum-three's return value, which won't satisfy. We pass the return collection specification by associating it to the key :speculoos/ret-collection-spec.

(validate-fn-with enhanced-sum-three
                  {:speculoos/ret-collection-spec [list?]}
                  1
                  20
                  300)
;; => ({:datum [1 20 300 321],
;;      :fn-spec-type :speculoos/return,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate list?,
;;      :valid? false})

Similarly to the manual collection validation we previously performed with validate-collections, we see that enhanced-sum-three's return vector [1 20 300 321] fails to satisfy its list? collection predicate.

A scalar validation followed by an independent collection validation allows us to check every possible aspect that we could want. Now that we've seen how to individually validate enhanced-sum-three's return scalars and return collections, we know how to do both with one invocation.

Remember Motto #1: Validate scalars separately from validating collections. Speculoos will only ever do one or the other, but validate is a convenience function that performs a scalar validation immediately followed by a collection validation. We'll re-use the scalar specification and collection specification from the previous examples.

(speculoos.core/only-invalid
  (validate [1 20 300 321]
            [char? int? int? boolean?]
            [list?]))
;; => ({:datum 1,
;;      :path [0],
;;      :predicate char?,
;;      :valid? false}
;;     {:datum 321,
;;      :path [3],
;;      :predicate boolean?,
;;      :valid? false}
;;     {:datum [1 20 300 321],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate list?,
;;      :valid? false})

only-invalid discards the validations where the predicates are satisfied, leaving only the invalids. Two scalars failed to satisfy their scalar predicates. Integer 1 at path [0] in the data fails to satisfy scalar predicate char? at path [0] in the scalar specification. Integer 321 fails to satisfy scalar predicate boolean? at path [3] in the scalar specification. Finally, our root vector [1 20 300 321] located at path [] fails to satisfy the collection predicate list? at path [0].
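The filtering idea is simple enough to sketch in plain Clojure. The keep-invalid function below is a hypothetical stand-in written for this illustration, not Speculoos' only-invalid, and the sample entries are abridged by hand with the predicates quoted as symbols.

```clojure
;; Illustration only: `keep-invalid` is a hypothetical stand-in,
;; not Speculoos' own only-invalid.
(defn keep-invalid
  [validations]
  ;; Drop every entry whose :valid? value is truthy.
  (remove :valid? validations))

;; Hand-written, abridged validation entries.
(def sample-validations
  [{:datum 1,   :path [0], :predicate 'char?,    :valid? false}
   {:datum 20,  :path [1], :predicate 'int?,     :valid? true}
   {:datum 300, :path [2], :predicate 'int?,     :valid? true}
   {:datum 321, :path [3], :predicate 'boolean?, :valid? false}])

(map :datum (keep-invalid sample-validations)) ;; => (1 321)
```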

Now that we've seen the combo validation done manually, let's validate enhanced-sum-three's return in the same way. Here we see the importance of organizing the specifications in a container instead of passing them as individual arguments: it keeps our invocation neater.

(validate-fn-with enhanced-sum-three
                  {:speculoos/ret-scalar-spec [char? int? int? boolean?],
                   :speculoos/ret-collection-spec [list?]}
                  1
                  20
                  300)
;; => ({:datum 1,
;;      :fn-spec-type :speculoos/return,
;;      :path [0],
;;      :predicate char?,
;;      :valid? false}
;;     {:datum 321,
;;      :fn-spec-type :speculoos/return,
;;      :path [3],
;;      :predicate boolean?,
;;      :valid? false}
;;     {:datum [1 20 300 321],
;;      :fn-spec-type :speculoos/return,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate list?,
;;      :valid? false})

validate-fn-with's validation is substantially the same as the one validate produced in the previous example, except now, the data comes from invoking enhanced-sum-three. Two scalar invalids and one collection invalid. Integer 1 fails to satisfy scalar predicate char?, integer 321 fails to satisfy scalar predicate boolean?, and the entire return vector [1 20 300 321] fails to satisfy collection predicate list?.

Okay. I think we're ready to put together all four different function validations we've so far seen. We've seen…

  • a function argument scalar validation,
  • a function argument collection validation,
  • a function return scalar validation, and
  • a function return collection validation.

And we've seen how to combine both function argument validations, and how to combine both function return validations. Now we'll combine all four validations into one validate-fn-with invocation.

Let's review our ingredients. Here's our enhanced-sum-three function.

(defn enhanced-sum-three [x y z] [x y z (+ x y z)])

enhanced-sum-three accepts three number arguments and returns a vector of those three numbers with their sum appended to the end of the vector. Technically, Clojure would accept any numeric thingy for x, y, and z, but for illustration purposes, we'll make our scalar predicates something non-numeric so we can see something interesting in the validation reports.

With that in mind, we pretend that we want to validate the function's argument sequence as a string, followed by an integer, followed by a symbol. The function scalar specification will be…

[string? int? symbol?]

To allow enhanced-sum-three to calculate a result, we'll supply three numeric values, two of which will not satisfy that argument scalar specification.

So that it produces something interesting, we'll make our function argument collection specification also complain. First, we'll write a collection predicate.

(defn length-2? [v] (= 2 (count v)))

We know for sure that the argument sequence will contain three values, so predicate length-2? will produce something interesting to see.
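Before handing the predicate to a validation, we could try it by hand in plain Clojure. The vector below merely stands in for the argument sequence.

```clojure
(defn length-2? [v] (= 2 (count v)))

;; Three arguments, so the predicate is not satisfied.
(length-2? [1 20 300]) ;; => false

;; A two-element collection would satisfy it.
(length-2? [1 20]) ;; => true
```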

During collection validation, predicates apply to the collection in the data that corresponds to the collection that contains the predicate. We want our predicate, length-2?, to apply to the argument sequence, so we'll insert it into a vector. Our argument collection specification will look like this.

[length-2?]

Jumping to enhanced-sum-three's output side, we expect a vector of four numbers. Again, we'll craft our function return scalar specification to contain two predicates that we know won't be satisfied because those scalar predicates are looking for something non-numeric.

[char? int? int? boolean?]

We know enhanced-sum-three will return a vector containing four integers, but the char? and boolean? will give us something to look at.

Finally, since we defined enhanced-sum-three to return a vector, we'll make the function return collection specification look for a list.

[list?]

Altogether, those four specifications are organized like this.

{:speculoos/arg-scalar-spec     [string? int? symbol?]
 :speculoos/arg-collection-spec [length-2?]
 :speculoos/ret-scalar-spec     [char? int? int? boolean?]
 :speculoos/ret-collection-spec [list?]}

It's time to see what we've assembled.

(validate-fn-with enhanced-sum-three
                  {:speculoos/arg-scalar-spec [string? int? symbol?],
                   :speculoos/arg-collection-spec [length-2?],
                   :speculoos/ret-scalar-spec [char? int? int? boolean?],
                   :speculoos/ret-collection-spec [list?]}
                  1
                  20
                  300)
;; => ({:datum 1,
;;      :fn-spec-type :speculoos/argument,
;;      :path [0],
;;      :predicate string?,
;;      :valid? false}
;;     {:datum 300,
;;      :fn-spec-type :speculoos/argument,
;;      :path [2],
;;      :predicate symbol?,
;;      :valid? false}
;;     {:datum [1 20 300],
;;      :fn-spec-type :speculoos/argument,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate length-2?,
;;      :valid? false}
;;     {:datum 1,
;;      :fn-spec-type :speculoos/return,
;;      :path [0],
;;      :predicate char?,
;;      :valid? false}
;;     {:datum 321,
;;      :fn-spec-type :speculoos/return,
;;      :path [3],
;;      :predicate boolean?,
;;      :valid? false}
;;     {:datum [1 20 300 321],
;;      :fn-spec-type :speculoos/return,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate list?,
;;      :valid? false})

We've certainly made a mess of things. But it'll be understandable if we examine the invalidation report piece by piece. The first thing to know is that we have already seen each of those validations before in the previous examples, so we could always scroll back to those examples above and see the validations in isolation.

We see six non-satisfied predicates:

  • Scalar 1 in the arguments sequence fails to satisfy scalar predicate string? in the argument scalar specification.
  • Scalar 300 in the arguments sequence fails to satisfy scalar predicate symbol? in the argument scalar specification.
  • The argument sequence [1 20 300] fails to satisfy collection predicate length-2? in the argument collection specification.
  • Scalar 1 in the return vector fails to satisfy scalar predicate char? in the return scalar specification.
  • Scalar 321 in the return vector fails to satisfy scalar predicate boolean? in the return scalar specification.
  • The return vector [1 20 300 321] fails to satisfy collection predicate list? in the return collection specification.

Also note that the validation entries have a :fn-spec-type entry associated to either :speculoos/return or :speculoos/argument, which tells us where a particular invalid was located. There may be a situation where indistinguishable invalid datums appear in both the arguments and returns. In this case, integer 1 was an invalid datum at path [0] for both the argument sequence and the return vector. Keyword :fn-spec-type helps resolve the ambiguity.
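Since a validation report is a plain sequence of maps, ordinary Clojure functions can sort out that ambiguity. Here is a small sketch that groups a few hand-copied, abridged entries by their :fn-spec-type. The names sample-report and by-location are invented for this illustration.

```clojure
;; Hand-copied, abridged entries from a report like the one above.
(def sample-report
  [{:datum 1,   :fn-spec-type :speculoos/argument, :path [0], :valid? false}
   {:datum 1,   :fn-spec-type :speculoos/return,   :path [0], :valid? false}
   {:datum 321, :fn-spec-type :speculoos/return,   :path [3], :valid? false}])

;; Group the entries by where the invalid datum was located.
(def by-location (group-by :fn-spec-type sample-report))

(count (by-location :speculoos/argument))     ;; => 1
(map :datum (by-location :speculoos/return))  ;; => (1 321)
```

Both groups contain an integer 1 at path [0], but the grouping tells us which one belongs to the arguments and which to the return.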

3. Validating Function Correctness

So far, we've seen how to validate function argument sequences and function return values, both their scalars, and their collections. Validating function argument sequences allows us to check if the function was invoked properly. Validating function return values gives a limited ability to check the internal operation of the function.

If we want another level of thoroughness for checking correctness, we can specify and validate the relationships between the function's arguments and return values. Perhaps we'd like to be able to express "The return value is a collection, with all the same elements as the input sequence." Or "The return value is a concatenation of the even-indexed elements of the input sequence." Speculoos' term for this action is validating function argument and return value relationship.

Let's pretend I wrote a reversing function, which accepts a sequential collection of elements and returns those elements in reversed order. If we give it…

[11 22 33 44 55]

…my reversing function ought to return…

[55 44 33 22 11]

Here are some critical features of that process that relate the reversing function's arguments to its return value.

  • The return collection is the same length as the input collection.
  • The return collection contains all the same elements as the input collection.
  • The elements of the return collection appear in reverse order from their positions in the input collection.

Oops. I must've written it before I had my morning coffee.

(defn broken-reverse [v] (conj v 9999))

(broken-reverse [11 22 33 44 55]) ;; => [11 22 33 44 55 9999]

Pitiful. We can see by eye that broken-reverse fulfilled none of the three relationships. The return collection is not the same length, contains additional elements, and is not reversed. Let's codify that pitifulness.

First, we'll write three relationship functions. Relationship functions are a lot like predicates. They return a truthy or falsey value, but they consume two things instead of one. The function's argument sequence is passed as the first thing and the function's return value is passed as the second thing.

The first predicate tests "Do two collections contain the same number of elements?"

(defn same-length? [v1 v2] (= (count v1) (count v2)))

(same-length? [11 22 33 44 55]   [11 22 33 44 55]) ;; => true

(same-length? [11 22]   [11 22 33 44 55]) ;; => false

When supplied with two collections whose counts are the same, predicate same-length? returns true.

The second predicate tests "Do two collections contain the same elements?"

(defn same-elements? [v1 v2] (= (sort v1) (sort v2)))

(same-elements? [11 22 33 44 55]   [55 44 33 22 11]) ;; => true

(same-elements? [11 22 33 44 55]   [55 44 33 22 9999]) ;; => false

When supplied with two collections which contain the same elements, predicate same-elements? returns true.
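One caveat worth a sketch: sort needs elements that can be compared to one another, so a sort-based test throws on a collection that mixes, say, integers and strings. If we ever needed to handle such data, one possible alternative compares element counts instead. The name same-elements-by-frequency? is invented for this illustration.

```clojure
;; Alternative sketch: compare how many times each element occurs,
;; which avoids sorting and therefore works for mixed element types.
(defn same-elements-by-frequency?
  [v1 v2]
  (= (frequencies v1) (frequencies v2)))

(same-elements-by-frequency? [11 "abc" :k] [:k 11 "abc"])         ;; => true
(same-elements-by-frequency? [11 22 33 44 55] [55 44 33 22 9999]) ;; => false
(same-elements-by-frequency? [11 11 22] [11 22 22])               ;; => false
```

For the all-integer vectors in this section, either version gives the same answers.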

The third predicate tests "Do the elements of one collection appear in reversed order when compared to another collection?"

(defn reversed? [v1 v2] (= v1 (reverse v2)))

(reversed? [11 22 33 44 55]   [55 44 33 22 11]) ;; => true

(reversed? [11 22 33 44 55]   [11 22 33 44 55]) ;; => false

When supplied with two collections, predicate reversed? returns true if the elements of the first collection appear in the reverse order relative to the elements of the second collection.

same-length?, same-elements?, and reversed? all consume two sequential things and test a relationship between the two. If their relationship is satisfied, they signal true; if not, they signal false. They are all three gonna have something unkind to say about broken-reverse.

Now that we've established a few relationships, we need to establish where to apply those relationship tests. Checking broken-reverse's argument/return relationships with same-length?, same-elements?, and reversed? will be fairly straightforward: For each predicate, we'll pass the first argument (itself a collection), and the return value (also a collection). But some day, we might want to check a more sophisticated relationship that needs to extract some slice of the argument and/or slice of the return value. Therefore, we must declare a path to the slices we want to check. Of the return value, we'd like to check the root collection, so the return value's path is merely [].

When we consider how to extract the arguments, there's one tricky detail we must accommodate. The [11 22 33 44 55] vector we're going to pass to broken-reverse is itself contained in the argument sequence. Take a look.

(defn arg-passthrough [& args] args)

(arg-passthrough [11 22 33 44 55]) ;; => ([11 22 33 44 55])

To extract [11 22 33 44 55], the path will need to be [0].

(nth (arg-passthrough [11 22 33 44 55]) 0) ;; => [11 22 33 44 55]

When invoked with paths [0] and [], respectively for the arguments and returns, validate-argument-return-relationship does something like this.

(same-length? (get-in [[11 22 33 44 55]] [0])
              (get-in [11 22 33 44 55 9999] []))
;; => false

So here are the components to a single argument/return relationship validation.

  • A path to the interesting slice of the arguments. Example: [0]
  • A path to the interesting slice of the return value. Example: []
  • A relationship function. Example: same-length?

We stuff all three of those items into a map, which will be used for a single relationship validation.

{:path-argument [0]
 :path-return []
 :relationship-fn same-length?}

Within that map, both :path-… entries govern what slices of the argument and return are given to the relationship function. In this example, we want to extract the first item, at path [0], of the argument sequence and the entire return value, at path [].
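To make the roles of the three entries concrete, here is a rough plain-Clojure sketch of what a single relationship check might do with such a map. The function check-relationship is hypothetical, written only for this illustration, and is not how Speculoos is actually implemented.

```clojure
(defn same-length? [v1 v2] (= (count v1) (count v2)))

;; Illustration only: `check-relationship` is hypothetical, not a Speculoos function.
;; It extracts one slice from the argument sequence, one slice from the return
;; value, and hands both to the relationship function.
(defn check-relationship
  [arg-seq ret {:keys [path-argument path-return relationship-fn]}]
  (boolean (relationship-fn (get-in arg-seq path-argument)
                            (get-in ret path-return))))

(check-relationship [[11 22 33 44 55]]      ; the argument sequence
                    [11 22 33 44 55 9999]   ; the return value
                    {:path-argument [0]
                     :path-return []
                     :relationship-fn same-length?})
;; => false
```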

We've written three argument/return relationships to test broken-reverse, so we'll need to somehow feed them to validate-fn-with. We do that by associating them into the organizing map with keyword :speculoos/argument-return-relationships. Notice the plural s. Since there may be more than one relationship, we collect them into a vector. For the moment, let's insert only the same-length? relationship.

{:speculoos/argument-return-relationships [{:path-argument [0]
                                            :path-return []
                                            :relationship-fn same-length?}]}

Eventually, we'll test all three relationships, but for now, we'll focus on same-length?.

We're ready to validate.

(validate-fn-with
  broken-reverse
  {:speculoos/argument-return-relationships
     [{:path-argument [0],
       :path-return [],
       :relationship-fn same-length?}]}
  [11 22 33 44 55])
;; => ({:datum-argument [11 22 33 44 55],
;;      :datum-return [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return [],
;;      :relationship-fn same-length?,
;;      :valid? false})

We supplied broken-reverse with a five-element vector, and it returned a six-element vector, failing to satisfy the specified same-length? relationship.

We wrote two other relationship functions, but same-elements? and reversed? are merely floating around in the current namespace. We did not send them to validate-fn-with, so it checked only same-length?, which we explicitly supplied. Remember Motto #3: Un-paired predicates (or, relationships in this instance) are ignored.

Let's check all three relationships now. We explicitly supply the additional two relationship predicates, all with the same paths.

(validate-fn-with
  broken-reverse
  {:speculoos/argument-return-relationships
     [{:path-argument [0],
       :path-return [],
       :relationship-fn same-length?}
      {:path-argument [0],
       :path-return [],
       :relationship-fn same-elements?}
      {:path-argument [0],
       :path-return [],
       :relationship-fn reversed?}]}
  [11 22 33 44 55])
;; => ({:datum-argument [11 22 33 44 55],
;;      :datum-return [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return [],
;;      :relationship-fn same-length?,
;;      :valid? false}
;;     {:datum-argument [11 22 33 44 55],
;;      :datum-return [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return [],
;;      :relationship-fn same-elements?,
;;      :valid? false}
;;     {:datum-argument [11 22 33 44 55],
;;      :datum-return [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return [],
;;      :relationship-fn reversed?,
;;      :valid? false})

broken-reverse is truly broken. The same-length? result appears again, and then we see the two additional unsatisfied relationships because we added same-elements? and reversed?. broken-reverse returns a vector with more and different elements, and the order is not reversed.

Just for amusement, let's see what happens when we validate clojure.core/reverse with the exact same relationship specifications.

(reverse [11 22 33 44 55]) ;; => (55 44 33 22 11)

(validate-fn-with
  reverse
  {:speculoos/argument-return-relationships
     [{:path-argument [0],
       :path-return [],
       :relationship-fn same-length?}
      {:path-argument [0],
       :path-return [],
       :relationship-fn same-elements?}
      {:path-argument [0],
       :path-return [],
       :relationship-fn reversed?}]}
  [11 22 33 44 55])
;; => (55 44 33 22 11)

clojure.core/reverse satisfies all three argument/return relationships, so validate-fn-with passes through the correctly-reversed output.

Not every function consumes a collection. Some functions consume a scalar value. Some functions return a scalar. And some functions have the audacity to do both. validate-fn-with can validate that kind of argument/return relationship.

I'll warn you now, I'm planning on writing a buggy increment function. We could express two ideas about the argument/return relationship. First, a correctly-working increment function, when supplied with a number, n, ought to return a number that is larger than n. Second, a correctly-working return value ought to be n plus one. Let's specify those relationships.

The first predicate tests "Is the second number larger than the first number?" We don't need to write a special predicate for this job; clojure.core provides one that does everything we need.

(< 99 100) ;; => true
(< 99 -99) ;; => false

When supplied with two numbers, predicate < returns true if the second number is larger than the first number. (With more than two numbers, it returns true only if they strictly increase from left to right.)

The second predicate tests "Is the second number equal to the first number plus one?"

(defn plus-one? [n1 n2] (= (+ n1 1) n2))

(plus-one? 99 100) ;; => true
(plus-one? 99 -99) ;; => false

When supplied with two numbers, predicate plus-one? returns true if the second number equals the sum of the first number and one.

Validating argument/return relationships requires us to declare which parts of the argument sequence and which parts of the return value to send to the relationship function. When we invoke the increment function with a single number, the number lives in the first spot of the argument sequence, so it will have a path of [0]. The increment function will return a 'bare' number, so a path is not really an applicable concept. We previously saw how a nil path indicates a bare scalar, so now we can assemble the two relationship maps, one each for < and plus-one?.

{:path-argument [0]
 :path-return nil
 :relationship-fn <}

{:path-argument [0]
 :path-return nil
 :relationship-fn plus-one?}
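Here is one way to picture what those two paths select, sketched in plain Clojure. The slice helper is hypothetical, and treating a nil path as "use the bare value as-is" is this sketch's reading of the description above, not a statement about Speculoos' internals.

```clojure
(defn plus-one? [n1 n2] (= (+ n1 1) n2))

;; Illustration only: `slice` is a hypothetical helper.
;; A nil path means the value is a bare scalar, so return it untouched.
(defn slice
  [x path]
  (if (nil? path)
    x
    (get-in x path)))

;; Argument sequence [99] with path [0]; bare return values with path nil.
(plus-one? (slice [99] [0]) (slice 100 nil)) ;; => true
(< (slice [99] [0]) (slice -99 nil))         ;; => false
```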

Now is a good time to write the buggy incrementing function.

(defn buggy-inc [n] (- n))

(buggy-inc 99) ;; => -99

Looks plenty wrong. Let's see exactly how wrong.

(validate-fn-with
  buggy-inc
  {:speculoos/argument-return-relationships
     [{:path-argument [0],
       :path-return nil,
       :relationship-fn <}
      {:path-argument [0],
       :path-return nil,
       :relationship-fn plus-one?}]}
  99)
;; => ({:datum-argument 99,
;;      :datum-return -99,
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return nil,
;;      :relationship-fn <,
;;      :valid? false}
;;     {:datum-argument 99,
;;      :datum-return -99,
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return nil,
;;      :relationship-fn plus-one?,
;;      :valid? false})

buggy-inc's return value failed to satisfy both relationships with its argument. -99 is not larger than 99, nor is it what we'd get by adding one to 99.

Just to verify that our relationships are doing what we think they're doing, let's run the same thing on clojure.core/inc.

(validate-fn-with
  inc
  {:speculoos/argument-return-relationships
     [{:path-argument [0],
       :path-return nil,
       :relationship-fn <}
      {:path-argument [0],
       :path-return nil,
       :relationship-fn plus-one?}]}
  99)
;; => 100

inc correctly returns 100 when invoked with 99, so both < and plus-one? relationships are satisfied. Since all relationships were satisfied, the return value 100 passes through.

So far, the :path-argument and :path-return entries have been similar between relationship specifications, but they don't need to be. I'm going to invent a really contrived example. pull-n-put and pull-n-whoops are both intended to pull out emails and phone numbers and stuff them into some output vectors. Here's our raw person data: emails and phone numbers.

(def person-1 {:email "aragorn@sonofarath.org", :phone "867-5309"})
(def person-2 {:email "vita@meatavegam.info", :phone "123-4567"})
(def person-3 {:email "jolene@justbecauseyou.com", :phone "555-FILK"})

pull-n-put is the correct implementation, producing correct results.

;; correct implementation

(defn pull-n-put
  [p1 p2 p3]
  {:email-addresses [(p1 :email) (p2 :email) (p3 :email)],
   :phone-numbers [(p1 :phone) (p2 :phone) (p3 :phone)]})


;; intended results

(pull-n-put person-1 person-2 person-3)
;; => {:email-addresses ["aragorn@sonofarath.org"
;;                       "vita@meatavegam.info"
;;                       "jolene@justbecauseyou.com"],
;;     :phone-numbers ["867-5309" "123-4567" "555-FILK"]}

pull-n-put pulls out the email addresses and phone numbers and puts each at the proper place. However, pull-n-whoops…

;; incorrect implementation

(defn pull-n-whoops
  [p1 p2 p3]
  {:email-addresses [(p1 :phone) (p2 :phone) (p3 :phone)],
   :phone-numbers [:apple :banana :mango]})


;; wrong results

(pull-n-whoops person-1 person-2 person-3)
;; => {:email-addresses ["867-5309" "123-4567" "555-FILK"],
;;     :phone-numbers [:apple :banana :mango]}

…does neither. pull-n-whoops puts the phone numbers where the email addresses ought to be and inserts completely bogus phone numbers.

We can specify a couple of relationships to show that pull-n-whoops produces a return value that does not validate. In a correctly-working implementation, the scalars aren't transformed, per se, merely moved to another location. So our relationship function will merely be equality, and the paths will do all the work.

Phone number 555-FILK at argument path [2 :phone] ought to appear at return path [:phone-numbers 2]. That relationship specification looks like this.

{:path-argument [2 :phone]
 :path-return [:phone-numbers 2]
 :relationship-fn =}

Similarly, email address aragorn@sonofarath.org at argument path [0 :email] ought to appear at return path [:email-addresses 0]. That relationship specification looks like this.

{:path-argument [0 :email]
 :path-return [:email-addresses 0]
 :relationship-fn =}

Now, we insert those two specifications into a vector and associate that vector into the organizing map.

{:speculoos/argument-return-relationships [{:path-argument [2 :phone]
                                            :path-return [:phone-numbers 2]
                                            :relationship-fn =}
                                           {:path-argument [0 :email]
                                            :path-return [:email-addresses 0]
                                            :relationship-fn =}]}

All that remains is to consult validate-fn-with to see if the relationships are satisfied. First, we'll do pull-n-put, which should yield the intended results.

(validate-fn-with pull-n-put
                  {:speculoos/argument-return-relationships
                     [{:path-argument [2 :phone],
                       :path-return [:phone-numbers 2],
                       :relationship-fn =}
                      {:path-argument [0 :email],
                       :path-return [:email-addresses 0],
                       :relationship-fn =}]}
                  person-1
                  person-2
                  person-3)
;; => {:email-addresses
;;       ["aragorn@sonofarath.org"
;;        "vita@meatavegam.info"
;;        "jolene@justbecauseyou.com"],
;;     :phone-numbers ["867-5309" "123-4567"
;;                     "555-FILK"]}

Yup. pull-n-put's return value satisfied both equality relationships with the arguments we supplied, so validate-fn-with passed on that correct return value.

Now we'll validate pull-n-whoops, which does not produce correct results.

(validate-fn-with pull-n-whoops
                  {:speculoos/argument-return-relationships
                     [{:path-argument [2 :phone],
                       :path-return [:phone-numbers 2],
                       :relationship-fn =}
                      {:path-argument [0 :email],
                       :path-return [:email-addresses 0],
                       :relationship-fn =}]}
                  person-1
                  person-2
                  person-3)
;; => ({:datum-argument "555-FILK",
;;      :datum-return :mango,
;;      :fn-spec-type
;;        :speculoos/argument-return-relationship,
;;      :path-argument [2 :phone],
;;      :path-return [:phone-numbers 2],
;;      :relationship-fn =,
;;      :valid? false}
;;     {:datum-argument
;;        "aragorn@sonofarath.org",
;;      :datum-return "867-5309",
;;      :fn-spec-type
;;        :speculoos/argument-return-relationship,
;;      :path-argument [0 :email],
;;      :path-return [:email-addresses 0],
;;      :relationship-fn =,
;;      :valid? false})

validate-fn-with tells us that pull-n-whoops's output satisfies neither argument/return relationship. Where we expected phone number 555-FILK, we see :mango, and where we expected email aragorn@sonofarath.org, we see phone number 867-5309.

The idea to grasp from validating pull-n-put and pull-n-whoops is that even though the relationship function was a basic equality =, the relationship validation is precise, flexible, and powerful because we used paths to focus on exactly the relationship we're interested in. On the other hand, whatever function we put at :relationship-fn is completely open-ended, and can be similarly sophisticated.
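We can watch the paths do that focusing by hand with clojure.core/get-in. This sketch re-creates the person data and pull-n-whoops's return value as literals so that it runs on its own. The names arg-seq and whoops-return are invented for the illustration.

```clojure
(def person-1 {:email "aragorn@sonofarath.org", :phone "867-5309"})
(def person-2 {:email "vita@meatavegam.info", :phone "123-4567"})
(def person-3 {:email "jolene@justbecauseyou.com", :phone "555-FILK"})

;; The three arguments, as they'd sit in the argument sequence.
(def arg-seq [person-1 person-2 person-3])

;; What pull-n-whoops returned, copied from the example above.
(def whoops-return {:email-addresses ["867-5309" "123-4567" "555-FILK"]
                    :phone-numbers [:apple :banana :mango]})

(get-in arg-seq [2 :phone])               ;; => "555-FILK"
(get-in whoops-return [:phone-numbers 2]) ;; => :mango

;; The relationship function, plain equality, sees only those two slices.
(= (get-in arg-seq [2 :phone])
   (get-in whoops-return [:phone-numbers 2]))
;; => false
```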

Before we finish this subsection, I'd like to demonstrate how to combine all five types of validation: argument scalars, argument collections, return scalars, return collections, and argument/return relationship. We'll rely on our old friend broken-reverse. Let's remember what broken-reverse actually does.

(broken-reverse [11 22 33 44 55]) ;; => [11 22 33 44 55 9999]

Instead of properly reversing the argument collection, it merely appends a spurious 9999.

We'll pass a vector as the first and only argument. Within that vector, we pretend to not care about the first two elements, so we'll use any? predicates as placeholders. We'll specify the third element of that vector to be a decimal with a decimal? scalar predicate. The entire argument sequence is validated, so we must make sure the shape of the scalar specification mimics the shape of the data.

:speculoos/arg-scalar-spec [[any? any? decimal?]]

Just so we see an invalid result, we'll make the argument collection specification expect a list, even though we know we'll be passing a vector. And again, we must make the collection specification's shape mimic the data, so to mimic the argument sequence, it looks like this.

:speculoos/arg-collection-spec [[list?]]

We know that broken-reverse returns the input collection with 9999 conjoined. We'll write the return scalar specification to expect a string in the fourth slot, merely so that we'll see integer 44 fail to satisfy.

:speculoos/ret-scalar-spec [any? any? any? string?]

And since we're expecting broken-reverse to return a vector, we'll write the return collection specification to expect a set.

:speculoos/ret-collection-spec [set?]

Finally, we've previously demonstrated that broken-reverse fails to satisfy the reversed? argument/return relationship specification. We'll pass reversed? the first argument and the entire return.

:speculoos/argument-return-relationships [{:path-argument [0]
                                           :path-return []
                                           :relationship-fn reversed?}]

We assemble all five of those specifications into an organizing map…

(def organizing-map
  {:speculoos/arg-scalar-spec [[any? any? decimal?]],
   :speculoos/arg-collection-spec [[list?]],
   :speculoos/ret-scalar-spec [any? any? any? string?],
   :speculoos/ret-collection-spec [set?],
   :speculoos/argument-return-relationships
     [{:path-argument [0], :path-return [], :relationship-fn reversed?}]})

…and invoke validate-fn-with.

(validate-fn-with broken-reverse
                  organizing-map
                  [11 22 33 44 55])
;; => ({:datum 33,
;;      :fn-spec-type :speculoos/argument,
;;      :path [0 2],
;;      :predicate decimal?,
;;      :valid? false}
;;     {:datum [11 22 33 44 55],
;;      :fn-spec-type :speculoos/argument,
;;      :ordinal-path-datum [0],
;;      :path-datum [0],
;;      :path-predicate [0 0],
;;      :predicate list?,
;;      :valid? false}
;;     {:datum 44,
;;      :fn-spec-type :speculoos/return,
;;      :path [3],
;;      :predicate string?,
;;      :valid? false}
;;     {:datum [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/return,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate set?,
;;      :valid? false}
;;     {:datum-argument [11 22 33 44 55],
;;      :datum-return [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return [],
;;      :relationship-fn reversed?,
;;      :valid? false})

We supplied five specifications, five datums failed to satisfy those specifications, and we receive five invalidation entries.

  • Argument scalar 33 did not satisfy decimal?.
  • Argument collection [11 22 33 44 55] did not satisfy list?.
  • Return scalar 44 did not satisfy string?.
  • Return collection [11 22 33 44 55 9999] did not satisfy set?.
  • Argument [11 22 33 44 55] and return [11 22 33 44 55 9999] did not satisfy relationship reversed?.
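To make that last entry concrete, here is a minimal plain-Clojure sketch of how a relationship function might be handed the datums found at :path-argument and :path-return. The names reversed-sketch?, args, and ret are hypothetical stand-ins for illustration only. This is not Speculoos' implementation, merely get-in applied to the same paths that appear in the report.

```clojure
;; Hypothetical stand-in for a relationship function: it receives one datum
;; from the arguments and one datum from the return, and answers true/false.
(defn reversed-sketch? [arg ret] (= ret (reverse arg)))

;; The argument sequence holds one vector; the return is what the report shows.
(def args [[11 22 33 44 55]])
(def ret [11 22 33 44 55 9999])

;; Path [0] selects the first argument; the empty path [] selects the entire return.
(reversed-sketch? (get-in args [0]) (get-in ret []))
;; => false

;; A properly reversed return would satisfy the relationship.
(reversed-sketch? [11 22 33] [33 22 11])
;; => true
```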

Recognized metadata specification keys

Speculoos consults the following defined group of keys in a specification map when it validates.

speculoos.function-specs/recognized-spec-keys
;; => [:speculoos/arg-scalar-spec
;;     :speculoos/arg-collection-spec
;;     :speculoos/ret-scalar-spec
;;     :speculoos/ret-collection-spec
;;     :speculoos/argument-return-relationships
;;     :speculoos/canonical-sample
;;     :speculoos/predicate->generator
;;     :speculoos/hof-specs]

Function Metadata Specifications

Speculoos offers three patterns of function validation.

  1. validate-fn-with performs explicit validation with a specification supplied in a separate map. The function var is not altered.
  2. validate-fn performs explicit validation with specifications contained in the function's metadata.
  3. instrument provides implicit validation with specifications contained in the function's metadata.

Up until this point, we've been using the most explicit variant, validate-fn-with, because its behavior is the most readily apparent. validate-fn-with is nice when we want to quickly validate a function on-the-fly without messing with the function's metadata. We merely supply the function's name, a map of specifications, and a sequence of arguments as if we were directly invoking the function.

Speculoos function specifications differ from spec.alpha in that they are stored and retrieved directly from the function's metadata. Speculoos is an experimental library, and I wanted to test whether it was a good idea to store a function's specifications in its own metadata. I thought it would be nice if I could hand you one single thing and say

Here's a Clojure function you can use. Its name suggests what it does, its docstring tells you how to use it, and human- and machine-readable specifications check the validity of the inputs, and tests that it's working properly. All in one neat, tidy S-expression.

To validate a function with metadata specifications, we use validate-fn (or, as we'll discuss later, instrument). Speculoos offers a pair of convenience functions to add and remove specifications from a function's metadata. To add, use inject-specs!. Let's inject a couple of function specifications into sum-three, which we saw earlier.

(require '[speculoos.function-specs :refer
           [validate-fn inject-specs! unject-specs!]])

(inject-specs! sum-three
               {:speculoos/arg-scalar-spec [int? int? int?],
                :speculoos/ret-scalar-spec int?})
;; => nil

We can observe that the specifications indeed live in the function's metadata with clojure.core/meta. There's a lot of metadata, so we'll use select-keys to extract only the key-values associated by inject-specs!.

(select-keys (meta #'sum-three) speculoos.function-specs/recognized-spec-keys)
;; => #:speculoos{:arg-scalar-spec [int? int? int?],
;;                :ret-scalar-spec int?}

We see that inject-specs! injected both an argument scalar specification and a return scalar specification.

If we later decided to undo that, unject-specs! removes all recognized Speculoos specification entries, regardless of how they got there (maybe some combination of inject-specs! and with-meta). For the upcoming demonstrations, though, we'll keep those specifications in sum-three's metadata.
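For readers less familiar with var metadata, the following plain-Clojure sketch shows the general idea of tucking an entry into a function var's metadata and taking it out again. It uses only clojure.core's alter-meta! and meta. The name sum-three-demo is hypothetical, and we are only assuming that the Speculoos utilities do something broadly similar; their actual implementation may differ.

```clojure
;; Hypothetical demo function, defined here so the sketch is self-contained.
(defn sum-three-demo [x y z] (+ x y z))

;; Associate a specification-like entry into the var's metadata.
(alter-meta! #'sum-three-demo assoc :speculoos/arg-scalar-spec [int? int? int?])

(contains? (meta #'sum-three-demo) :speculoos/arg-scalar-spec)
;; => true

;; Metadata is passive: a direct invocation is unaffected by the entry.
(sum-three-demo 1 20 300.0)
;; => 321.0

;; Dissociate the entry again.
(alter-meta! #'sum-three-demo dissoc :speculoos/arg-scalar-spec)

(contains? (meta #'sum-three-demo) :speculoos/arg-scalar-spec)
;; => false
```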

Now that sum-three holds the specifications in its metadata, we can try the second pattern of explicit validation, using validate-fn. It's similar to validate-fn-with, except we don't have to supply the specification map; it's already waiting in the metadata. Invoked with valid arguments, sum-three returns a valid value.

(validate-fn sum-three 1 20 300) ;; => 321

When we invoke sum-three with an invalid floating-point number, Speculoos interrupts with a validation report.

(validate-fn sum-three 1 20 300.0)
;; => ({:datum 300.0,
;;      :fn-spec-type :speculoos/argument,
;;      :path [2],
;;      :predicate int?,
;;      :valid? false}
;;     {:datum 321.0,
;;      :fn-spec-type :speculoos/return,
;;      :path nil,
;;      :predicate int?,
;;      :valid? false})

Scalar argument 300.0 failed to satisfy its paired scalar predicate int?. Also, scalar return 321.0 failed to satisfy its paired scalar predicate int?.

The metadata specifications are passive and have no effect during normal invocation.

(sum-three 1 20 300.0) ;; => 321.0

Even though sum-three currently holds a pair of scalar specifications within its metadata, directly invoking sum-three does not initiate any validation.

validate-fn only interrupts when a predicate paired with a datum is not satisfied. If we remove all the specifications, there won't be any predicates. Let's remove sum-three's metadata specifications with unject-specs!.

(unject-specs! sum-three) ;; => nil


;; all recognized keys are removed

(select-keys (meta #'sum-three) speculoos.function-specs/recognized-spec-keys) ;; => {}

Now that sum-three's metadata no longer contains specifications, validate-fn will not perform any validations.

(validate-fn sum-three 1 20 300.0) ;; => 321.0

The return value 321.0 merely passes through because there were zero predicates.

We can try a more involved example. Let's inject that messy ball of metadata specifications into broken-reverse.

(inject-specs!
  broken-reverse
  {:speculoos/arg-scalar-spec [[any? any? decimal?]],
   :speculoos/arg-collection-spec [[list?]],
   :speculoos/ret-scalar-spec [any? any? any? string?],
   :speculoos/ret-collection-spec [set?],
   :speculoos/argument-return-relationships
     [{:path-argument [0], :path-return [], :relationship-fn reversed?}]})
;; => nil

Now we double-check the success of injecting the metadata.

(select-keys (meta #'broken-reverse)
             speculoos.function-specs/recognized-spec-keys)
;; => #:speculoos{:arg-collection-spec [[list?]],
;;                :arg-scalar-spec [[any? any? decimal?]],
;;                :argument-return-relationships [{:path-argument [0],
;;                                                 :path-return [],
;;                                                 :relationship-fn reversed?}],
;;                :ret-collection-spec [set?],
;;                :ret-scalar-spec [any? any? any? string?]}

We confirm that all five function specifications reside in broken-reverse's metadata. validate-fn can now find those specifications.

Finally, we validate broken-reverse.

(validate-fn broken-reverse [11 22 33 44 55])
;; => ({:datum 33,
;;      :fn-spec-type :speculoos/argument,
;;      :path [0 2],
;;      :predicate decimal?,
;;      :valid? false}
;;     {:datum [11 22 33 44 55],
;;      :fn-spec-type :speculoos/argument,
;;      :ordinal-path-datum [0],
;;      :path-datum [0],
;;      :path-predicate [0 0],
;;      :predicate list?,
;;      :valid? false}
;;     {:datum 44,
;;      :fn-spec-type :speculoos/return,
;;      :path [3],
;;      :predicate string?,
;;      :valid? false}
;;     {:datum [11 22 33 44 55 9999],
;;      :fn-spec-type :speculoos/return,
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate set?,
;;      :valid? false}
;;     {:datum-argument [11 22 33 44 55],
;;      :datum-return [11 22 33 44 55 9999],
;;      :fn-spec-type
;;        :speculoos/argument-return-relationship,
;;      :path-argument [0],
;;      :path-return [],
;;      :relationship-fn reversed?,
;;      :valid? false})

Notice, this is the exact same validation as before, but because all the messy specifications were already tucked away in the metadata, the validation invocation was a much cleaner one-liner,
(validate-fn broken-reverse [11 22 33 44 55]).

Again, metadata specifications have no effect when the function is directly invoked.

(broken-reverse [11 22 33 44 55]) ;; => [11 22 33 44 55 9999]

We never unjected the specifications from broken-reverse's metadata, but they have absolutely no influence outside of Speculoos' function validation.

Instrumenting Functions

Beware: instrument-style function validation is very much a work in progress. The current implementation is sensitive to invocation order and can choke on multiple calls.

Until this point in our discussion, Speculoos has only performed function validation when we explicitly called either validate-fn-with or validate-fn. With those two utilities, the specifications in the metadata are passive and produce no effect, even when invoking with arguments that would otherwise fail to satisfy the specification's predicates.

Speculoos' third pattern of function validation instruments the function using the metadata specifications. Every direct invocation of the function itself automatically validates arguments and returns using any specification in the metadata. Let's explore function instrumentation using sum-three from earlier. instrument will only validate with metadata specifications. First, we need to inject our specifications.

(inject-specs! sum-three
               {:speculoos/arg-scalar-spec [int? int? int?],
                :speculoos/ret-scalar-spec int?})
;; => nil

sum-three now holds two scalar specifications within its metadata, but those specifications are merely sitting there, completely passive.

;; valid args and return value

(sum-three 1 20 300) ;; => 321


;; invalid arg 300.0 and invalid return value 321.0

(sum-three 1 20 300.0) ;; => 321.0

That second invocation above supplied an invalid argument and produced an invalid return value, according to the metadata specifications. But we didn't explicitly validate with validate-fn, and sum-three is not yet instrumented, so sum-three returns the computed value 321.0 without interruption.

Let's instrument sum-three and see what happens.

(require '[speculoos.function-specs :refer [instrument unstrument]])

(instrument sum-three)

Not much. We've only added the specifications to the metadata and instrumented sum-three. An instrumented function is only validated when it is invoked.

(sum-three 1 20 300) ;; => 321

We just invoked sum-three, but all three integer arguments and the bare scalar return value satisfied all their predicates, so 321 passes through. Let's invoke with two integer arguments and one non-integer argument.

;; arg 300.0 does not satisfy its paired predicate in the argument scalar specification,
;; but `sum-three` is capable of computing a return with those given inputs

(sum-three 1 20 300.0) ;; => 321.0

That's interesting. In contrast to validate-fn-with and validate-fn, an instrumented function is not interrupted with an invalidation report when predicates are not satisfied. The invalidation report is instead written to *out*.

;; validation report is written to *out*

(with-out-str (sum-three 1 20 300.0))
;; => ({:path [2], :datum 300.0, :predicate int?, :valid? false, :fn-spec-type :speculoos/argument}
;;     {:path nil, :datum 321.0, :predicate int?, :valid? false, :fn-spec-type :speculoos/return})

Speculoos will implicitly validate any instrumented function with any permutation of specifications within its metadata.
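The report-to-*out* behavior can be pictured with a small plain-Clojure sketch. The helper wrap-with-arg-check and the name checked-sum are hypothetical and written only for illustration. This is not how instrument is implemented, and it checks only argument scalars, but it shows the same shape: the wrapped function always returns its computed value, while any invalidation is printed as a side effect.

```clojure
(defn wrap-with-arg-check
  "Hypothetical helper: returns a function that prints one report line for each
  argument failing its paired predicate, then invokes `f` as usual."
  [f predicates]
  (fn [& args]
    (doseq [[idx pred arg] (map vector (range) predicates args)
            :when (not (pred arg))]
      (println {:path [idx], :datum arg, :valid? false}))
    (apply f args)))

(def checked-sum (wrap-with-arg-check + [int? int? int?]))

;; The return value passes through, even with an invalid argument.
(checked-sum 1 20 300.0)
;; => 321.0

;; Valid arguments print nothing.
(with-out-str (checked-sum 1 20 300))
;; => ""
```

Capturing the output with with-out-str, as we did for sum-three, would reveal the printed report for the invalid invocation.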

When we want to revert sum-three back to normal, we unstrument it.

(unstrument sum-three)

Now that it's no longer instrumented, sum-three will yield values as normal, even if the arguments and return value do not satisfy the metadata specifications.

;; valid arguments and return value
(sum-three 1 20 300) ;; => 321

;; one invalid argument, invalid return value
(sum-three 1 20 300.0) ;; => 321.0

;; nothing written to *out*
(with-out-str (sum-three 1 20 300.0)) ;; => ""

Validating Higher-Order Functions

Speculoos has a story about validating higher-order functions, too. It uses very similar patterns to first-order function validation: Put some specifications in the function's metadata with the proper, qualified keys, then invoke the function with some sample arguments, then Speculoos will validate the results.

The classic hof is something like (defn adder [x] #(+ x %)). To make things a tad more interesting, we'll add a little flourish.

(require '[speculoos.function-specs :refer [validate-higher-order-fn]])

(defn addder [x] (fn [y] (fn [z] (+ x (+ y z)))))

(((addder 7) 80) 900) ;; => 987

addder returns a function upon each of its first two invocations, and only on its third invocation does addder return a scalar. Specifying and validating a function object does not convey much meaning: it would merely satisfy fn? which isn't very interesting. So to validate a hof, Speculoos requires it to be invoked until it produces a value. So we'll supply the validator with a series of argument sequences that, when fed in order to the hof, will produce a result. For the example above, it will look like [7] [80] [900].
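Feeding a series of argument sequences to a hof, in order, can be sketched with a plain reduce. The helper invoke-tiers is a hypothetical name used only for this illustration; it performs no validation at all and merely shows how [7] [80] [900] walk down the stack of returned functions.

```clojure
;; Same definition as above, repeated so the sketch is self-contained.
(defn addder [x] (fn [y] (fn [z] (+ x (+ y z)))))

(defn invoke-tiers
  "Hypothetical helper: applies `f` to the first argument sequence, applies the
  returned function to the next argument sequence, and so on."
  [f & arg-seqs]
  (reduce (fn [g args] (apply g args)) f arg-seqs))

(invoke-tiers addder [7] [80] [900])
;; => 987

;; Supplying only two argument sequences stops one tier early and yields a function.
(fn? (invoke-tiers addder [7] [80]))
;; => true
```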

The last task we must do is create the specification. hof specifications live in the function's metadata at key :speculoos/hof-specs, which is a series of nested specification maps, one nesting for each returned function. For this example, we might create this hof specification.

(def addder-spec
  {:speculoos/arg-scalar-spec [string?],
   :speculoos/hof-specs {:speculoos/arg-scalar-spec [boolean?],
                         :speculoos/hof-specs
                           {:speculoos/arg-scalar-spec [char?],
                            :speculoos/ret-scalar-spec keyword?}}})

Once again, for illustration purposes, we've crafted a specification composed of predicates that we know will invalidate, but will permit the function stack to evaluate to completion. (Otherwise, validation halts on exceptions.)

hof validation requires that the function's metadata hold the specifications. So we inject them.

(inject-specs! addder addder-spec) ;; => nil

And finally, we execute the validation with validate-higher-order-fn.

(require '[speculoos.function-specs :refer [validate-higher-order-fn]])

(validate-higher-order-fn addder [7] [80] [900])
;; => ({:datum 7,
;;      :fn-tier :speculoos/argument,
;;      :path [0 0],
;;      :predicate string?,
;;      :valid? false}
;;     {:datum 80,
;;      :fn-tier :speculoos/argument,
;;      :path [1 0],
;;      :predicate boolean?,
;;      :valid? false}
;;     {:datum 900,
;;      :fn-tier :speculoos/argument,
;;      :path [2 0],
;;      :predicate char?,
;;      :valid? false}
;;     {:datum 987,
;;      :evaled-result 987,
;;      :fn-spec-type :speculoos/return,
;;      :fn-tier :speculoos/return,
;;      :path nil,
;;      :predicate keyword?,
;;      :valid? false})

Let's step through the validation results. Speculoos validates 7 against scalar predicate string? and then invokes addder with argument 7. It then validates 80 against scalar predicate boolean? and then invokes the returned function with argument 80. It then validates 900 against scalar predicate char? and invokes the previously returned function with argument 900. Finally, Speculoos validates the ultimate return value 987 against scalar predicate keyword?. If all the predicates were satisfied, validate-higher-order-fn would yield the return value of the function call. In this case, all three arguments and the return value are invalid, and Speculoos yields a validation report.

Generating Random Samples and Exercising

Before we have some fun with random samples, we must create random sample generators and put them in particular spots. Random sample generators are closely related to predicates. A predicate is a thing that can answer "Is the value you put in my hand an even, positive integer between ninety and one-hundred?" A random sample generator is a thing that says "I'm putting in your hand an even, positive integer between ninety and one-hundred." When properly constructed, a generator will produce samples that satisfy its companion predicate.

Starting with a quick demonstration, Speculoos can generate valid data when given a scalar specification.

(require '[speculoos.utility :refer [data-from-spec]])

(data-from-spec [int? string? keyword?] :random) ;; => [-806 "ROFpe52m9eSfSnW8CfTtY" :i!]

When dealing with the basic clojure.core predicates, such as int?, string?, keyword?, etc., Speculoos provides pre-made random sample generators that satisfy those predicates. (There are a few exceptions, due to the fact that there is not a one-to-one-to-one correspondence between scalar data types, clojure.core predicates, and clojure.test.check generators.)

Speculoos can also generate random scalar samples from predicate-like things, such as regular expressions and sets.

;; built-in
;; predicate -------v        v--- regex        v--- set-as-a-predicate
(data-from-spec {:x int?, :y #"fo{3,6}bar", :z #{:red :green :blue}} :random)
;; => {:x 30, :y "fooooobar", :z :red}

When we use a 'basic' scalar predicate, such as int?, a regex, or a set-as-a-predicate, Speculoos knows how to generate a valid random sample that satisfies that predicate-like thing. Within the context of generating samples or exercising, basic predicate int? elicits an integer, regular expression #"fo{3,6}bar" generates a valid string, and set-as-a-predicate #{:red :green :blue} emits a sample randomly drawn from that set.

Creating Sample Generators

This document often uses 'basic' predicates like int? and string? because they're short to type and straightforward to understand. In real life, we'll want to specify our data with more precision. Instead of merely An integer, we'll often want to express a more sophisticated description, such as An even positive integer between ninety and one-hundred. To do that, we need to create custom generators.

clojure.test.check provides a group of powerful, flexible generators.

(require '[clojure.test.check.generators :as gen])

(gen/generate (gen/large-integer* {:min 700, :max 999})) ;; => 981

(gen/generate gen/keyword) ;; => :BYr

(gen/generate gen/string-alphanumeric) ;; => "G"

Speculoos leans heavily on these generators.
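As a taste of what a more precise generator might look like, here is a sketch for the "even, positive integer between ninety and one-hundred" description mentioned earlier. It relies only on clojure.test.check's gen/fmap and gen/large-integer*. The name gen-even-nineties is hypothetical, and doubling a smaller range is just one of several reasonable ways to write it.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Doubling an integer in 45..49 always lands on an even integer in 90..98,
;; so no candidate is ever discarded.
(def gen-even-nineties
  (gen/fmap #(* 2 %) (gen/large-integer* {:min 45, :max 49})))

;; Draw five samples; the exact values vary from run to run.
(gen/sample gen-even-nineties 5)
```

Constructing the value directly with fmap avoids the generate-and-discard retries that a filtering approach would incur for such a narrow range.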

Storing and Accessing Sample Generators

The custom generators we discussed in the previous subsection are merely floating around in the ether. To use them for exercising, we need to put those generators in a spot that Speculoos knows: the predicate's metadata.

Let's imagine a scenario. We want a predicate that specifies an integer between ninety (inclusive) and one-hundred (exclusive) and a corresponding random sample generator. First, we write the predicate, something like this.

(fn [n] (and (int? n) (<= 90 n 99)))

Second, we write our generator.

;; produce ten samples with `gen/sample`

(gen/sample (gen/large-integer* {:min 90, :max 99})) ;; => (90 90 91 90 95 97 93 96 96 91)


;; produce one sample with `gen/generate`

(gen/generate (gen/large-integer* {:min 90, :max 99})) ;; => 96

To make the generator invocable, we'll wrap it in a function.

(defn generate-nineties
  []
  (gen/generate (gen/large-integer* {:min 90, :max 99})))


;; invoke the generator

(generate-nineties) ;; => 98

Third, we need to associate that generator into the predicate's metadata. We have a couple of options. The manual option uses with-meta when we bind a name to the function body. We'll associate generate-nineties to the predicate's metadata key :speculoos/predicate->generator.

(def nineties?
  (with-meta (fn [n] (and (int? n) (<= 90 n 99)))
    {:speculoos/predicate->generator generate-nineties}))


(nineties? 92) ;; => true


(meta nineties?) ;; => #:speculoos{:predicate->generator generate-nineties}

That gets the job done, but the manual option is kinda cluttered. The other option involves a Speculoos utility, defpred, that defines a predicate much the same as defn, but associates the generator with less keyboarding than the with-meta option. Supply a symbol, a predicate function body, and a random sample generator.

(require '[speculoos.utility :refer [defpred]])


(defpred NINEties? (fn [n] (and (int? n) (<= 90 n 99))) generate-nineties)


(NINEties? 97) ;; => true


(meta NINEties?)
;; => #:speculoos{:canonical-sample :NINEties?-canonical-sample,
;;                :predicate->generator generate-nineties}

defpred automatically puts generate-nineties into predicate NINEties?'s metadata. Soon, we'll discuss another couple of benefits to using defpred. Whichever way we accomplished getting the generator into the metadata at :speculoos/predicate->generator, Speculoos can now find it.

Speculoos uses function metadata for two purposes, and it's important to keep clear in our minds which is which.

  • Store function specifications in the metadata for that function. For example, if we have a reverse function, we put the specification to test equal-lengths? in the metadata at :speculoos/argument-return-relationships.

  • Store random sample generators in the metadata for that predicate. If we have a nineties? predicate, we put the random sample generator generate-nineties in the metadata at :speculoos/predicate->generator.

Creating Sample Generators Automatically

defpred does indeed relieve us of some tedious keyboarding, but it offers another benefit. If we arrange the predicate definition according to defpred's expectations, it can automatically create a random sample generator for that predicate. Let's see it in action and then we'll examine the details.

(defpred auto-nineties? (fn [n] (and (int? n) (<= 90 n 99))))


(meta auto-nineties?)
;; => #:speculoos{:canonical-sample :auto-nineties?-canonical-sample,
;;                :predicate->generator #fn--88795}

Well, there's certainly something at :speculoos/predicate->generator, but is it anything useful?

(binding [speculoos.utility/*such-that-max-tries* 1000]
  (let [possible-gen-90 (:speculoos/predicate->generator (meta auto-nineties?))]
    (possible-gen-90)))
;; => 91

Yup! Since it is not-so-likely that a random integer generator would produce a value in the nineties, we bound the max-tries to a high count to give the generator lots of attempts. We then pulled out the generator from predicate auto-nineties?'s metadata and bound it to possible-gen-90. Then we invoked possible-gen-90 and, in fact, it generated an integer in the nineties that satisfies the original predicate we defined as auto-nineties?. defpred automatically created a random sample generator whose output satisfies the predicate.

For defpred to do its magic, the predicate definition must follow a few patterns.

  • We must provide the textual representation of the definition. We can't merely assign another already-defined function.
  • The first symbol must be and, or, or a basic predicate for a Clojure built-in scalar, such as int?, that is registered at speculoos.utility/predicate->generator.
    (and (...)) ;; okay
    (or (...)) ;; okay
    (int? ...) ;; okay
    (let ...) ;; not okay
  • The first clause after and and all immediate descendants of or must start with a basic predicate described above.

Subsequent clauses of and will be used to create test.check.generators/such-that modifiers. Direct descendants of a top-level or will produce n separate random sample generators, each with 1/n probability.

Speculoos exposes the internal tool defpred uses to create a generator, so we can inspect how it works. (I've lightly edited the output for clarity.)

(require '[speculoos.utility :refer [inspect-fn]])


(inspect-fn '(fn [i] (int? i)))
;; => gen/small-integer

We learn that inspect-fn examines the textual representation of the predicate definition, extracts int? and infers that the base generator ought to be gen/small-integer. Next, we'll add a couple of modifiers with and. To conform to the requirements, we'll put int? in the first clause. (Again, lightly edited.)

(inspect-fn '(fn [i] (and (int? i) (even? i) (pos? i))))
;; => (gen/such-that (fn [i] (and (even? i) (pos? i)))
;;                   gen/small-integer
;;                   {:max-tries speculoos.utility/*such-that-max-tries*})

int? elicits a small-integer generator. inspect-fn then uses the subsequent clauses of the and expression to create a such-that modifier that generates only positive, even numbers.

Let's see what happens with an or.

(inspect-fn '(fn [x] (or (int? x) (string? x))))
;; => (gen/one-of [gen/small-integer gen/string-alphanumeric])

Our predicate definition is satisfied with either an integer or a string. inspect-fn therefore creates a generator that will produce either an integer or a string with equal probability.
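To see what such inferred expressions do when evaluated, we can build comparable generators by hand with clojure.test.check. This is only a sketch: the names gen-pos-even and gen-int-or-string are hypothetical, and the max-tries value of 100 is an arbitrary choice for this illustration rather than anything Speculoos uses.

```clojure
(require '[clojure.test.check.generators :as gen])

;; A small-integer generator filtered by the remaining `and` clauses.
;; The third argument caps the retries at 100.
(def gen-pos-even
  (gen/such-that (fn [i] (and (even? i) (pos? i)))
                 gen/small-integer
                 100))

;; Either branch of the `or` is chosen with equal probability.
(def gen-int-or-string
  (gen/one-of [gen/small-integer gen/string-alphanumeric]))

;; Sample values vary from run to run.
(gen/sample gen-pos-even 5)
(gen/sample gen-int-or-string 5)
```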

When automatically creating random sample generators, defpred handles nesting up to two levels deep. Let's see how we might combine both or and and. We'll define a predicate that tests for either an odd integer, a string of at least three characters, or a ratio greater than one-ninth.

(defpred combined-pred
         #(or (and (int? %) (odd? %))
              (and (string? %) (<= 3 (count %)))
              (and (ratio? %) (< 1/9 %))))


(data-from-spec {:a combined-pred,
                 :b combined-pred,
                 :c combined-pred,
                 :d combined-pred,
                 :e combined-pred,
                 :f combined-pred,
                 :g combined-pred,
                 :h combined-pred,
                 :i combined-pred}
                :random)
;; => {:a 7/5,
;;     :b 19,
;;     :c "IyQXrGo7gJ4H5p",
;;     :d 27/2,
;;     :e -9,
;;     :f 5/2,
;;     :g -13,
;;     :h 3/22,
;;     :i 9}

We're kinda abusing data-from-spec here to generate nine samples. Inferring from combined-pred's predicate structure, defpred's automatically-created random sample generator emits one of three elements with equal probability: an odd integer, a string of at least three characters, or a ratio greater than one-ninth. All we had to do was write the predicate; defpred wrote all three random sample generators.

Testing Sample Generators Residing in Metadata

Some scenarios block us from using defpred's automatic generators. We may not have access to the textual representation of the predicate definition. Or, sometimes we must hand-write a generator because a naive generator would be unlikely to find a satisfying value (e.g., a random number that must fall within a narrow range).

The Write-generator-then-Apply-to-metadata-then-Test loop can be tedious, so the utility namespace provides a tool to help. validate-predicate->generator accepts a predicate function we supply, extracts the random sample generator residing in its metadata, generates a sample, and then feeds that sample back into the predicate to see if it satisfies.

(require '[speculoos.utility :refer [validate-predicate->generator]])


(defpred pred-with-incorrect-generator
         (fn [i] (int? i))
         #(gen/generate gen/ratio))


(validate-predicate->generator pred-with-incorrect-generator)
;; => ([-2 true]
;;     [-23/28 false]
;;     [8/9 false]
;;     [-1/7 false]
;;     [11/15 false]
;;     [-19/28 false]
;;     [-6/7 false])

We defined scalar predicate pred-with-incorrect-generator to require an integer, but, using defpred, we manually created a generator that emits ratio values. Each of the generated samples fails to satisfy the int? predicate.

With help from validate-predicate->generator, we can hop back and forth to adjust the hand-made generator.

(defpred pred-with-good-generator
         (fn [i] (int? i))
         #(gen/generate gen/small-integer))


(validate-predicate->generator pred-with-good-generator)
;; => ([26 true]
;;     [-15 true]
;;     [-21 true]
;;     [-27 true]
;;     [2 true]
;;     [-23 true]
;;     [10 true])

In this particular case, we could have relied on defpred to create a sample generator for us.
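The generate-then-check loop described above can be sketched in a few lines of plain Clojure. The helper check-generator is hypothetical and is not the library's implementation: it takes the generator as an explicit argument instead of looking it up in the predicate's metadata, which is an assumption made to keep the sketch self-contained.

```clojure
(defn check-generator
  "Hypothetical helper: invokes zero-argument `generator` n times and pairs
  each sample with the result of applying `pred` to it."
  [pred generator n]
  (vec (repeatedly n (fn []
                       (let [sample (generator)]
                         [sample (pred sample)])))))

;; A generator that agrees with its predicate: every pair ends in true.
(check-generator int? #(rand-int 100) 3)

;; A mismatched generator: `rand` yields doubles, so every pair ends in false.
(check-generator int? rand 3)
```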

Pretend somebody hands us a specification. It might be useful to know if we need to write a random sample generator for any of the predicates it contains, or if Speculoos can find a generator for all of them, either in the collection of known predicates-to-generators associations, or in the predicates' metadata. unfindable-generators tells us this information.

Let's compose a scalar specification containing int?, a set-as-a-predicate #{:red :green :blue}, and a regular expression #"fo{2,5}".

(require '[speculoos.utility :refer [unfindable-generators]])


(unfindable-generators [int? #{:red :green :blue} #"fo{2,5}"]) ;; => []

Speculoos knows how to create random samples from all three of those predicate-like things, so unfindable-generators returns an empty vector, nothing unfindable. Now, let's make a scalar specification with three predicates that intentionally lack generators.

;; silly 'predicates' that lack generators

(def a? (fn [] 'a))
(def b? (fn [] 'b))
(def c? (fn [] 'c))


(unfindable-generators [a? b? c?])
;; => [{:path [0], :value a?}
;;     {:path [1], :value b?}
;;     {:path [2], :value c?}]

unfindable-generators informs us that if we had tried to do a task that uses a sample generator, we'd have failed. With this knowledge, we could go back and add random sample generators to a?, b?, and c?.

Using Sample Generators

Speculoos can do three things with random sample generators.

  • Create a heterogeneous, arbitrarily-nested data structure when given a scalar specification.
  • Exercise a scalar specification.
  • Exercise a function with a scalar specification.

The first, creating a valid set of data from a given scalar specification, provides the foundation of the latter two exercising functions, so we'll begin with data-from-spec.

Imagine we'd like to specify the scalars contained within a vector to be an integer, followed by a ratio, followed by a double-precision floating-point number. We've seen how to compose that scalar specification. Let's give that scalar specification to data-from-spec.

(require '[speculoos.utility :refer [data-from-spec]])


(data-from-spec [int? ratio? double?] :random) ;; => [-194 1/25 40.02458953857422]

That scalar specification contains three predicates, and each of those predicates targets a basic Clojure numeric type, so Speculoos automatically refers to test.check's generators to produce a random sample.

Let's try another example. The scalar specification will be a map with three keys associated with predicates for a character, a set-as-a-predicate, and a regex-predicate.

(data-from-spec {:x char?,
                 :y #{:red :green :blue},
                 :z #"fo{3,5}bar"}
                :random)
;; => {:x \K,
;;     :y :green,
;;     :z "foooobar"}

Again, without any further assistance, data-from-spec knew how to find or create a random sample generator for each predicate in the scalar specification. char? targets a basic Clojure type, so it generated a random character. Sets in a scalar specification, in this context, are considered a membership predicate. The random sample generator is merely a random selection of one of the members of set #{:red :green :blue}. Finally, Speculoos regards a regular expression as a predicate for validating strings. data-from-spec consults the re-rand library to generate a random string from regular expression #"fo{3,5}bar".
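The notion of a set or a regular expression acting as a predicate can be tried out in plain Clojure, without Speculoos at all. The sketch below uses only clojure.core; the name colors is hypothetical, and rand-nth stands in for whatever selection mechanism the library actually uses.

```clojure
(def colors #{:red :green :blue})

;; Invoking a set as a function tests membership: a member is returned (truthy),
;; a non-member yields nil (falsey).
(colors :green)
;; => :green

(colors :purple)
;; => nil

;; A random member always satisfies the set-as-a-predicate.
(contains? colors (rand-nth (vec colors)))
;; => true

;; A regular expression accepts strings that match it in their entirety.
(boolean (re-matches #"fo{3,5}bar" "foooobar"))
;; => true

(boolean (re-matches #"fo{3,5}bar" "fobar"))
;; => false
```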

If our scalar specification contains custom predicates, we'll have to provide a little more information. We'll make another scalar specification containing a positive, even integer…

(defpred pos-even-int? (fn [i] (and (int? i) (pos? i) (even? i))))

…relying on defpred's predicate inspection machinery to infer a generator. After making our pos-even-int? predicate, we'll make a predicate satisfied by a three-character string, (fn [s] (and (string? s) (= 3 (count s)))). The generator which defpred would create for that predicate is kinda naive.

(inspect-fn '(fn [s] (and (string? s) (= 3 (count s)))))
;; => (gen/such-that (fn [s] (and (= 3 (count s)))) gen/string-alphanumeric)

;; (…output elided…)

That naive generator would produce random strings of random lengths until it found one exactly three characters long. It's possible it would fail to produce a valid value before hitting the max-tries limit. However, we can explicitly write a generator and attach it with defpred.

(defpred three-char-string?
         (fn [s] (and (string? s) (= 3 (count s))))
         #(clojure.string/join (gen/sample gen/char-alphanumeric 3)))
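As an alternative sketch, the same three-character string could be built as a composable clojure.test.check generator instead of a plain function. The name gen-three-char-string is hypothetical. Note that this yields a generator object, so, following the convention shown earlier, it would still need wrapping in a function that calls gen/generate before being handed to defpred.

```clojure
(require '[clojure.string :as str]
         '[clojure.test.check.generators :as gen])

;; Generate a vector of exactly three alphanumeric characters, then join them
;; into a string. Every sample has length three, so nothing is discarded.
(def gen-three-char-string
  (gen/fmap str/join (gen/vector gen/char-alphanumeric 3)))

;; One sample; the value varies from run to run.
(gen/generate gen-three-char-string)
```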

Now that we have two scalar predicates with custom sample generators — one created by defpred, one created by us — we'll bring them together into a single scalar specification and invoke data-from-spec.

(data-from-spec [pos-even-int? three-char-string?] :random) ;; => [26 "QT4"]

data-from-spec generates a valid data set whose randomly-generated scalars satisfy the scalar specification. In fact, we can feed the generated data back into the specification and it ought to validate true. We provide valid-scalars? with the generated data as the first argument (upper row) and the specification as the second argument (lower row).

(speculoos.core/valid-scalars? (data-from-spec [int? ratio? double?])
                               [int? ratio? double?])
;; => true

Perhaps it would be nice to do that multiple times, one immediately after another: generate some random data from a specification and feed it back into the specification to see if it validates. Don't go off and write your own utility. Speculoos can exercise a scalar specification.

(require '[speculoos.utility :refer [exercise]])


(exercise [int? ratio? double?])
;; => ([[381 22/31 197.24923706054688] true]
;;     [[554 -18/29 -91.66743993759155] true]
;;     [[282 2/3 0.023276634514331818] true]
;;     [[34 -3/14 6.911681652069092] true]
;;     [[328 -13/6 ##Inf] true]
;;     [[250 11/27 ##-Inf] true]
;;     [[629 -7/18 -2.3257813453674316] true]
;;     [[-705 16/13 51.83114242553711] true]
;;     [[514 16/25 -3.40625] true]
;;     [[526 27/29 0.234375] true])

Ten times, exercise generated a vector containing an integer, a ratio, and a double-precision number, then performed a scalar validation using those random samples as the data and the original scalar specification. In each of those ten runs, the trailing true shows that exercise generated valid data.
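The generate-then-validate loop is simple enough to sketch with clojure.core alone. The helper below is hypothetical (exercise-sketch is not a Speculoos function) and assumes a flat vector of predicates plus a matching vector of zero-argument generator functions that we supply by hand; Speculoos itself locates generators for us.

```clojure
(defn exercise-sketch
  "Hypothetical helper: n times, invokes each zero-argument generator in
   `generators` to build a sample vector, then checks every sample element
   against the predicate at the same index in `predicates`."
  [predicates generators n]
  (repeatedly n
              (fn []
                (let [sample (mapv #(%) generators)
                      results (map (fn [pred datum] (boolean (pred datum)))
                                   predicates
                                   sample)]
                  [sample (every? true? results)]))))

;; three runs of [random-integer random-boolean] paired with a validation result
(exercise-sketch [int? boolean?]
                 [#(rand-int 1000) #(rand-nth [true false])]
                 3)
```

Each entry of the returned sequence mirrors the shape of the output above: the generated sample, followed by whether it validated.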

So now we've seen that Speculoos can repeatedly generate random valid data from a scalar specification and run a validation of that random data. If we have injected an argument scalar specification into a function's metadata, Speculoos can repeatedly generate specification-satisfying arguments and repeatedly invoke that function. That activity would be considered exercising the function.

We revisit our friend, sum-three, a function which accepts three numbers and sums them. That scalar specification we've been using, [int? ratio? double?], mimics the shape of the argument sequence, so let's inject it into sum-three's metadata.

(defn sum-three [x y z] (+ x y z))


(inject-specs! sum-three {:speculoos/arg-scalar-spec [int? ratio? double?]}) ;; => nil

sum-three is certainly capable of summing any three numbers we feed it, but just for fun, we specify that the arguments ought to be an integer, a ratio, and a double-precision number. Now that we've defined our function and added an argument scalar specification, let's exercise sum-three.

(require '[speculoos.function-specs :refer [exercise-fn]])


(exercise-fn sum-three)
;; => ([[-454 23/5 2.7606201171875] -446.6393798828125]
;;     [[386 -7/9 0.07042534183710814] 385.2926475640593]
;;     [[323 17/9 0.17266845703125] 325.06155734592016]
;;     [[-997 -3/2 -128.0] -1126.5]
;;     [[-848 -7/22 7.080362558364868] -841.2378192598169]
;;     [[-942 3/8 -1.3112716674804688] -942.9362716674805]
;;     [[759 -12/7 0.0] 757.2857142857143]
;;     [[-95 1/3 -3.4675408005714417] -98.13420746723811]
;;     [[416 -22/21 49.521843910217285] 464.4742248625983]
;;     [[-275 16/27 3.5546875] -270.8527199074074])

int?, ratio?, and double? all have built-in generators, so we didn't have to create any custom generators. exercise-fn extracted sum-three's argument scalar specification, then, ten times, generated a data set from random sample generators, then invoked the function with those arguments.
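Exercising a function follows the same pattern, except the generated vector is applied to the function instead of being validated. A minimal clojure.core sketch, assuming we hand over the zero-argument generators ourselves (exercise-fn-sketch is a hypothetical name, not the Speculoos implementation):

```clojure
(defn exercise-fn-sketch
  "Hypothetical helper: n times, builds an argument vector from the
   zero-argument `generators`, applies `f` to it, and returns pairs of
   [arguments return-value]. No validation is performed."
  [f generators n]
  (repeatedly n
              (fn []
                (let [args (mapv #(%) generators)]
                  [args (apply f args)]))))

(defn sum-three-sketch [x y z] (+ x y z))

;; constant generators make the result predictable
(exercise-fn-sketch sum-three-sketch
                    [(constantly 1) (constantly 2) (constantly 3)]
                    2)
;; => ([[1 2 3] 6] [[1 2 3] 6])
```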

Canonical Samples

Sometimes it might be useful that a generated value be predictable. Perhaps we're writing documentation, or making a presentation, and we'd like the values to be aesthetically pleasing. Or, sometimes during development, it's nice to be able to quickly eyeball a known value.

Speculoos provides a canonical sample for many of Clojure's fundamental scalars when the relevant functions are invoked with the :canonical option. Here we use data-from-spec to illustrate the built-in canonical values of six of the basic scalars.

(data-from-spec {:x int?,
                 :y char?,
                 :z string?,
                 :w double?,
                 :q ratio?,
                 :v keyword?}
                :canonical)
;; => {:q 22/7,
;;     :v :kw,
;;     :w 1.0E32,
;;     :x 42,
;;     :y \c,
;;     :z "abc"}

The two exercising functions, exercise and exercise-fn, both accept the :canonical option as well.

(exercise [int? ratio? double?] :canonical) ;; => ([[42 22/7 1.0E32] true])


(exercise-fn sum-three :canonical) ;; => ([[42 22/7 1.0E32] 1.0E32])

Since the canonical values don't vary, it doesn't make much sense to exercise more than once.

Beyond the built-in canonical values, we can supply canonical values of our own choosing when we define a predicate. We can manually add the canonical values via with-meta or we can add a canonical value using defpred as an argument following a custom generator.

;; won't bother to write a proper generator; use `(constantly :ignored)` as a placeholder

(defpred neg-odd-int?
         (fn [i] (and (int? i) (neg? i) (odd? i)))
         (constantly :ignored)
         -33)


(defpred happy-string?
         (fn [s] (string? s))
         (constantly :ignored)
         "Hello Clojure!")


(defpred pretty-number?
         (fn [n] (number? n))
         (constantly :ignored)
         123.456)


(data-from-spec [neg-odd-int? happy-string? pretty-number?] :canonical)
;; => [-33 "Hello Clojure!" 123.456]

We see that data-from-spec found the custom canonical values for each of the three predicates: -33 for neg-odd-int?, "Hello Clojure!" for happy-string?, and 123.456 for pretty-number?. Notice that exercising a function does not validate the arguments or returns. Function argument and return validation only occurs when we explicitly invoke validate-fn-with, validate-fn, or we intentionally instrument it.
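The with-meta route works because Clojure functions can carry metadata. Here is a minimal sketch of the mechanism only. The key :example/canonical is a made-up placeholder; this section doesn't show which metadata key Speculoos actually consults, so check the API docs before relying on it.

```clojure
(def neg-odd-int-sketch?
  ;; :example/canonical is a hypothetical key, not necessarily Speculoos' own
  (with-meta (fn [i] (and (int? i) (neg? i) (odd? i)))
             {:example/canonical -33}))

(defn canonical-sample
  "Looks up the hypothetical canonical sample stored in a predicate's metadata."
  [pred]
  (:example/canonical (meta pred)))

(canonical-sample neg-odd-int-sketch?)                       ;; => -33

;; the predicate is still invocable, and its canonical sample satisfies it
(neg-odd-int-sketch? (canonical-sample neg-odd-int-sketch?)) ;; => true
```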

Utility Functions

You won't miss any crucial piece of Speculoos' functionality if you don't use this namespace, but perhaps something here might make your day a little nicer. Nearly every function takes advantage of speculoos.core/all-paths, which decomposes a heterogeneous, arbitrarily-nested data structure into a sequence of paths and datums. With that in hand, these not-clever functions churn through the entries and give us back something useful.

(require '[speculoos.utility :refer
           [scalars-without-predicates predicates-without-scalars
            collections-without-predicates predicates-without-collections
            sore-thumb spec-from-data data-from-spec
            basic-collection-spec-from-data]])

Recall that Speculoos only validates using elements in the data and predicates in the specification located at identical paths. This next duo of utilities tells us where we have unmatched scalars or unmatched predicates. The first of the duo tells us about un-paired scalars.

(scalars-without-predicates [42 ["abc" 22/7]]
                            [int?])
;; => #{{:path [1 0], :value "abc"}
;;      {:path [1 1], :value 22/7}}

With this information, we can see if the specification was ignoring scalars that we were expecting to validate, and adjust our specification for better coverage. (The thoroughly-… group of functions would strictly enforce all datums be paired with predicates.)

The second utility of that duo performs the complementary operation by telling us about un-paired predicates.

(predicates-without-scalars [42]
                            [int? string? ratio?])
;; => ({:path [1], :value string?}
;;     {:path [2], :value ratio?})

It is especially helpful for diagnosing surprising results. Just because we put a predicate into the scalar specification doesn't force validation of a scalar that doesn't exist.

(predicates-without-scalars [42 "abc"]
                            [int? [string? ratio?]])
;; => ({:path [1 0], :value string?}
;;     {:path [1 1], :value ratio?})

Now we can see two un-paired predicates. ratio? simply doesn't have a scalar to pair with, and string? doesn't share a path with "abc" so it wasn't used during validation.
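The idea behind both utilities is a set difference on paths. The sketch below is not the Speculoos implementation (which builds on all-paths); it's a reduced clojure.core version that handles only nested vectors and maps, with hypothetical names scalar-paths and unpaired-paths.

```clojure
(require '[clojure.set :as set])

(defn scalar-paths
  "Returns the set of paths to every scalar (non-collection) element within
   nested vectors and maps. Lists and sets are omitted to keep the sketch short."
  ([x] (scalar-paths x []))
  ([x path]
   (cond
     (map? x)    (into #{}
                       (mapcat (fn [[k v]] (scalar-paths v (conj path k))))
                       x)
     (vector? x) (into #{}
                       (mapcat (fn [[i v]] (scalar-paths v (conj path i))))
                       (map-indexed vector x))
     :else       #{path})))

(defn unpaired-paths
  "Paths to scalars in `a` that have no scalar at the same path in `b`."
  [a b]
  (set/difference (scalar-paths a) (scalar-paths b)))

;; scalars without predicates
(unpaired-paths [42 ["abc" 22/7]] [int?])           ;; => #{[1 0] [1 1]}

;; predicates without scalars (arguments swapped)
(unpaired-paths [int? [string? ratio?]] [42 "abc"]) ;; => #{[1 0] [1 1]}
```

In the second call, "abc" lives at path [1] while string? lives at path [1 0], which is why they never pair.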

It's not difficult to neglect a predicate for a nested element within a collection specification, so Speculoos offers analogous utilities to highlight those possible issues.

(collections-without-predicates [11 [22 {:a 33}]]
                                [vector? [{:is-a-map? map?}]])
;; => #{{:path [1], :value [22 {:a 33}]}}

Yup, we didn't specify that inner vector whose first element is 22. That's okay, though. Maybe we don't care to specify it. But at least, we're aware, now.

Maybe we put a predicate into a collection specification that clearly ought to be unsatisfied, but for some reason, validate-collections isn't picking it up.

(predicates-without-collections {:a 42}
                                {:is-map? map?, :b [set?]})
;; => #{{:path [:b 0], :value set?}}

Aha. set? in the collection specification isn't paired with an element in the data, so it is unused during validation.

Taking those ideas further, the thorough validation variants return true only if every scalar and every collection in data have a corresponding predicate in the scalar specification and the collection specification, respectively, and all those predicates are satisfied.

This next utility is probably only useful during development. Given data and a scalar specification, sore-thumb prints back both, but with only the invalid scalars and predicates showing.

(sore-thumb [42 {:a true, :b [22/7 :foo]} 1.23]
            [int? {:a boolean?, :b [ratio? string?]} int?])


;; to *out*

data: [_ {:a _, :b [_ :foo]} 1.23]
spec: [_ {:a _, :b [_ string?]} int?]

I've found it handy for quickly pin-pointing the unsatisfied scalar-predicate pairs in a large, deeply-nested data structure.

I think of the next few utilities as creative, making something that didn't previously exist. We'll start with a pair of functions which perform complementary actions.

(spec-from-data [33 {:a :baz, :b [1/3 false]} '(3.14 \z)])
;; => [int?
;;     {:a keyword?,
;;      :b [ratio? boolean?]}
;;     (double? char?)]


(data-from-spec {:x int?, :y [ratio? boolean?], :z (list char? neg-int?)}
                :random)
;; => {:x 547,
;;     :y [-17/24 true],
;;     :z (\4 -9)}

I hope their names give good indications of what they do. The generated specification contains only basic predicates, that is, merely Is it an integer?, not Is it an even integer greater than 25, divisible by 3?. But it's convenient raw material to start crafting a tighter specification. (Oh, yeah…they both round-trip.) A few paragraphs down we'll see some ways to create random sample generators for compound predicates.

Speaking of raw material, Speculoos also has a collection specification generator.

(basic-collection-spec-from-data [55 {:q 33, :r ['foo 'bar]} '(22 44 66)])
;; => [{:r [vector?], :speculoos.utility/collection-predicate map?} (list?) vector?]

Which produces a specification that is perhaps not immediately useful, but does provide a good starting template, because collection specifications can be tricky to get just right.

The utility namespace contains a trio of functions to assist writing, checking, and locating compound predicates that can be used by data-from-spec, validate-fn, and validate-fn-with to generate valid random sample data. A compound predicate such as #(and (int? %) (< % 100)) does not have a built-in generator provided by clojure.test.check.generators. However, data-from-spec and friends can extract a generator residing in the predicate's metadata. The defpred utility streamlines that task.

Predicates

A predicate function returns a truthy or falsey value.

(#(<= 5 %) 3) ;; => false


(#(= 3 (count %)) [1 2 3]) ;; => true

Non-boolean returns work, too. For example, sets make wonderful membership tests.

;; truthy
(#{:blue :green :orange :purple :red :yellow} :green) ;; => :green

;; falsey
(#{:blue :green :orange :purple :red :yellow} :swim) ;; => nil

Regular expressions come in handy for validating string contents.

;; truthy
(re-find #"^Four" "Four score and seven years ago...") ;; => "Four"

;; falsey
(re-find #"^Four" "When in the course of human events...") ;; => nil

Invoking a predicate when supplied with a datum — scalar or collection — is the core action of Speculoos' validation.

(int? 42) ;; => true


(validate-scalars [42]
                  [int?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

Speculoos is fairly ambivalent about the predicate return value. The validate… family of functions mindlessly churns through its sequence of predicate-datum pairs, evaluates them, and stuffs the results into :valid? keys. The valid…? family of functions rips through that sequence, and if none of the results are falsey, returns true, otherwise it returns false.
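The division of labor between the two families can be sketched for the simplest case, a flat vector. These helpers are hypothetical stand-ins (validate-flat and valid-flat? are not Speculoos functions) and ignore nesting, regexes, and everything else the real functions handle.

```clojure
(defn validate-flat
  "Hypothetical sketch for flat vectors only: pairs each datum with the
   predicate at the same index and records the raw predicate return value."
  [data spec]
  (mapv (fn [i datum pred]
          {:path [i], :datum datum, :predicate pred, :valid? (pred datum)})
        (range)
        data
        spec))

(defn valid-flat?
  "Returns true when no :valid? entry is falsey, false otherwise."
  [data spec]
  (every? :valid? (validate-flat data spec)))

;; raw returns are kept as-is: `true` from int?, `:red` from the set
(mapv :valid? (validate-flat [42 :red] [int? #{:red :green :blue}]))
;; => [true :red]

(valid-flat? [42 :red]   [int? #{:red :green :blue}]) ;; => true
(valid-flat? [42 :plaid] [int? #{:red :green :blue}]) ;; => false
```

Because only truthiness matters, a keyword like :red counts as a pass just as well as true does.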

For most of this document, we've been using the built-in predicates offered by clojure.core such as int? and vector? because they're short, understandable, and they render clearly. But in practice, it's not terribly useful to validate an element with a mere Is this scalar an integer? or Is this collection a vector? Often, we'll want to combine multiple predicates to make the validation more specific. We could certainly use clojure.core/and…

#(and (int? %) (pos? %) (even? %))

…and clojure.core/or

#(or (string? %) (char? %))

…which have the benefit of being universally understood. But Clojure also provides a pair of nice functions that streamline the expression and convey our intention. every-pred composes an arbitrary number of predicates with and semantics.

((every-pred number? pos? even?) 100) ;; => true

Similarly, some-fn composes predicates with or semantics.

((some-fn number? string? boolean?) \z) ;; => false
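Two details of these combinators are worth seeing in action when composing predicates: every-pred stops at the first failing predicate, so the order of predicates matters, and some-fn returns the first truthy value rather than a strict boolean. A small clojure.core demonstration:

```clojure
;; `pos?` would throw on a string, but `number?` fails first, so `pos?` never runs
((every-pred number? pos?) "abc") ;; => false

;; with `pos?` first, the same datum throws instead of returning false
(try
  ((every-pred pos? number?) "abc")
  (catch ClassCastException _ :threw)) ;; => :threw

;; `some-fn` hands back whatever truthy value the first satisfied predicate returned
((some-fn #{:red} string?) :red) ;; => :red
```

Putting the type check first is a cheap way to keep a compound predicate from throwing on unexpected datums.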

When Speculoos validates the scalars of a sequence, it consumes each element in turn. If we care only about validating some of the elements, we must include placeholders in the specification to maintain the sequence of predicates.

For example, suppose we only want to validate \z, the third element of [42 :foo \z]. The first two elements are irrelevant to us. We have a few options. We could write our own little always-true predicate. #(true) won't work because true is not invocable. #(identity true) takes zero arguments, so it throws an arity exception when handed a datum, and it loses the conciseness anyway. This works…

(fn [_] true)

…but Clojure already includes a couple of nice options.

(valid-scalars? [42 :foo \z]
                [(constantly true) (constantly true) char?])
;; => true

constantly is nice because it accepts any number of args. But for my money, nothing tops any?.

(valid-scalars? [42 :foo \z]
                [any? any? char?]) ;; => true

any? is four characters, doesn't require typing parentheses, and the everyday usage of any aligns well with its technical purpose.

A word of warning about clojure.core/contains?. It might seem natural to use contains? to check if a collection contains an item, but it doesn't do what its name suggests. Observe.

(contains? [97 98 99] 1) ;; => true

contains? actually tells us whether a collection contains a key. For a vector, it tests for an index. If we'd like to check whether a value is contained in a collection, we can use this pattern.

(defn in? [coll item] (some #(= item %) coll))


;; integer 98 is a value found in the vector

(in? [97 98 99] 98) ;; => true


;; integer 1 is not a value found in the vector

(in? [97 98 99] 1) ;; => false

(Check out speculoos.utility/in?.)

I've been using the #(…) form because it's compact, but it does have a drawback when Speculoos renders the function in a validation report.

[{:path [0],
  :datum 42,
  :predicate #function[documentation/eval94717/fn--94718],
  :valid? false}]

The function rendering is not terribly informative when the validation displays the predicate. Same problem with (fn [v] (…)).

One solution to this issue is to define our predicates with an informative name.

(def greater-than-50? #(< 50 %))


(validate-scalars [42]
                  [greater-than-50?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate greater-than-50?,
;;      :valid? false}]

Now, the predicate entry carries a bit more meaning.

Regular expressions check the content of strings.

(def re #"F\dQ\d")


(defn re-pred [s] (re-matches re s))


(validate-scalars ["F1Q5" "F2QQ"]
                  [re-pred re-pred])
;; => [{:datum "F1Q5", :path [0], :predicate re-pred, :valid? "F1Q5"}
;;     {:datum "F2QQ", :path [1], :predicate re-pred, :valid? nil}]

Speculoos considers regexes in a scalar specification as predicates, so we can simply jam them in there.

(valid-scalars? ["A1B2" "CDEF"]
                [#"(\w\d){2}" #"\w{4}"])
;; => true


(validate-scalars {:a "foo", :b "bar"}
                  {:a #"f.\w", :b #"^[abr]{0,3}$"})
;; => [{:datum "foo",
;;      :path [:a],
;;      :predicate #"f.\w",
;;      :valid? "foo"}
;;     {:datum "bar",
;;      :path [:b],
;;      :predicate #"^[abr]{0,3}$",
;;      :valid? "bar"}]

Using bare regexes in our scalar specification has a nice side benefit in that the data-from-spec, exercise, and exercise-fn utilities can inspect the regex and automatically generate valid strings.

Beyond the critical role they play in validating data, predicate functions can also carry metadata that describes how to generate valid, random samples. To help with that task, the utility namespace provides defpred, a helper macro that streamlines defing predicates and associating random sample generators.

Instead of storing specifications in a dedicated registry, Speculoos takes a laissez-faire approach: specifications may live directly in whatever namespace we please. If we feel that some sort of registry would be useful, we could make our own modeled after spec.alpha's.

Finally, when checking function correctness, validating the relationship between the function's arguments and the function's return value uses a function that kinda looks like a predicate. In contrast to a typical predicate that accepts one argument, that relationship-checking function accepts exactly two arguments: the function's argument sequence and the function's return value.
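To make the two-argument shape concrete, here is a small clojure.core sketch of a relationship function for clojure.core/reverse. The name same-length? is made up for illustration, and the sketch invokes it by hand; how Speculoos is told to use such a function is not covered in this section.

```clojure
(defn same-length?
  "Relationship function: the first parameter is the function's argument
   sequence (destructured here to pull out the single argument), the second
   is the function's return value."
  [[v] ret]
  (= (count v) (count ret)))

;; manually gather the argument sequence and the return value, then relate them
(let [args [[11 22 33]]
      ret  (apply reverse args)]
  (same-length? args ret))
;; => true
```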

Non-terminating Sequences

Speculoos absorbs lots of power from Clojure's infinite, lazy sequences. That power stems from the fact that Speculoos only validates complete pairs of datums and predicates. Datums without predicates are not validated, and predicates without datums are ignored. That policy provides optionality in our data. If a datum is present, it is validated against its corresponding predicate, but if that datum is non-existent, it is not required.

In the following examples, the first argument in the upper row is the data, the second argument in the lower row is the specification.

;; un-paired scalar predicates

(validate-scalars [42]
                  [int? keyword? char?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]


;; un-paired scalar datums

(validate-scalars [42 :foo \z]
                  [int?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

We remember Motto #3: Ignore un-paired predicates and un-paired datums. In the first example, only the single integer 42 is validated because it was paired with predicate int?; the remaining two predicates, keyword? and char?, are un-paired, and therefore ignored. In the second example, only int? participated in validation because it was the only predicate that pairs with a scalar. Scalars :foo and \z were not paired with a predicate, and were therefore ignored. The fact that the specification vector is shorter than the data implies that any trailing, un-paired data elements are un-specified. We can take advantage of this fact by intentionally making either the data or the specification run off the end.
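For flat sequences, clojure.core/map already behaves this way when given two collections: it stops as soon as either one runs out. The sketch below uses that to illustrate why running off the end is harmless, even when one side never terminates. pair-up is a hypothetical helper; Speculoos pairs by path rather than with map.

```clojure
(defn pair-up
  "Pairs datums with predicates by position. `map` stops as soon as either
   sequence is exhausted, so un-paired items never show up in the result."
  [data spec]
  (map vector data spec))

(count (pair-up [42] [int? keyword? char?]))    ;; => 1
(count (pair-up [42 :foo \z] [int?]))           ;; => 1

;; a non-terminating side is fine as long as the other side is finite
(map first (pair-up (repeat 3) [int?]))         ;; => (3)
(map first (pair-up (range) [int? int? int?]))  ;; => (0 1 2)
```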

First, if we'd like to validate a non-terminating sequence, specify as many datums as necessary to capture the pattern. repeat produces multiple instances of a single value, so we only need to specify one datum.

(validate-scalars (repeat 3)
                  [int?])
;; => [{:datum 3,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

Despite (repeat 3) producing a non-terminating sequence of integers, only the first integer was validated because the specification supplies only one predicate.

cycle can produce different values, so we ought to test for as many as appear in the definition.

(validate-scalars (cycle [42 :foo 22/7])
                  [int? keyword? ratio?])
;; => [{:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum :foo,
;;      :path [1],
;;      :predicate keyword?,
;;      :valid? true}
;;     {:datum 22/7,
;;      :path [2],
;;      :predicate ratio?,
;;      :valid? true}]

Three unique datums. Only three predicates needed.

On the other side of the coin, non-terminating sequences serve a critical role in composing Speculoos specifications. They express I don't know how many items there are in this sequence, but they all must satisfy these predicates.

(valid-scalars? [1] (repeat int?)) ;; => true
(valid-scalars? [1 2] (repeat int?)) ;; => true
(valid-scalars? [1 2 3] (repeat int?)) ;; => true
(valid-scalars? [1 2 3 4] (repeat int?)) ;; => true
(valid-scalars? [1 2 3 4 5] (repeat int?)) ;; => true

Basically, this idiom serves the role of a regular expression zero-or-more. Let's pretend we'd like to validate an integer, then a string, followed by any number of characters. We compose our specification like this.

;; use `concat` to append an infinite sequence of `char?`

(validate-scalars [99 "abc" \x \y \z]
                  (concat [int? string?] (repeat char?)))
;; => [{:datum 99,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datum "abc",
;;      :path [1],
;;      :predicate string?,
;;      :valid? true}
;;     {:datum \x,
;;      :path [2],
;;      :predicate char?,
;;      :valid? true}
;;     {:datum \y,
;;      :path [3],
;;      :predicate char?,
;;      :valid? true}
;;     {:datum \z,
;;      :path [4],
;;      :predicate char?,
;;      :valid? true}]


(require '[speculoos.core :refer [only-invalid]])


;; string "y" will not satisfy scalar predicate `char?`; use `only-invalid` to highlight invalid element

(only-invalid (validate-scalars [99 "abc" \x "y" \z]
                                (concat [int? string?] (repeat char?))))
;; => ({:datum "y",
;;      :path [3],
;;      :predicate char?,
;;      :valid? false})

Or perhaps we'd like to validate a function's argument list composed of a ratio followed by &-args consisting of any number of alternating keyword-string pairs.

;; zero &-args

(valid-scalars? [2/3]
                (concat [ratio?] (cycle [keyword? string?])))
;; => true


;; two pairs of keyword+string optional args

(valid-scalars? [2/3 :opt1 "abc" :opt2 "xyz"]
                (concat [ratio?] (cycle [keyword? string?])))
;; => true


;; one pair of optional args; 'foo does not satisfy `string?` scalar predicate

(only-invalid (validate-scalars [2/3 :opt1 'foo]
                                (concat [ratio?] (cycle [keyword? string?]))))
;; => ({:datum foo,
;;      :path [2],
;;      :predicate string?,
;;      :valid? false})

Using non-terminating sequences this way sorta replicates spec.alpha's sequence regexes. I think of it as Speculoos' super-power.

Also, Speculoos can handle nested, non-terminating sequences.

(valid-scalars? [[1] [2 "2"] [3 "3" :3]]
                (repeat (cycle [int? string? keyword?])))
;; => true

This specification is satisfied with a possibly infinite sequence of arbitrary-length vectors, each vector containing a pattern of an integer, then a string, followed by a keyword.

One detail that affects usage: A non-terminating sequence must not appear at the same path within both the data and specification. I am not aware of any method to inspect a sequence to determine if it is infinite, so Speculoos will refuse to validate a non-terminating data sequence at the same path as a non-terminating predicate sequence, and vice versa. However, feel free to use them in either data or in the specification, as long as they live at different paths.

;; data's infinite sequence at :a, specification's infinite sequence at :b

(valid-scalars? {:a (repeat 42), :b [22/7 true]}
                {:a [int?], :b (cycle [ratio? boolean?])})
;; => true


;; demo of some invalid scalars

(only-invalid (validate-scalars {:a (repeat 42), :b [22/7 true]}
                                {:a [int? int? string?], :b (repeat ratio?)}))
;; => ({:datum 42,
;;      :path [:a 2],
;;      :predicate string?,
;;      :valid? false}
;;     {:datum true,
;;      :path [:b 1],
;;      :predicate ratio?,
;;      :valid? false})

In both cases above, the data contains a non-terminating sequence at key :a, while the specification contains a non-terminating sequence at key :b. Since in both cases, the two infinite sequences do not share a path, validation can proceed to completion.

So what's going on? Internally, Speculoos finds all the potentially non-terminating sequences in both the data and the specification. For each of those hits, Speculoos looks into the other nested structure to determine how long the counterpart sequence is. Speculoos then clamps the non-terminating sequence to that length. Validation proceeds with the clamped sequences. Let's see the clamping in action.

(require '[speculoos.core :refer [expand-and-clamp-1]])


(expand-and-clamp-1 (range) [int? int? int?]) ;; => [0 1 2]

range would have continued merrily on forever, but the clamp truncated it at three elements, the length of the second argument vector. That's why two non-terminating sequences at the same path are not permitted. Speculoos has no way of knowing how short or long the sequences ought to be, so instead of making a bad guess, it throws the issue back to us. The way we indicate how long it should be is by making the counterpart sequence a specific length. Where should Speculoos clamp that (range) in the above example? The answer is the length of the other sequential thing, [int? int? int?], or three elements.
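The clamping step itself is easy to picture with clojure.core's take. This is a sketch of the idea only, with a hypothetical helper name, and is not the body of expand-and-clamp-1.

```clojure
(defn clamp-to
  "Hypothetical helper: realizes only as many elements of the possibly
   infinite `xs` as there are elements in the finite `counterpart`."
  [xs counterpart]
  (vec (take (count counterpart) xs)))

(clamp-to (range) [int? int? int?])     ;; => [0 1 2]
(clamp-to (cycle [:a :b]) [1 2 3 4 5])  ;; => [:a :b :a :b :a]
```

Notice that the sketch must count the counterpart. If the counterpart were also non-terminating, that count would never return, which is the same dead end that leads Speculoos to refuse two non-terminating sequences at one path.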

Speculoos' utility namespace provides a clamp-in* tool for us to clamp any sequence within a heterogeneous, arbitrarily-nested data structure. We invoke it with a pattern of arguments similar to clojure.core/assoc-in.

(require '[speculoos.utility :refer [clamp-in*]])


(clamp-in* {:a 42, :b ['foo 22/7 {:c (cycle [3 2 1])}]}
           [:b 2 :c]
           5)
;; => {:a 42, :b [foo 22/7 {:c [3 2 1 3 2]}]}

clamp-in* used the path [:b 2 :c] to locate the non-terminating cycle sequence, clamped it to 5 elements, and returned the new data structure with that terminating sequence, converted to a vector. This way, if Speculoos squawks at us for having two non-terminating sequences at the same path, we have a way to clamp the data, specification, or both at any path, and validation can proceed.
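For comparison, when a path crosses only maps and vectors, clojure.core/update-in can do a similar job. The helper below is a hypothetical analog for that restricted case, not how clamp-in* is implemented; update-in can't navigate lists or sets the way the path-based tools described here can.

```clojure
(defn clamp-at
  "Hypothetical analog of clamp-in* built on `update-in`: replaces the
   sequence at `path` with a vector of its first n elements. Only works
   when every step of `path` goes through a map or a vector."
  [m path n]
  (update-in m path #(vec (take n %))))

(clamp-at {:a 42, :b ['foo 22/7 {:c (cycle [3 2 1])}]}
          [:b 2 :c]
          5)
;; => {:a 42, :b [foo 22/7 {:c [3 2 1 3 2]}]}
```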

Be sure to set your development environment's printing length

(set! *print-length* 99) ;; => 99

or you may jam up your session.
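To see what the print length does without changing the session-wide setting, we can bind it temporarily. This uses only clojure.core.

```clojure
;; with a print length in effect, printing an infinite sequence stops early
(binding [*print-length* 5]
  (pr-str (range)))
;; => "(0 1 2 3 4 ...)"
```

Without that limit, printing (range) would try to realize the entire infinite sequence.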

Sets

Sets are…a handful. They perform certain tasks with elegance that ought not be dismissed, but using sets presents some unique challenges compared to the other Clojure collections. The elements in a set are addressed by their identities. What does that even mean? Let's compare to Clojure's other collections to get some context.

The elements of a sequence are addressed by monotonically increasing integer indexes. Give a vector index 2 and it'll give us back the third element, if it exists.

([11 22 33] 2) ;; => 33

The elements of a map are addressed by its keys. Give a map a key :howdy and it'll give us back the value at that key, if it exists.

({:hey "salut", :howdy "bonjour"} :howdy) ;; => "bonjour"

Give a set some value, and it will give us back that value…

(#{:index :middle :pinky :ring :thumb} :thumb) ;; => :thumb

…but only if that element exists in the set.

(#{:index :middle :pinky :ring :thumb} :bird) ;; => nil

So the paths to elements of vectors, lists, and maps are composed of indexes or keys. The paths to members of a set are the things themselves. Let's take a look at a couple of examples.

We use all-paths to enumerate the paths of elements contained in a Clojure data collection.

(all-paths #{:foo 42 "abc"})
;; => [{:path [], :value #{42 :foo "abc"}}
;;     {:path ["abc"], :value "abc"}
;;     {:path [:foo], :value :foo}
;;     {:path [42], :value 42}]

In this first example, the root element, a set, has a path []. The remaining three elements, direct descendants of the root set have paths that consist of themselves. We find 42 at path [42] and so on.

The second example applies the principle further. This set contains one integer and one set-nested-in-a-vector-nested-in-a-map.

(all-paths #{11 {:a [22 #{33}]}})
;; => [{:path [], :value #{11 {:a [22 #{33}]}}}
;;     {:path [{:a [22 #{33}]}], :value {:a [22 #{33}]}}
;;     {:path [{:a [22 #{33}]} :a], :value [22 #{33}]}
;;     {:path [{:a [22 #{33}]} :a 0], :value 22}
;;     {:path [{:a [22 #{33}]} :a 1], :value #{33}}
;;     {:path [{:a [22 #{33}]} :a 1 33], :value 33}
;;     {:path [11], :value 11}]

As an exercise, we'll walk through how we'd navigate to that 33. Let's borrow a function from the fn-in project to zoom in on what's going on. The first argument (upper row) is our example set. The second argument (lower row) is a path. We'll build up the path to 33 piece by piece.

To start, we'll get the root.

(require '[fn-in.core :refer [get-in*]])


(get-in* #{11 {:a [22 #{33}]}}   [])
;; => #{{:a [22 #{33}]} 11}

As with any collection type, the root element has a path []. Supplying get-in* with a path [] retrieves the entire collection.

There are two direct descendants of the root set: scalar 11 and a map. We've already seen that the integer's path is the value of the integer.

(get-in* #{11 {:a [22 #{33}]}} [11]) ;; => 11

The path to the map is the map itself, which appears as the first element of its path. Combining the root path [] with the value of the map {:a [22 #{33}]} results in this path to the map.

[{:a [22 #{33}]}]

Since we're often dealing with maps and sequentials, indexed by keywords and integers, that path may look unusual. But Speculoos handles goofy paths without skipping a beat.

(get-in* #{11 {:a [22 #{33}]}}
         [{:a [22 #{33}]}]) ;; => {:a [22 #{33}]}

When supplied with that path, get-in* extracts the map contained in the set.

The map has one MapEntry, key :a, with an associated value, a two-element vector [22 #{33}]. A map value is addressed by its key, so the vector's path contains that key. Its path is that of its parent, with its :a key appended.

(get-in* #{11 {:a [22 #{33}]}}
         [{:a [22 #{33}]} :a]) ;; => [22 #{33}]

Paths into a vector are old hat by now. Our 33 is contained in a set, located at the second position, index 1 in zero-based land, which we append to the accumulating path.

(get-in* #{11 {:a [22 #{33}]}}
         [{:a [22 #{33}]} :a 1]) ;; => #{33}

We've now arrived at the little nested set which holds our 33. Items in a set are addressed by their identity, and the identity of 33 is 33. So we append that to the path so far.

(get-in* #{11 {:a [22 #{33}]}}
         [{:a [22 #{33}]} :a 1 33]) ;; => 33

And now we've finally fished out our 33. Following this algorithm, we can get, change, and delete any element of any heterogeneous, arbitrarily-nested data structure, and that includes sets at any level of nesting. We could even make a path to a set, nested within a set, nested within a set.
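The step-by-step walk above can be mimicked with clojure.core, since get understands vectors (by index), maps (by key), and sets (by member). The following is a rough sketch, not fn-in's implementation, and get-by-path is a made-up name. One known limitation: get does not index into lists, so this sketch only covers vectors, maps, and sets.

```clojure
(defn get-by-path
  "Walks `path` one step at a time, using `get` at each level. Returns nil
   if any step is absent."
  [coll path]
  (reduce get coll path))

;; the full path to 33: map-as-set-member, key :a, index 1, then 33 itself
(get-by-path #{11 {:a [22 #{33}]}}
             [{:a [22 #{33}]} :a 1 33])
;; => 33

;; an empty path returns the root collection unchanged
(= #{11 {:a [22 #{33}]}}
   (get-by-path #{11 {:a [22 #{33}]}} []))
;; => true

;; a path element that isn't a member of the set yields nil
(get-by-path #{11 {:a [22 #{33}]}} [:absent])
;; => nil
```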

When using Speculoos, we encounter sets in three scenarios. We'll briefly sketch the three scenarios, then later go into the details.

  1. Scalar validation, scalar in data, set in specification.

    In this scenario, we're validating scalars, so we're using a function with scalar in its name.

    (validate-scalars [42 :red]
                      [int? #{:red :green :blue}])

    In the example above, we're testing a property of a scalar, keyword :red, the second element of the data (first argument, upper row). The set #{:red :green :blue} in the specification (lower row) is a predicate-like thing that tests membership.

  2. Scalar validation, set in data, set in specification.

    In this scenario, we're validating scalars, so we're using a scalar validation function, again validate-scalars.

    (validate-scalars [42 #{:chocolate :vanilla :strawberry}]
                      [int? #{keyword?}])

    This time, we're validating scalars contained within a set in the data (upper row), with scalar predicates contained within a set in the specification (lower row).

  3. Collection validation, set in data, set in specification.

    In this scenario, we're validating a property of a collection, so we're using validate-collections.

    (validate-collections [42 #{:puppy :kitten :goldfish}]
                          [vector? #{set?}])

    Collection predicates — targeting the nested set in the data (upper row) — are themselves contained in a set nested in the collection specification (lower row).

1. Set as Scalar Predicate

Let's remember back to the beginning of this section where we saw that Clojure sets can serve as membership tests. Speculoos can therefore use sets as a nice shorthand for a membership predicate.

(def color? #{:red :green :blue})

(ifn? color?) ;; => true


(color? :red) ;; => :red

(color? :plaid) ;; => nil

color? implements IFn and thus behaves like a predicate when invoked as a function. :red satisfies our color? predicate and returns a truthy value, :red, whereas :plaid does not and returns a falsey value, nil.
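
One wrinkle worth keeping in mind, sketched here in plain clojure.core: invoking a set returns the member itself, so a set whose members include false or nil gives a falsey answer even when the member is present. How Speculoos treats such members is not covered by this sketch.

```clojure
(def color? #{:red :green :blue})

;; A set works anywhere a one-argument predicate is expected.
(filter color? [:red :plaid :blue :tartan])   ;; => (:red :blue)
(every? color? [:red :green])                 ;; => true

;; Invoking a set returns the member itself, so a member that is
;; `false` or `nil` yields a falsey result even though it is present.
(#{false} false)            ;; => false
(contains? #{false} false)  ;; => true
```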

During scalar validation, when a scalar in our data shares a path with a set in the specification, Speculoos enters set-as-a-predicate mode. I say 'mode' only in the casual sense. The implementation uses neither modes nor state. The algorithm merely branches to treat the set differently depending on the scenario.

We'll make our specification mimic the shape of our data (Motto #2), but instead of two predicate functions pairing with two scalars (Motto #3), we'll insert one scalar predicate function, followed by a set, which behaves like a membership predicate.

;; data

(all-paths [42 :red])
;; => [{:path [], :value [42 :red]}
;;     {:path [0], :value 42}
;;     {:path [1], :value :red}]


;; scalar specification

(all-paths [int? #{:red :green :blue}])
;; => [{:path [], :value [int? #{:blue :green :red}]}
;;     {:path [0], :value int?}
;;     {:path [1], :value #{:blue :green :red}}
;;     {:path [1 :green], :value :green}
;;     {:path [1 :red], :value :red}
;;     {:path [1 :blue], :value :blue}]

Our example data contains two scalar datums: 42 in the first spot and :red in the second. Each of those datums shares a path with a predicate-like thing in the scalar specification. The 42 is paired with the int? scalar predicate because they both share the path [0]. Both :red and #{:red :green :blue} share the path [1], so Speculoos treats the set as a scalar predicate.

Let's run that validation now. The data vector is the first argument in the upper row, the specification vector is the second argument in the lower row.

(validate-scalars [42 :red]
                  [int? #{:red :green :blue}])
;; => [{:datum :red,
;;      :path [1],
;;      :predicate #{:blue :green :red},
;;      :valid? :red}
;;     {:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}]

When Speculoos validates scalars, it treats the set in the specification as a predicate because the corresponding element in the data is a scalar, not a set. In this example, :red is a member of the #{:red :green :blue} set-predicate.

The same principles hold when validating elements of a map containing a set-predicate. When a map in the specification contains a set that shares a path with a scalar in the data, that set is treated as a membership predicate.

(validate-scalars {:x 42, :y :red}
                  {:x int?, :y #{:red :green :blue}})
;; => [{:datum :red,
;;      :path [:y],
;;      :predicate #{:blue :green :red},
;;      :valid? :red}
;;     {:datum 42,
;;      :path [:x],
;;      :predicate int?,
;;      :valid? true}]

Scalar 42 pairs with predicate int? at path [:x] and scalar :red pairs with set-predicate #{:red :green :blue} at path [:y]. Speculoos validates scalars in the data that share paths with predicates in the specification. Since #{:red :green :blue} is considered a predicate, scalar :red is validated.
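
To make the pairing idea concrete, here is a minimal sketch in plain clojure.core. The helper check-flat-map is hypothetical and is not how Speculoos is implemented. It only handles flat maps, pairing each predicate with the datum at the same key, and it treats a set exactly like any other predicate by invoking it.

```clojure
;; Hypothetical helper, not part of Speculoos: pairs each predicate in a
;; flat specification map with the datum at the same key, if one exists.
(defn check-flat-map
  [data spec]
  (for [[k pred] spec
        :when (contains? data k)]
    {:path [k], :datum (get data k), :valid? (pred (get data k))}))

(set (check-flat-map {:x 42, :y :red}
                     {:x int?, :y #{:red :green :blue}}))
;; => #{{:path [:x], :datum 42, :valid? true}
;;      {:path [:y], :datum :red, :valid? :red}}
```

Because the set is invoked like a function, the :valid? entry for :red is the truthy member :red itself rather than true.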

2. Validate Scalars within Set

Sometimes the scalars in our data are contained in a set. Speculoos can validate scalars within a set during a scalar validation operation. Validating a set's scalar members follows all the same principles as validating a vector's scalar members, except for one wrinkle: Since elements of a set have no inherent location, i.e., sets are unordered, sets in our data are validated against all predicates contained in the corresponding set at the same path in the specification. An example shows this better than words.

Let's enumerate the paths of some example data, with some scalars contained in a nested set, and then enumerate the paths of a specification, shaped like that data, with one predicate contained in a nested set.

;; data, some scalars are contained within a set

(all-paths [42 #{:chocolate :vanilla :strawberry}])
;; => [{:path [], :value [42 #{:chocolate :strawberry :vanilla}]}
;;     {:path [0], :value 42}
;;     {:path [1], :value #{:chocolate :strawberry :vanilla}}
;;     {:path [1 :strawberry], :value :strawberry}
;;     {:path [1 :chocolate], :value :chocolate}
;;     {:path [1 :vanilla], :value :vanilla}]


;; scalar specification, one predicate contained within a set

(all-paths [int? #{keyword?}])
;; => [{:path [], :value [int? #{keyword?}]}
;;     {:path [0], :value int?}
;;     {:path [1], :value #{keyword?}}
;;     {:path [1 keyword?], :value keyword?}]

Let's apply the Mottos. We intend to validate scalars, so we'll use validate-scalars, which only applies predicates to scalars. Next, we'll make our specification mimic the shape of the data. In this example, both the data and the specification are a vector, with something in the first spot, and a set in the second spot. Finally, we'll make sure that all predicates are paired with a scalar.

Now we validate the scalars.

(validate-scalars [42 #{:glass :rubber :paper}]
                  [int? #{keyword?}])
;; => ({:datum 42,
;;      :path [0],
;;      :predicate int?,
;;      :valid? true}
;;     {:datums-set #{:glass :paper :rubber},
;;      :path [1],
;;      :predicate keyword?,
;;      :valid? true})

First, notice how the scalar specification (lower row) looks a lot like the data (upper row). Because the shapes are similar, validate-scalars is able to systematically apply predicates from the specification to scalars in the data. Speculoos validates 42 against predicate int? because they share a path, [0], in their respective vectors. Path [1] navigates to a set in both our data vector and our specification vector, so Speculoos enters validate-scalars-within-a-set-mode.

Every predicate contained in the specification set is applied to every datum in the data's set. In this example, keyword? is individually applied to :glass, :rubber, and :paper, and since each satisfies the predicate, the validation returns true.

One of the defining features of Clojure sets is that they're amorphous bags of items, without any inherent ordering. Within the context of a set, it doesn't make sense to target one scalar predicate towards one particular scalar datum. Therefore, Speculoos validates scalars contained within a set more broadly. If our specification set contains more than one predicate, each of the predicates is applied to all the scalars in the data's set.

In the next example, the specification set contains two predicates, keyword? and qualified-keyword?.

(validate-scalars #{:chocolate}
                  #{keyword? qualified-keyword?})
;; => ({:datums-set #{:chocolate},
;;      :path [],
;;      :predicate qualified-keyword?,
;;      :valid? false}
;;     {:datums-set #{:chocolate},
;;      :path [],
;;      :predicate keyword?,
;;      :valid? true})

Two scalar predicates in the specification applied to the one scalar datum. :chocolate is a keyword, but not a qualified keyword.

Next, we'll see how to validate multiple scalars with multiple scalar predicates. The set in the data contains three scalars. The set in the specification contains two predicates, same as the previous example.

(validate-scalars #{:chocolate :vanilla :strawberry}
                  #{keyword? qualified-keyword?})
;; => ({:datums-set #{:chocolate :strawberry :vanilla},
;;      :path [],
;;      :predicate qualified-keyword?,
;;      :valid? false}
;;     {:datums-set #{:chocolate :strawberry :vanilla},
;;      :path [],
;;      :predicate keyword?,
;;      :valid? true})

Validation applies keyword? and qualified-keyword?, in turn, to every scalar member of the data set. Speculoos tells us that all the scalars in the data are indeed keywords, but at least one of the data's scalars is not a qualified keyword. Notice how Speculoos condenses the validation results. Instead of a validation entry for each individual scalar in the data set, Speculoos combines all the results for all the scalars, associated to key :datums-set. Two scalar predicates, two validation results.
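
The 'every predicate applied to every member' idea can be sketched in plain clojure.core. The helper check-set below is hypothetical and is not Speculoos's implementation. It returns one result per predicate, and that result is true only if every member of the data set satisfies the predicate. The predicates are passed as a vector here purely so the result order is predictable.

```clojure
;; Hypothetical helper, not Speculoos's implementation: one result per
;; predicate, true only if every member of the data set satisfies it.
(defn check-set
  [datums predicates]
  (for [pred predicates]
    {:datums-set datums, :predicate pred, :valid? (every? pred datums)}))

(def flavors #{:chocolate :vanilla :strawberry})

(map :valid? (check-set flavors [keyword? qualified-keyword?]))
;; => (true false)
```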

Again, the same principles apply for validating sets contained in a map.

(validate-scalars {:x 42, :y #{"a" "b" "c"}}
                  {:x int?, :y #{string?}})
;; => ({:datum 42,
;;      :path [:x],
;;      :predicate int?,
;;      :valid? true}
;;     {:datums-set #{"a" "b" "c"},
;;      :path [:y],
;;      :predicate string?,
;;      :valid? true})

int? at :x applies to 42 also at :x. Then, string? at :y is applied to scalars "a", "b", and "c" at :y.

Speculoos performs the two modes in separate passes, so we may even use both set-as-a-predicate-mode and validate-scalars-within-a-set-mode during the same validation, as long as the predicates stay on their own side of the fence.

(validate-scalars [42 #{:foo :bar :baz}]
                  [#{40 41 42} #{keyword?}])
;; => ({:datum 42,
;;      :path [0],
;;      :predicate #{40 41 42},
;;      :valid? 42}
;;     {:datums-set #{:bar :baz :foo},
;;      :path [1],
;;      :predicate keyword?,
;;      :valid? true})

In this example, the predicate #{40 41 42} at index 0 of the specification is a set while the datum at the same index of the data is 42, a scalar. Speculoos uses the set-as-a-predicate mode. Since 42 is a member of #{40 41 42}, that datum validates as truthy. Because the data at index 1 is itself a set, Speculoos performs set-scalar-validation. The keyword? predicate is applied to each element of #{:foo :bar :baz} at index 1 and they all validate true.
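
The branching between the two 'modes' can be sketched in a few lines of plain clojure.core. The function check-pair is hypothetical and is not Speculoos's implementation. It only shows how the decision could hinge on whether the datum is a scalar or a set.

```clojure
;; Hypothetical dispatch, not Speculoos's implementation.
(defn check-pair
  [datum spec-item]
  (cond
    ;; set in data, set in specification: every predicate, every member
    (and (set? datum) (set? spec-item))
    (every? (fn [pred] (every? pred datum)) spec-item)

    ;; scalar in data: the specification item (function or set) is the predicate
    (not (coll? datum))
    (spec-item datum)

    ;; anything else is left unpaired in this sketch
    :else nil))

(map check-pair
     [42 #{:foo :bar :baz}]
     [#{40 41 42} #{keyword?}])
;; => (42 true)
```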

3. Validate Set as a Collection

Let's discuss how collection validation works when a set is involved. During a collection validation operation, Speculoos will ignore all scalars in the data. It will only apply predicates to collections. The rules are identical to how the other collections are validated: predicates from the specification are applied to the corresponding parent container in the data. But let's not get bogged down in a textual description; let's look at some examples.

First, we'll start with some data that consists of a vector containing an integer, followed by a three element set. Let's generate all the paths.

(all-paths [42 #{:puppy :kitten :goldfish}])
;; => [{:path [], :value [42 #{:goldfish :kitten :puppy}]}
;;     {:path [0], :value 42}
;;     {:path [1], :value #{:goldfish :kitten :puppy}}
;;     {:path [1 :puppy], :value :puppy}
;;     {:path [1 :goldfish], :value :goldfish}
;;     {:path [1 :kitten], :value :kitten}]

Motto #1: Collection validation ignores scalars, so out of all those elements, validation will only consider the root at path [] and the nested set at path [1].

A good strategy for creating a collection specification is to copy-paste the data and delete all the scalars…

[        #{    }]

…and insert some collection predicates near the opening bracket.

[vector? #{set?}]

Let's generate the paths for that collection specification.

(all-paths [vector? #{set?}])
;; => [{:path [], :value [vector? #{set?}]}
;;     {:path [0], :value vector?}
;;     {:path [1], :value #{set?}}
;;     {:path [1 set?], :value set?}]

Notice the paths to the two predicates. Predicate vector? is located at path [0], while predicate set? is located at path [1 set?]. When validating collections, Speculoos only considers predicates within a specification.

Now, let's run a collection validation.

(validate-collections [42 #{:puppy :kitten :goldfish}]
                      [vector? #{set?}])
;; => ({:datum [42 #{:goldfish :kitten :puppy}],
;;      :ordinal-path-datum [],
;;      :path-datum [],
;;      :path-predicate [0],
;;      :predicate vector?,
;;      :valid? true}
;;     {:datum #{:goldfish :kitten :puppy},
;;      :ordinal-path-datum [0],
;;      :path-datum [1],
;;      :path-predicate [1 set?],
;;      :predicate set?,
;;      :valid? true})

validate-collections was able to pair two collections in the data with two predicates in the specification, and we received two validation results. Collection predicate vector? at path [0] in the specification was applied to whatever is at path (drop-last [0]) in the data, which happens to be the root collection. Collection predicate set? at path [1 set?] in the specification was applied to path (drop-last [1 set?]) in the data, which happens to be our nested set containing pet keywords. Both predicates were satisfied.

Remember: Scalar predicates apply to the scalar at their exact location. Collection predicates apply to the collection directly above their head.
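
The drop-last step can be tried directly with clojure.core. This sketch only reproduces the path arithmetic described above, using the stock get-in, and is not a description of Speculoos's internals.

```clojure
(def data [42 #{:puppy :kitten :goldfish}])

;; Path to `vector?` in the specification is [0]; dropping the last
;; step leaves an empty path, which addresses the root collection.
(drop-last [0])                          ;; => ()
(vector? (get-in data (drop-last [0])))  ;; => true

;; Path to `set?` in the specification is [1 set?]; dropping the last
;; step leaves (1), which addresses the nested set.
(set? (get-in data (drop-last [1 set?])))  ;; => true
```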

Troubleshooting

If you see surprising results, try these ideas.

  • Remember the Three Mottos, and follow them.

    1. Validate scalars separately from validating collections.

      We should never have a collection predicate like vector? in a scalar specification. Similarly, scalar predicates like int? should only appear in a collection specification in the context of testing a collection, like…

      (defn all-ints? [v] (every? int? v))

      …or when validating some relationship between datums, like this.

      (defn b-greater-than-a? [m] (< (m :a) (m :b)))

      The function names validate-scalars, validate-collections, et al., are strong beacons to remind you that you're either validating scalars, or validating collections.

    2. Make the specification mimic the shape of the data.

      The Speculoos functions don't enforce any requirements on the data and specification. If we feed them data that's a map and a specification that's a vector, they will dutifully try to validate what they have.

      (validate-scalars {:a 99}
                        [int?]) ;; => []

      ;; No error nor exception with map data and vector specification

      validate-scalars was not able to pair any predicates with datums, so it returns an empty vector.

      One word of warning: Because sequential things are indexed by integers, and map elements may also be indexed by integers, we could certainly abuse that flexibility like this.

      ;; data is a vector, specification is a map keyed with integers

      (validate-scalars [42 "abc" 22/7]
                        {0 int?, 1 string?, 2 ratio?})
      ;; => [{:datum 42,
      ;;      :path [0],
      ;;      :predicate int?,
      ;;      :valid? true}
      ;;     {:datum "abc",
      ;;      :path [1],
      ;;      :predicate string?,
      ;;      :valid? true}
      ;;     {:datum 22/7,
      ;;      :path [2],
      ;;      :predicate ratio?,
      ;;      :valid? true}]

      Speculoos merely knows that it could successfully locate 42 and int? at 0, etc. It 'worked' in this instance, but surprise lurks if we try to get overly clever.
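
      The reason the paths line up can be seen with plain clojure.core: a vector and an integer-keyed map both answer get with an integer. This is only an illustration of the lookup, not of Speculoos's internals.

      ```clojure
      ;; Plain clojure.core: a vector and an integer-keyed map both
      ;; answer `get` with an integer, which is why the paths line up.
      (get [42 "abc" 22/7] 1)                      ;; => "abc"
      (= string? (get {0 int?, 1 string?} 1))      ;; => true

      ;; Applying the predicate found at 1 to the datum found at 1.
      ((get {0 int?, 1 string?, 2 ratio?} 1)
       (get [42 "abc" 22/7] 1))                    ;; => true
      ```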

    3. Validation ignores un-paired predicates and un-paired datums.

      A decent number of surprising validations result from predicates pairing to unexpected datums or not being paired at all.

      ;; Oops! specification contains un-paired key :c; string "abc" isn't validated

      (valid-scalars? {:a 42, :b "abc"}
                      {:a int?, :c symbol?})
      ;; => true


      ;; Oops! specification uses an extra level of nesting; [33] wasn't validated

      (validate-collections [11 [22 [33]]]
                            [[[[list?]]]])
      ;; => ()

      Corollary: valid? being true means there were zero non-true results. If the validation did not find any predicate+datum pairs, there would be zero invalid results, and the validation would therefore be reported as valid. Use the thorough-… function variants to require all datums to be validated.

      See below for strategies and tools for diagnosing mis-pairing.
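
      The corollary above is the usual 'vacuous truth' of every?. Here is a minimal sketch in plain clojure.core. The function all-valid? is hypothetical and only assumes that a summary checks the :valid? entry of each result.

      ```clojure
      ;; Hypothetical summary step, not Speculoos's implementation:
      ;; "valid" here means no result has a falsey :valid? entry.
      (defn all-valid?
        [results]
        (every? :valid? results))

      (all-valid? [{:valid? true} {:valid? true}])   ;; => true
      (all-valid? [{:valid? true} {:valid? false}])  ;; => false

      ;; With zero results there is nothing to fail, so the summary is true.
      (all-valid? [])                                ;; => true
      ```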

  • Checking the presence or absence of an element is the job of a collection validation. Scalar validation is only concerned with testing the properties of a scalar, assuming that scalar exists.

    Testing whether an integer, located in the first slot of a vector, is greater than forty…

    (valid-scalars? [42]
                    [#(< 40 %)]) ;; => true

    …is a completely orthogonal concern from whether there is anything present in the first slot of a vector.

    (valid-collections? [42]
                        [#(get % 0)]) ;; => true

    Asking about an element's presence is, fundamentally, asking about whether a collection contains an item. If we want to test both a property of the scalar and its existence at a particular location in a collection, we could use the combo utility functions.

    (valid? [42]
            [#(< 40 %)]
            [#(get % 0)]) ;; => true

    This combo pattern validates the concept "The first element must exist, and it must be larger than forty."
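
    The two questions can be pulled apart in plain clojure.core. This is a sketch, not Speculoos code. It also shows one design consideration: get answers presence only indirectly, whereas contains? asks about the index itself.

    ```clojure
    (def v [42])

    ;; Property of a scalar: is the number greater than forty?
    (< 40 (get v 0))   ;; => true

    ;; Presence: is there anything at index 0? On a vector, `contains?`
    ;; asks about the index, not the value.
    (contains? v 0)    ;; => true
    (contains? [] 0)   ;; => false

    ;; `get` answers presence only indirectly: a stored nil or false
    ;; looks the same as a missing element when used as a predicate.
    (get [nil] 0)        ;; => nil
    (contains? [nil] 0)  ;; => true
    ```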

  • How would we validate the concept "The third element of a sequential collection is a scalar or a nested collection"? Both the following are valid.

    [42 "abc" 22/7]

    [42 "abc" ['foo]]

    The example in the upper row contains a scalar in the third position, while the example in the lower row contains a nested vector in the third position. According to our English language specification, both would be valid.

    Scalar validation discards all non-scalar elements (i.e., collections), so we must rely on the power and flexibility of collection validation. Collection validation passes the collection itself to the predicate, so the predicate has access to the collection's elements.

    We would write our predicate to pull out that third element and test whether it was a ratio or a vector.

    (defn third-element-ratio-or-vec?
      [c]
      (or (ratio? (get c 2)) (vector? (get c 2))))

    The validation passes the entire collection, c, to our predicate, and the predicate does the grunt work of pulling out the third element by using (get c 2).

    The validation would then look like this.

    (valid-collections? [42 "abc" 22/7]
                        [third-element-ratio-or-vec?])
    ;; => true


    (valid-collections? [42 "abc" ['foo]]
                        [third-element-ratio-or-vec?])
    ;; => true

    The first validation returns true because 22/7 satisfies our third-element-ratio-or-vec? predicate. The second validation returns true because ['foo] also satisfies third-element-ratio-or-vec?.

    The principle holds for all collection types: Collection validation is required when either a scalar or a collection is a valid element.
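
    When the list of acceptable alternatives grows, clojure.core/some-fn is one way to combine them. The starred name below is a hypothetical variant of the predicate above, and the sketch calls it directly, without any Speculoos function.

    ```clojure
    ;; Hypothetical variant of the predicate above, using clojure.core/some-fn
    ;; to combine the two alternatives.
    (defn third-element-ratio-or-vec?*
      [c]
      (boolean ((some-fn ratio? vector?) (get c 2))))

    (third-element-ratio-or-vec?* [42 "abc" 22/7])    ;; => true
    (third-element-ratio-or-vec?* [42 "abc" ['foo]])  ;; => true
    (third-element-ratio-or-vec?* [42 "abc" "xyz"])   ;; => false
    (third-element-ratio-or-vec?* [42 "abc"])         ;; => false
    ```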

  • Speculoos specifications are regular old data structures containing regular old functions. (I assume your data is, too.) If we're wrangling with something deep down in some nested mess, we can use our Clojure powers to dive in and pull out the relevant pieces.

    (let [data (get-in {:a {:b {:c [22/7]}}} [:a :b :c])
          spec (get-in {:a {:b {:c [int?]}}} [:a :b :c])]
      (validate-scalars data spec))
    ;; => [{:datum 22/7,
    ;;      :path [0],
    ;;      :predicate int?,
    ;;      :valid? false}]

  • Use the verbose functions. If we're using the high-level valid-…? function variants, we'll only see true/false, which isn't helpful when troubleshooting. The validate-… variants are chatty and will display everything they considered during validation.

  • The speculoos.utility namespace provides many functions for creating, viewing, analyzing, and modifying both scalar and collection specifications.

  • When the going really gets tough, break out speculoos.core/all-paths and apply it to our data, then to our specification, and then step through the validation with our eyes.

    (all-paths {:a [99]})
    ;; => [{:path [], :value {:a [99]}}
    ;;     {:path [:a], :value [99]}
    ;;     {:path [:a 0], :value 99}]


    (all-paths {:a 'int?})
    ;; => [{:path [], :value {:a int?}}
    ;;     {:path [:a], :value int?}]


    ;; Aha! The predicate `int?` at path [:a] and the integer 99 at path [:a 0] do not share a path!
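
    Stepping through with our eyes can be partly mechanized. The sketch below uses only clojure.core and clojure.set. The functions paths and scalar-paths are hypothetical, simplified stand-ins, not speculoos.core/all-paths, and they must not be given non-terminating sequences. They list the paths that hold a non-collection in both the data and the specification.

    ```clojure
    (require '[clojure.set :as set])

    ;; Hypothetical, simplified path enumerator; it is not speculoos.core/all-paths.
    (defn paths
      ([x] (paths [] x))
      ([path x]
       (cons {:path path, :value x}
             (cond
               (map? x)        (mapcat (fn [[k v]] (paths (conj path k) v)) x)
               (set? x)        (mapcat (fn [v] (paths (conj path v) v)) x)
               (sequential? x) (mapcat (fn [i v] (paths (conj path i) v)) (range) x)
               :else           nil))))

    ;; Paths whose value is not a collection (scalars, or predicates in a spec).
    (defn scalar-paths
      [x]
      (set (for [{:keys [path value]} (paths x)
                 :when (not (coll? value))]
             path)))

    ;; Mismatched shapes: nothing is paired.
    (set/intersection (scalar-paths {:a [99]})
                      (scalar-paths {:a int?}))
    ;; => #{}

    ;; Specification mimics the data: one shared path.
    (set/intersection (scalar-paths {:a [99]})
                      (scalar-paths {:a [int?]}))
    ;; => #{[:a 0]}
    ```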

  • When validating a function's arguments, remember that arguments are contained in an implicit sequence.

    (defn arg-passthrough [& args] args)


    (arg-passthrough [1 2 3]) ;; => ([1 2 3])


    (arg-passthrough [1 2 3] [4 5 6]) ;; => ([1 2 3] [4 5 6])

    If we're passing only a single value, it's easy to forget that the single value is contained in the argument sequence. Validating a function's arguments validates the argument sequence, not just the first lonely element that happens to also be a sequence.

    ;; seemingly single vector in, single integer out...

    (first [1 2 3]) ;; => 1


    ;; shouldn't integer `1` fail to satisfy predicate `string?`?

    (validate-fn-with first
                      {:speculoos/arg-scalar-spec [string?]}
                      [1 2 3])
    ;; => 1

    validate-fn-with passes through the value returned by first because validate-fn-with did not find any invalid results. Why not? In this example, 1 and string? do not share a path, and therefore validate-fn-with performed zero validations. Let's take a look.

    (all-paths [[1 2 3]])
    ;; => [{:path [], :value [[1 2 3]]}
    ;;     {:path [0], :value [1 2 3]}
    ;;     {:path [0 0], :value 1}
    ;;     {:path [0 1], :value 2}
    ;;     {:path [0 2], :value 3}]


    (all-paths [string?])
    ;; => [{:path [], :value [string?]}
    ;;     {:path [0], :value string?}]

    We find scalar 1 at path [0 0] in the argument sequence, while scalar predicate string? is located at path [0] in the scalar specification. The datum and predicate do not share a path, so they are not paired, and no validation occurs (Motto #3). The fix is to make the specification mimic the shape of the data, the 'data' in this case being the argument sequence.

    (validate-fn-with first {:speculoos/arg-scalar-spec [[string?]]} [1 2 3])
    ;; => ({:datum 1,
    ;;      :fn-spec-type :speculoos/argument,
    ;;      :path [0 0],
    ;;      :predicate string?,
    ;;      :valid? false})

    Now that the argument scalar specification properly mimics the shape of the argument sequence, scalar 1 and scalar predicate string? share a path [0 0], and validate-fn-with performs a scalar validation. 1 fails to satisfy string?.

    This also applies to validating arguments that are collections.
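
    The path mismatch can be checked by hand with the stock get-in. This sketch writes the argument sequence as a vector and uses only clojure.core. It does not call validate-fn-with.

    ```clojure
    ;; Plain clojure.core; the argument sequence is written here as a vector.
    (def arg-seq [[1 2 3]])   ;; the arguments of `(first [1 2 3])`, as a sequence

    (get-in arg-seq [0])      ;; => [1 2 3], a collection, so no scalar to pair
    (get-in arg-seq [0 0])    ;; => 1

    ;; Where does `string?` live in each candidate specification?
    (= string? (get-in [string?] [0]))      ;; => true, path [0]
    (= string? (get-in [[string?]] [0 0]))  ;; => true, path [0 0], same as scalar 1
    ```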

Finally, if you hit a wall, file a bug report or email me.

Alternatives

  • Staples SparX clj-schema

    Schemas for Clojure data structures and values. Delineates operations on maps, seqs, and sets. Contributors: Alex Baranosky, Laurent Petit, Punit Rathore


  • Steve Miner's Herbert

    A schema language for Clojure data for documenting and validating.


  • Metosin Malli

    Data-driven schemas incorporating the best parts of existing libs, mixed with their own tools.


  • Plumatic Schema

    A Clojure(Script) library for declarative data description and validation.


  • Christophe Grand's seqexp

    Regular expressions for sequences (and other sequables).


  • Jonathan Claggett's seqex

    Sequence Expressions, similar to regular expressions but able to describe arbitrary sequences of values (not just characters).


  • Clojure's spec.alpha

    [A] Clojure library to describe the structure of data and functions.


  • Clojure's spec-alpha2 or alpha.spec

    [A]n evolution from spec.alpha as well as work towards several new features. Note: Alex Miller considers it a work in progress as of 2020 June 20.


  • Jamie Brandon's Strucjure

    A library for describing stuff in an executable manner.


  • Brian Marick's structural-typing

    A library that provides good error messages when checking the correctness of structures, and a way to define structural types that are checked at runtime.


  • Peter Taoussanis' Truss

    A tiny library that provides fast and flexible runtime assertions with terrific error messages.


Glossary

element

A thing contained within a collection, either a scalar value or another nested collection.

heterogeneous, arbitrarily-nested data structure

Exactly one Clojure collection (vector, map, list, sequence, or set) with zero or more elements, nested to any depth.

non-terminating sequence

One of clojure.lang.{Cycle,Iterate,LazySeq,LongRange,Range,Repeat} that may or may not be realized, and possibly infinite. (I am not aware of any way to determine if such a sequence is infinite, so Speculoos treats them as if they are.)

path

A series of values that unambiguously navigates to a single element (scalar or sub-collection) in a heterogeneous, arbitrarily-nested data structure. In the context of the Speculoos library, the series of values comprising a path is generated by the all-paths function and consumed by the validate-… functions. Almost identical to the second argument of clojure.core/get-in, but with more generality.

Elements of vectors, lists, and sequences are addressed by zero-indexed integers. Map values are addressed by their keys, which are often keywords, but can be any data type, including integers, or composite types. Set members are addressed by their identities. Nested collections contained in a set can indeed be addressed: the path vector itself contains the collections. An empty vector [] addresses the outermost, containing collection.

predicate

A function, or something that implements IFn, like a set, that returns a truthy or falsey value. In most instances, a predicate is a function of one argument. Some Speculoos functions, such as validate-scalars and valid-scalars?, also regard a regular expression as a competent predicate.

relationship

A human- and machine-readable declaration about the congruence between two elements. Specifically, Speculoos function validation may involve specifying a relationship between the function's argument and the function's return value.

scalar

A single, non-divisible datum, such as an integer, string, boolean, etc. Essentially, a shorter term for non-collection.

specification

A human- and machine-readable declaration about properties of data, composed of a heterogeneous, arbitrarily-nested data collection and predicates.

validate

An action that returns an exhaustive listing of all datum+predicate pairs, their paths, and whether the datum satisfies the predicate. Note: Validation requires two components, a datum and a predicate. Any unpaired datum or any unpaired predicate, will not participate in validation.

valid?

An action that returns true if all paired datums satisfy their predicates during a validation, false otherwise. Note: A validation operation's result is considered valid if there are zero datum+predicate pairs.


License

This program and the accompanying materials are made available under the terms of the MIT License.
