Many SQL functions and Column methods are overloaded to take either a keyword, a string or a Column instance as argument. For such cases, Geni implements Column coercion where
Because of this, basic arithmetic operations do not require lit
wrapping:
; The following two expressions are equivalent
(g/- (g// (g/sin Math/PI) (g/cos Math/PI)) (g/tan Math/PI))
(g/- (g// (g/sin (g/lit Math/PI)) (g/cos (g/lit Math/PI))) (g/tan (g/lit Math/PI)))
However, string literals do require lit
wrapping:
; The following fails, because "Nelson" is interpreted as a Column
(-> dataframe (g/filter (g/=== "SellerG" "Nelson")))
; The following works, as it checks the column "SellerG" against "Nelson" as a literal
(-> dataframe (g/filter (g/=== "SellerG" (g/lit "Nelson"))))
It may be useful to think of a Spark Dataset as a seq of maps, so that keywords can be idiomatically used to refer to columns (i.e. keys). For that reason, the predicate column above may be more idiomatically written as:
(g/=== :SellerG (g/lit "Nelson"))
Geni implements Column-array coercion to variadic SQL functions and Column methods, such as select
and group-by
. The coercion rules are as follos:
A function like select
can take all of these different types in a single invocation:
(-> dataframe
(g/select :SellerG
"Address"
(g/col "Postcode")
{:log-price (g/log :Price) :rooms :Rooms}
[:Date :Method]
#{:Lattitude :Longtitude})
g/columns)
=> (:SellerG :Address :Postcode :log-price :rooms :Date :Method :Lattitude :Longtitude)
All calls to filter
and remove
are implicitly casted to booleans. This means that the Columns can be left as, say, integers:
(-> dataframe
(g/remove (g/mod :Rooms 2))
(g/select :Rooms)
g/distinct
(g/collect-col :Rooms))
=> (4 6 2 10 8)
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close