Directory: clojask.extensions

As with popular Python libraries such as numpy and pandas, third-party users can extend Clojask by building additional code on top of the core source. Such extensions are welcome and encouraged. The namespaces below are examples of extension functions built this way.

ns: clojask.extensions.bind

Contains functions that bind several dataset files together in either direction (by columns or by rows).

API Foundation

When defining a clojask.classes.DataFrame.DataFrame with the dataframe function, you can pass a function instead of the path of the source file. This function should return a sequence. If the sequence is lazy, it can in theory be infinitely long. Otherwise, it must be finite and small enough to fit in memory.

;; The source is a zero-argument function that returns the csv lines.
(def x (dataframe (fn [] ["col1,col2" "1,2" "3,4"])))
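As a rough illustration of the lazy case, the source function can build its lines on demand instead of holding them all in memory. The sketch below is plain Clojure; csv-lines and source-fn are hypothetical names, and the commented-out dataframe call assumes Clojask's dataframe function is in scope.

```clojure
(defn csv-lines
  "Returns a lazy sequence of csv lines: one header line, then n data lines."
  [n]
  (cons "col1,col2"
        (map (fn [i] (str i "," (* i i))) (range n))))

;; Zero-argument source function (hypothetical name).
(defn source-fn []
  (csv-lines 3))

(source-fn)
;; => ("col1,col2" "0,0" "1,1" "2,4")

;; Assumed usage; requires clojask's dataframe function to be in scope:
;; (def x (dataframe source-fn))
```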

Based on this API, we can define cbind and rbind functions that combine two or more csv files.
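To make the idea concrete, here is a minimal sketch of binding at the level of line sequences, which is the kind of value a source function returns. It is not Clojask's implementation: cbind-lines and rbind-lines are hypothetical helpers, and unlike the cbind-csv example output this sketch does not rename duplicate column names.

```clojure
(require '[clojure.string :as str])

(defn cbind-lines
  "Joins the n-th line of every sequence with a comma.
  Stops at the end of the shortest sequence."
  [& line-seqs]
  (apply map (fn [& cells] (str/join "," cells)) line-seqs))

(defn rbind-lines
  "Keeps the first sequence whole and appends the data lines (everything
  after the header) of the remaining sequences."
  [first-seq & more]
  (concat first-seq (mapcat rest more)))

(def a ["date,item" "2010-01-20,1" "2010-01-23,2"])
(def b ["cust,sold" "101,11"])

(cbind-lines a b)
;; => ("date,item,cust,sold" "2010-01-20,1,101,11")

(rbind-lines a b)
;; => ("date,item" "2010-01-20,1" "2010-01-23,2" "101,11")
```

Because map and concat are lazy, neither helper forces its inputs, so the result can still be handed over as a lazy sequence.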

cbind-csv

Joins two or more csv files column-wise into a new dataframe.

| Argument | Type | Function | Remarks |
| --- | --- | --- | --- |
| path-a | String | The path of the first csv file | Can be absolute or relative path |
| path-b | String | The path of the second csv file | Can be absolute or relative path |
| [path-c's] | String | The paths of any additional csv files | Can be absolute or relative path; the number is not limited |

Example

;; file a
;; date,item,price
;; 2010-01-20,1,18.3
;; 2010-01-20,2,38.3
;; 2010-01-23,1,18.9
;; 2010-01-23,2,48.9
;; 2010-01-26,1,19.1
;; 2010-01-26,2,59.1
;; file b
;; date,cust,Item,sold
;; 2010-01-19,101,2,11
;; 2010-01-22,102,1,7
;; 2010-01-24,102,2,9
;; 2010-01-25,101,2,9
;; 2010-01-26,101,1,10
(def x (cbind-csv "path/to/a" "path/to/b"))
;; x
;; date1,item,price,date2,cust,Item,sold
;; 2010-01-20,1,18.3,2010-01-19,101,2,11
;; 2010-01-20,2,38.3,2010-01-22,102,1,7
;; 2010-01-23,1,18.9,2010-01-24,102,2,9
;; 2010-01-23,2,48.9,2010-01-25,101,2,9
;; 2010-01-26,1,19.1,2010-01-26,101,1,10

rbind-csv

Joins two or more csv files row-wise into a new dataframe.

| Argument | Type | Function | Remarks |
| --- | --- | --- | --- |
| path-a | String | The path of the first csv file | Can be absolute or relative path |
| path-b | String | The path of the second csv file | Can be absolute or relative path |
| [path-c's] | String | The paths of any additional csv files | Can be absolute or relative path; the number is not fixed |

Example

;; file a
;; date,item,price
;; 2010-01-20,1,18.3
;; 2010-01-20,2,38.3
;; 2010-01-23,1,18.9
;; 2010-01-23,2,48.9
;; 2010-01-26,1,19.1
;; 2010-01-26,2,59.1
;; file b
;; date,cust,Item,sold
;; 2010-01-19,101,2,11
;; 2010-01-22,102,1,7
;; 2010-01-24,102,2,9
;; 2010-01-25,101,2,9
;; 2010-01-26,101,1,10
(def x (rbind-csv "path/to/a" "path/to/b"))
(print-df x)
|             date |             item |            price |
|------------------+------------------+------------------|
| java.lang.String | java.lang.String | java.lang.String |
|       2010-01-20 |                1 |             18.3 |
|       2010-01-20 |                2 |             38.3 |
|       2010-01-23 |                1 |             18.9 |
|       2010-01-23 |                2 |             48.9 |
|       2010-01-26 |                1 |             19.1 |
|       2010-01-26 |                2 |             59.1 |
|       2010-01-19 |              101 |                2 |
|       2010-01-22 |              102 |                1 |
|       2010-01-24 |              102 |                2 |
|       2010-01-25 |              101 |                2 |

It is also possible and encouraged to create more binding functions for other file types.
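As one possible starting point, the sketch below builds a row-wise source function for tab-separated files. It is an assumption-laden illustration rather than part of Clojask: tsv-line->csv-line and rbind-tsv-source are hypothetical names, the conversion is naive, and the files are read eagerly.

```clojure
(require '[clojure.string :as str])

(defn tsv-line->csv-line
  "Naive conversion: assumes no field contains a tab, comma, or quote."
  [line]
  (str/replace line "\t" ","))

(defn rbind-tsv-source
  "Returns a zero-argument function that yields the lines of all files as
  csv text, keeping only the first file's header. Files are read eagerly,
  so they must fit in memory."
  [path-a & paths]
  (fn []
    (let [read-lines (fn [path]
                       (map tsv-line->csv-line
                            (str/split-lines (slurp path))))]
      (concat (read-lines path-a)
              (mapcat (comp rest read-lines) paths)))))

;; Assumed usage; requires clojask's dataframe function to be in scope:
;; (def x (dataframe (rbind-tsv-source "path/to/a.tsv" "path/to/b.tsv")))
```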

ns: clojask.extensions.reshape

Contains functions that can reshape a clojask dataframe from wide to long or from long to wide.

API Foundation

When defining a clojask.classes.DataFrame.DataFrame with the dataframe function, you can specify the option :melt, a function that is applied to each resulting row vector at the end. The default is vector, which leaves the results unchanged. However, if :melt is set to

(fn [x]
  (repeat 2 x))

then each row is output twice.
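Judging from the two cases above, a :melt function takes one row and returns a collection of output rows. A wide-to-long reshape fits that shape: each input row yields one output row per measurement column. The sketch below is plain Clojure with a hypothetical helper name, melt-row; whether Clojask's melt is written this way is not confirmed here.

```clojure
(defn melt-row
  "Turns one wide row into one long row per measurement column.
  The first n-id entries of the row are treated as the id columns."
  [n-id measure-names row]
  (let [row    (vec row)
        ids    (subvec row 0 n-id)
        values (subvec row n-id)]
    (map (fn [m v] (conj ids m v)) measure-names values)))

(melt-row 2
          ["dob_child1" "dob_child2" "dob_child3"]
          ["1" "30" "1998-11-26" "2000-01-29" ""])
;; => (["1" "30" "dob_child1" "1998-11-26"]
;;     ["1" "30" "dob_child2" "2000-01-29"]
;;     ["1" "30" "dob_child3" ""])
```

Fixing the first two arguments with partial, for example (partial melt-row 2 ["dob_child1" "dob_child2" "dob_child3"]), gives a one-argument function of the kind described above.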

melt

Reshape the dataframe from wide to long.

| Argument | Type | Function | Remarks |
| --- | --- | --- | --- |
| dataframe | clojask.classes.DataFrame.DataFrame | Specify the dataframe | |
| output-path | String | The path of the output | Can be absolute or relative path with respect to the project.clj file. |
| id | vector of strings | The fixed portion of the columns | These columns must have a perfect correlation. |
| measurement | vector of strings | The measurement columns | In the result, the measurement names will become one column and the values will become another. |
| [measure_name] | String | The name of the measurement in the result | By default "measure" |
| [value_name] | String | The name of the value in the result | By default "value" |

Example

;; x
;; family_id,age_mother,dob_child1,dob_child2,dob_child3
;; 1,30,1998-11-26,2000-01-29,
;; 2,27,1996-06-22,,
;; 3,26,2002-07-11,2004-04-05,2007-09-02
;; 4,32,2004-10-10,2009-08-27,2012-07-21
;; 5,29,2000-12-05,2005-02-28,
(melt x "path/to/output" ["family_id" "age_mother"] ["dob_child1" "dob_child2" "dob_child3"])

dcast

Reshape the dataframe from long to wide. This is the inverse of melt.

| Argument | Type | Function | Remarks |
| --- | --- | --- | --- |
| dataframe | clojask.classes.DataFrame.DataFrame | Specify the dataframe | |
| output-path | String | The path of the output | Can be absolute or relative path with respect to the project.clj file. |
| id | vector of strings | The fixed portion of the columns | These columns must have a perfect correlation. |
| measure-name | String | The name of the measurement | By default "measure" |
| value-name | String | The name of the value | By default "value" |
| values | vector of string/int/double/datetime | The value choices of the measurement column | The order matters as in the result file. |
| [vals-name] | vector of string | The name of the value columns | By default, same as values |

Example

;; x
;; family_id,age_mother,measure,value
;; 1,30,dob_child1,1998-11-26
;; 1,30,dob_child2,2000-01-29
;; 1,30,dob_child3,
;; 2,27,dob_child1,1996-06-22
;; 2,27,dob_child2,
;; 2,27,dob_child3,
;; 3,26,dob_child1,2002-07-11
;; 3,26,dob_child2,2004-04-05
;; 3,26,dob_child3,2007-09-02
;; 4,32,dob_child1,2004-10-10
;; 4,32,dob_child2,2009-08-27
;; 4,32,dob_child3,2012-07-21
;; 5,29,dob_child1,2000-12-05
;; 5,29,dob_child2,2005-02-28
;; 5,29,dob_child3,
(dcast x "resources/test.csv" ["family_id" "age_mother"] "measure" "value" ["dob_child1" "dob_child2" "dob_child3"])
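The long-to-wide direction can also be illustrated on plain row vectors. The sketch below is not Clojask's dcast; dcast-rows is a hypothetical helper that assumes rows sharing an id are adjacent and that each row has the form [id... measure value].

```clojure
(defn dcast-rows
  "Collapses long rows of the form [id... measure value] into one wide row
  per id. Assumes rows sharing an id are adjacent and every row is a vector."
  [n-id values rows]
  (for [group (partition-by #(subvec % 0 n-id) rows)]
    (let [ids    (subvec (first group) 0 n-id)
          lookup (into {} (map (fn [r] [(nth r n-id) (nth r (inc n-id))]) group))]
      ;; Emit the values in the requested order; missing ones become "".
      (into ids (map #(get lookup % "") values)))))

(dcast-rows 2
            ["dob_child1" "dob_child2"]
            [["1" "30" "dob_child1" "1998-11-26"]
             ["1" "30" "dob_child2" "2000-01-29"]
             ["2" "27" "dob_child1" "1996-06-22"]
             ["2" "27" "dob_child2" ""]])
;; => (["1" "30" "1998-11-26" "2000-01-29"]
;;     ["2" "27" "1996-06-22" ""])
```

The values argument fixes the column order of the wide rows, which mirrors the remark in the table that the order of values matters.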
