A simple implementation of Random Forests for classification and regression in Clojure.
Features:
Limitations:
A description of random forests can be found at: http://www.stat.berkeley.edu/~breiman/RandomForests/.
Decision trees are constructed recursively as anonymous functions, choosing at each splitting node the split that minimizes the Gini impurity. A textual representation of each tree is stored as metadata on the tree function.
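As a rough illustration of the splitting criterion, the sketch below computes the Gini impurity of a set of target labels and the size-weighted impurity of a candidate split. The names gini-impurity and split-impurity are hypothetical and are not taken from this library; the code only assumes the standard definition, 1 minus the sum of squared class proportions.

```clojure
;; Gini impurity of a collection of class labels: 1 - sum over classes of p^2.
;; NOTE: illustrative helper, not part of random-forests.core.
(defn gini-impurity [labels]
  (let [n (count labels)]
    (if (zero? n)
      0.0
      (- 1.0
         (reduce + (map (fn [c] (let [p (/ c (double n))] (* p p)))
                        (vals (frequencies labels))))))))

;; Impurity of a candidate split: the impurities of the two child nodes,
;; weighted by the fraction of examples falling into each child.
(defn split-impurity [left right]
  (let [nl (count left)
        nr (count right)
        n  (+ nl nr)]
    (if (zero? n)
      0.0
      (+ (* (/ nl (double n)) (gini-impurity left))
         (* (/ nr (double n)) (gini-impurity right))))))

(gini-impurity [1 1 1 1])      ;; => 0.0  (pure node)
(gini-impurity [0 1 0 1])      ;; => 0.5  (evenly mixed, two classes)
(split-impurity [0] [1 1 1])   ;; => 0.0  (both children pure)
(split-impurity [1 0] [1 1])   ;; => 0.25 (one child still mixed)
```

A tree builder following this criterion would prefer the split with the lower weighted impurity, here the first of the two candidate splits.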
To use, add the following to your project.clj:
[random-forests-clj "0.2.0"]
Features are represented by their index in the training example. A tree can be built using the build-tree
function, providing the training examples and the indices of the features to use.
(use 'random-forests.core)
;; target is in the last position
(def examples (list ["M" "<25" 1] ["M" "<40" 0] ["F" "<35" 1] ["F" "<30" 1]))
;; features can be continuous, categorical or text
(def features (set (list (feature 0 :categorical) (feature 1 :categorical))))
;; return a lazy sequence of decision trees with:
;; - 1 random feature per splitting node
;; - a bootstrap resample of 2 examples per tree
(def t (first (build-random-forest examples features 1 2)))
(meta t) ;; => {:tree "if(1=<40){0}else{1}"}
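For readers unfamiliar with bootstrap resampling, here is a minimal sketch of drawing a fixed-size sample with replacement from the training examples. The function bootstrap-sample is a hypothetical name, and sampling with replacement is the usual meaning of the term rather than a confirmed detail of how this library draws its samples.

```clojure
;; Draw n examples uniformly at random, with replacement.
;; NOTE: illustrative helper, not part of random-forests.core.
(defn bootstrap-sample [examples n]
  (let [v (vec examples)]            ; a vector gives constant-time random access
    (vec (repeatedly n #(rand-nth v)))))

(def sample-examples
  [["M" "<25" 1] ["M" "<40" 0] ["F" "<35" 1] ["F" "<30" 1]])

;; Returns a vector of 2 examples; the same example may appear twice.
(bootstrap-sample sample-examples 2)
```

Because each tree sees a different random sample, the trees in a forest differ from one another even when built from the same training set.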
Each tree is a function, and new examples can be classified by calling the function:
(t ["M" "<20"]) ;; => 1
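Since each tree is an ordinary function, the predictions of several trees can be combined with a majority vote. The sketch below uses hand-written stand-in trees so that it runs without the library; majority-vote and stub-trees are hypothetical names, and the library may combine trees differently.

```clojure
;; Stand-in trees: plain functions from an example vector to a class label.
;; In practice these would be the functions returned by build-random-forest.
(def stub-trees
  [(fn [x] (if (= (x 1) "<40") 0 1))
   (fn [x] (if (= (x 0) "M") 0 1))
   (fn [_] 1)])

;; Classify an example by the most common prediction among the trees.
;; Assumes a non-empty collection of trees.
(defn majority-vote [trees example]
  (->> trees
       (map #(% example))   ; one prediction per tree
       frequencies          ; map of label -> number of votes
       (apply max-key val)  ; map entry with the most votes
       key))

(majority-vote stub-trees ["M" "<20"]) ;; votes are 1, 0, 1 => 1
```

With a lazy sequence of trees, something like (take 100 forest) would bound the number of trees consulted before voting.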
Models can be built from the command line using lein run:
Usage:
Switches                     Default  Desc
--------                     -------  ----
-h, --no-help, --help        false    Show help
-f, --features               []       Features specification (matching CSV header): name=continuous,foo=text
-s, --size                   1000     Size of bootstrap sample per tree
-m, --split                  100      Number of features to sample for each split
-o, --output                          Write detailed training error output in CSV format to output file
-t, --target                          Prediction target name
-b, --no-binary, --binary    false    Perform binary classification of target (measures AUC loss)
-l, --limit                  100      Number of trees to build
To build a binary classifier on the provided test data set using a forest of 500 trees:
lein run -f V1=categorical,V2=categorical,V3=categorical,V4=categorical,V5=categorical,V6=categorical,V7=categorical,V8=categorical,V9=categorical \
-l 500 \
-t target=continuous \
-b \
test/data/cancer.csv
This prints the out-of-sample AUC loss for the entire forest as each tree is added:
1: 0.875000
2: 0.843000
3: 0.824000
4: 0.798000
5: 0.843000
6: 0.855000
7: 0.855000
8: 0.878000
9: 0.864000
10: 0.883000
11: 0.879000
12: 0.892000
13: 0.906000
14: 0.906000
15: 0.935000
...
Copyright (C) 2010-2012 Erik Andrejko
Distributed under the Eclipse Public License, the same as Clojure.