Liking cljdoc? Tell your friends :D

Statistical Machine Intelligence & Learning Engine SMILE

Maven Central CI

SMILE (Statistical Machine Intelligence & Learning Engine) is a comprehensive, high-performance machine learning framework for the JVM. SMILE v5+ requires Java 25; v4.x requires Java 21; all previous versions require Java 8. SMILE also provides idiomatic APIs for Scala and Kotlin. With advanced data structures and algorithms, SMILE delivers state-of-the-art performance across every aspect of machine learning.


Table of Contents

  1. Features
  2. Module Map
  3. Installation
  4. Quick Start
  5. SMILE Studio & Shell
  6. Model Serialization
  7. Visualization
  8. License
  9. Issues & Discussions
  10. Contributing
  11. Maintainers
  12. Gallery

Features

AreaHighlights
LLMLLaMA-3 inference, tiktoken BPE tokenizer, OpenAI-compatible REST server, SSE chat streaming
Deep LearningLibTorch/GPU backend, EfficientNet-V2 image classification, custom layer API
ClassificationSVM, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, Neural Networks, RBF Networks, MaxEnt, KNN, Naïve Bayes, LDA/QDA/RDA
RegressionSVR, Gaussian Process, Regression Trees, GBDT, Random Forest, RBF, OLS, LASSO, ElasticNet, Ridge
ClusteringBIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical, SIB, SOM, Spectral, Min-Entropy
Manifold LearningIsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA
Feature EngineeringGenetic Algorithm selection, Ensemble selection, TreeSHAP, SNR, Sum-Squares ratio, data transformations, formula API
NLPSentence / word tokenization, Bigram test, Phrase & Keyword extraction, Stemmer, POS tagging, Relevance ranking
Association RulesFP-growth frequent itemset mining
Sequence LearningHidden Markov Model, Conditional Random Field
Nearest NeighborBK-Tree, Cover Tree, KD-Tree, SimHash, LSH
Numerical MethodsLinear algebra, numerical optimization (BFGS, L-BFGS), interpolation, wavelets, RBF, distributions, hypothesis tests
VisualizationSwing plots (scatter, line, bar, box, histogram, surface, heatmap, contour, …) and declarative Vega-Lite charts

Module Map

Each module has its own detailed user guide. Click the README link for the module overview, or drill into individual topic guides.

base/ — Foundation

Data structures, math, linear algebra, statistical utilities, I/O

DocumentTopics
READMEModule overview and dependency setup
DATA_FRAME.mdDataFrame API — creation, selection, transformation
DATA_IO.mdCSV, JSON, Parquet, Arrow, JDBC, Avro readers/writers
DATA_TRANSFORMATION.mdScalers, encoders, imputers, feature transforms
DATASET.mdBuilt-in benchmark and real-world datasets
FORMULA.mdR-style formula language for model matrices
DISTRIBUTIONS.mdProbability distributions (Normal, Poisson, Beta, …)
HYPOTHESIS_TESTING.mdt-test, chi-squared, ANOVA, KS-test, …
DISTANCES.mdEuclidean, Mahalanobis, Hamming, edit distance, …
NEAREST_NEIGHBOR.mdKD-Tree, Cover Tree, BK-Tree, LSH
KERNELS.mdGaussian, polynomial, Laplacian, and other kernel functions
RBF.mdRadial basis function networks
INTERPOLATION.mdLinear, cubic spline, bilinear, bicubic
GRAPH.mdAdjacency list/matrix graph, BFS/DFS, spanning trees
SORT.mdQuick sort, heap sort, counting sort, index sort
HASH.mdLocality-sensitive hashing, SimHash
RNG.mdRandom number generators, sampling, permutations
BFGS.mdL-BFGS and BFGS numerical optimizers
ICA.mdIndependent Component Analysis
TENSOR.mdN-dimensional array (CPU tensor without LibTorch)
WAVELET.mdDWT, CWT, and wavelet families
GAP.mdGAP statistic for optimal cluster count estimation
COMPRESSED_SENSING.mdCompressed sensing and basis pursuit

core/ — Machine Learning Algorithms

Classification, regression, clustering, manifold learning, and more

DocumentTopics
READMEModule overview
CLASSIFICATION.mdSVM, Random Forest, AdaBoost, GBDT, KNN, Naïve Bayes, LDA, …
REGRESSION.mdSVR, Gaussian Process, LASSO, Ridge, ElasticNet, GBDT, …
CLUSTERING.mdK-Means, DBSCAN, BIRCH, SOM, Spectral Clustering, …
FEATURE_ENGINEERING.mdFeature selection, PCA, ICA, projection, encoding
MANIFOLD.mdt-SNE, UMAP, IsoMap, LLE, Laplacian Eigenmap
ANOMALY_DETECTION.mdIsolationForest, one-class SVM, local outlier factor
ASSOCIATION_RULE_MINING.mdFP-growth, association rules, frequent itemsets
SEQUENCE.mdHMM (Baum-Welch, Viterbi), CRF
TIME_SERIES.mdARIMA, box-plots, autocorrelation
REGRESSION.mdFull regression API reference
TRAINING.mdCross-validation, bootstrap, hyper-parameter search
VALIDATION.mdHold-out, k-fold, leave-one-out evaluation
VALIDATION_METRICS.mdAccuracy, AUC, F1, RMSE, MAE, confusion matrix
HYPER_PARAMETER_OPTIMIZATION.mdGrid search, random search, Bayesian optimization
VECTOR_QUANTIZATION.mdLVQ, Neural Gas, SOM as vector quantizers
ONNX.mdExporting and importing models via ONNX

deep/ — Deep Learning & LLMs

LibTorch-backed GPU/CPU tensor operations, neural network layers, LLaMA-3 inference, EfficientNet

DocumentTopics
READMEFull deep-learning & LLM user guide (tensors, layers, loss, optimizer, EfficientNet, LLaMA)

The deep/README.md covers:

  • smile.deep.tensor — Tensor factory, indexing, arithmetic, AutoScope memory management, dtype/device
  • smile.deep.layer — Linear, Conv2d, pooling, normalization (BN/GN/RMS), dropout, embedding, sequential blocks
  • smile.deep.activation — ReLU, GELU, SiLU, Tanh, Sigmoid, Softmax, GLU, HardShrink, …
  • smile.deep.Loss — MSE, cross-entropy, BCE, Huber, KL, hinge, and more
  • smile.deep.Optimizer — SGD, Adam, AdamW, RMSprop
  • smile.deep.Model — Abstract base class + training loop
  • smile.deep.metric — Accuracy, Precision, Recall, F1Score with macro/micro/weighted averaging
  • smile.llmMessage, Role, FinishReason, ChatCompletion records; sinusoidal & RoPE positional encodings
  • smile.llm.tokenizerTokenizer interface, Tiktoken BPE implementation (LLaMA-3 compatible)
  • smile.llm.llama — Full LLaMA-3 stack: Llama.build(), generate(), chat(), streaming via SubmissionPublisher
  • smile.visionVisionModel, ImageDataset, EfficientNet.V2S/M/L() pretrained models, ImageNet labels
  • smile.vision.transformTransform interface, ImageClassification pipeline, resize/crop/toTensor helpers

nlp/ — Natural Language Processing

Text normalization, tokenization, POS tagging, stemming, relevance ranking

DocumentTopics
READMEModule overview
TOKENIZER.mdSentence splitter, word tokenizer, regex tokenizer
POS.mdPart-of-speech tagging (Brill tagger, HMM tagger)
STEM.mdPorter, Lancaster, Lovins stemmers; lemmatization
COLLOCATION.mdBigram/trigram statistical tests, phrase extraction
RELEVANCE.mdTF-IDF, BM25, keyword extraction
TAXONOMY.mdWordNet integration, synsets, hypernyms

plot/ — Data Visualization

Swing-based interactive plots and declarative Vega-Lite charts

DocumentTopics
READMESwing plotting API — scatter, line, bar, box, histogram, heatmap, surface, contour, wireframe
VEGA.mdDeclarative smile.plot.vega (Vega-Lite) — JSON spec generation, web/Jupyter rendering

serve/ — Inference Server

Quarkus-based REST inference service with OpenAI-compatible API and SSE streaming

DocumentTopics
READMEBuilding and running the server, /chat/completions endpoint, SSE streaming, configuration

studio/ — Interactive Shell & Desktop IDE

REPL / notebook environment for Java, Scala, and Kotlin

DocumentTopics
README.mdDesktop Studio notebook UI, cell types, output rendering
CLICLI entry points (smile, smile shell, smile scala, smile kotlin, smile server)

scala/ — Scala API

Idiomatic Scala shim — concise wrappers, symbolic operators, Scala collections integration

DocumentTopics
READMEAPI overview, smile.classification, smile.regression, smile.clustering, smile.plot in Scala

kotlin/ — Kotlin API

Idiomatic Kotlin shim — extension functions, named parameters, builder DSLs

DocumentTopics
READMEAPI overview, extension functions, Kotlin-style builders
packages.mdFull package-by-package listing of all Kotlin extension functions

json/ — JSON Library (Scala)

Lightweight zero-dependency JSON library for Scala with a clean DSL

DocumentTopics
READMEParsing, building, pattern matching, path navigation, serialization

spark/ — Apache Spark Integration

Use SMILE models inside Spark ML pipelines

DocumentTopics
READMESmileTransformer, SmileClassifier, SmileRegressor; training and scoring in Spark DataFrames

Installation

Maven

<!-- Core ML algorithms -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-core</artifactId>
  <version>6.2.1</version>
</dependency>

<!-- Deep learning + LLMs (requires LibTorch) -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-deep</artifactId>
  <version>6.2.1</version>
</dependency>

<!-- Natural language processing -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-nlp</artifactId>
  <version>6.2.1</version>
</dependency>

<!-- Data visualization -->
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>6.2.1</version>
</dependency>

SBT (Scala)

libraryDependencies += "com.github.haifengl" %% "smile-scala" % "6.2.1"

Gradle (Kotlin)

dependencies {
    implementation("com.github.haifengl:smile-kotlin:6.2.1")
}

Native Libraries (BLAS / LAPACK)

Several algorithms (manifold learning, Gaussian Process, MLP, some clustering) require BLAS and LAPACK.

Linux (Ubuntu / Debian)

sudo apt update
sudo apt install libopenblas-dev libarpack2-dev

macOS (Homebrew)

brew install arpack
# If macOS SIP strips DYLD_LIBRARY_PATH, create a symlink to the dylib in your working dir:
ln -s /opt/homebrew/lib/libarpack.dylib .

Windows — pre-built DLLs are included in the bin/ directory of the release package. Add that directory to PATH.

GPU (CUDA) — make sure the LibTorch CUDA native libraries are on PATH (Windows) or LD_LIBRARY_PATH (Linux).


Quick Start

import smile.classification.RandomForest;
import smile.data.formula.Formula;
import smile.io.Read;

// Load data
var data = Read.csv("src/test/resources/iris.csv");

// Train a random forest
var forest = RandomForest.fit(Formula.lhs("species"), data);

// Predict
int label = forest.predict(data.get(0));
System.out.println("Predicted class: " + label);

For deep learning and LLM examples, see deep/README.md. For visualization examples, see plot/README.md.


SMILE Studio & Shell

SMILE ships with an interactive desktop Studio (notebook-style) and a set of CLI shells. See studio/README.md for full documentation.

Download a pre-packaged release from the releases page, then:

path/to/smile/bin/setup      # install required native dependencies
path/to/smile/bin/smile      # launch SMILE Studio from your project directory

Other entry points:

CommandDescription
smileDesktop notebook IDE
smile shellJava REPL with all SMILE packages pre-imported
smile scalaScala REPL
smile trainTrain a supervised learning model
smile predictPredict on a file using a saved model
smile serveStart the LLM inference server

To increase the JVM heap:

path/to/smile/bin/smile -J-Xmx30G

Model Serialization

Most SMILE models implement java.io.Serializable. You can serialize a trained model to disk and load it in a production environment or inside a Spark job:

// Save
try (var out = new ObjectOutputStream(new FileOutputStream("model.ser"))) {
    out.writeObject(forest);
}

// Load
try (var in = new ObjectInputStream(new FileInputStream("model.ser"))) {
    var loaded = (RandomForest) in.readObject();
}

Visualization

SMILE provides two visualization layers:

  • smile.plot.swing — Swing-based interactive 2D/3D plots. See plot/README.md.
  • smile.plot.vega — Declarative Vega-Lite charts for browsers and Jupyter. See plot/VEGA.md.
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>6.2.1</version>
</dependency>

License

SMILE employs a dual license model designed to meet the development and distribution needs of both commercial distributors (OEMs, ISVs, VARs) and open source projects. For details, see LICENSE. To acquire a commercial license, contact smile.sales@outlook.com.


Issues & Discussions

ChannelPurpose
GitHub DiscussionsQuestions, ideas, show-and-tell
Stack Overflow [smile]Technical Q&A
Issue TrackerBug reports and feature requests
Online DocsTutorials and programming guides
Java API · Scala API · Kotlin API · Clojure APIAPI Javadoc

Contributing

Please read CONTRIBUTING.md for build and test instructions.


Maintainers


Gallery

SPLOM

Scatterplot Matrix

Scatter

Scatter Plot

Heart

Line Plot

Surface

Surface Plot

Scatter

Bar Plot

Box Plot

Box Plot

Histogram

Histogram Heatmap

Rolling

Rolling Average

Map

Geo Map

UMAP

UMAP

Text

Text Plot

Contour

Heatmap with Contour

Hexmap

Hexmap

IsoMap

IsoMap

LLE

LLE

Kernel PCA

Kernel PCA

Neural Network

Neural Network

SVM

SVM

Hierarchical Clustering

Hierarchical Clustering

SOM

SOM

DBSCAN

DBSCAN

Neural Gas

Neural Gas

Wavelet

Wavelet

Mixture

Exponential Family Mixture

Teapot

Teapot Wireframe

Interpolation

Grid Interpolation

Can you improve this documentation? These fine people already did:
Haifeng Li, Karl Li, j, Marios Zindilis, Ian McIntosh, Anuj Saxena, Bruno P. Kinoshita, Gaillard Théo, The Gitter Badger & Erich Schubert
Edit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close