Neanderthal - notable changes between versions
- Updated to JavaCPP 1.5.11
- Updated to MKL 2025.0
- Updated uncomplicate commons and clojure-cpp
- Built with Java 8 bytecode compatibility.
- Completely moved CPU engine to JavaCPP (and removed Neanderthal Native)
- Support for sparse matrices
- Support for integers in Fluokitten functions
- More operations supported by integer engines
- Major internal engine re-coding
- Upgrade to CUDA 12.3
- Upgrade to MKL 2024.0
Skipped
- Upgrade CUDA to 11.8.
- Upgrade neanderthal-native MKL native dependency to oneAPI MKL 2022.2 on Linux and Windows, and 2021.1 on Mac (the latest that my mac 10.12 supports).
- Upgrade CUDA to 11.7. It should work with 11.6, too!
- Fixed a few issues with devices that lack hardware support for some OpenCL features.
- Work around the bug in Apple's OpenCL driver that doesn't support native_ functions.
- Viewable/view moved to commons.
- Does not require system-wide MKL. Uses binaries provided by the bytedeco jar if present.
- cuBLAS engines support int, long, short, byte, and uint8.
- New vect-math functions: exp2, exp10, log2, and log1p
- Fix short and byte engine crash.
- Enable submatrix of (tri)diagonal matrix when appropriate.
- Fix regression in core.rk! caused by 0.30.0 update.
- New function mmt! for symmetric matrix multiply with its transpose (faster than mm!).
- Support for symmetric rk! of a vector with its transpose.
- Support mapping a vector to a file channel.
- Support int, long, short, and byte vectors.
- Fix a subtle CUDA vect-math return object bug.
- Support CUDA 10.2.0
- A bunch of bugfixes provided by Kamil Toman (katox).
- view and view-* behavior made more consistent.
- Random number generation of vector entries with uniformly and normally distributed numbers.
- Renamed aux to auxil to work around a Windows bug of not allowing files named aux.
- Added copy-sign, ramp, step, and sigmoid functions to math and vect-math (MKL, CUDA, OpenCL)
- Introduced the Viewable/view protocol and view function (interop).
- CLBlast bumped to 1.5.0.
- Clojure upgraded to 1.10.
- Misc bugfixes.
- SDD available as an SVD implementation.
- Computation of eigenvalues and eigenvectors available for symmetric matrices.
- Fluokitten performance regression (introduced in 0.18.0) fixed.
- Fluokitten support in non-double objects.
- Fluokitten accepts non-primitive function for Neanderthal objects.
- Custom non-blas sum function sped up on CPU.
- JCublas upgraded to 0.9.2.
- Support explicit stream in memcpy.
- CUDA engine uses explicit context.
- sum support in CUDA matrices.
- TRSV in OpenCL matrices.
- CLBlast dependency updated to 1.3.0. Context creation for OpenCL is much faster now.
- Vertigo dependency removed.
- view-ge supports arbitrary dimensions now.
- ge supports nested sequence as source for its rows.
- Added FlowProvider/flow to internal core.
- Updated to Java 9 modules. Requires the --add-opens JVM argument if run with JDK 9+.
- Clojure dep updated to 1.9.0.
- Upgrades JCuda dependency to 0.9.0, supports CUDA 9.
- Core constructors accept any factory provider as factory.
- GPU objects are safe to print after the factory has been released.
- Fix the uplo_modf bug (#33).
- Upgraded JOCLBlast dependency to 1.2.0.
- Vectorized mathematical functions (approximately 50 functions in the vect-math namespace).
- New functions in the math namespace to support scalar equivalents of vect-math functions.
- Schur decomposition!
- JOCLBlast engine upgraded to 1.1.0.
- CUDA implementation of SY matrices.
- OpenCL implementation of SY matrices.
- Fixed call with wrong number of arguments for the transpose of OrthogonalFactorization.
- Diagonal matrices (GD)
- Tridiagonal matrices (GT)
- Diagonally dominant tridiagonal matrices (DT)
- Symmetric tridiagonal matrices (ST)
- Updated JOCLBlast dependency to 1.0.1.
- Orthogonal factorizations greatly simplified
- Symmetric and triangular mm support more layouts and a/b position variations.
- Upgraded Intel MKL to 2018 (it should work with earlier versions, but YMMV).
- New simplified orthogonal factorization related functions replace the old api in linalg.
- Fixed TR and SY mm and mv when k=0.
- Fixed transpose implementations in various non-GE matrices.
- Symmetric matrices (SY)
- Banded matrices (GB, TB, SB)
- Packed matrices (TP, SP)
- Better printing
- Fluokitten protocols supported by all matrix types.
- Overhaul of internals that opens easier path for new matrix types and specialized engines.
- Automated triangular factorizations and solvers.
- :order replaced by :layout in matrix options.
- Fix #29 - OpenCL engine does not try to load CUDA-related stuff any more.
- tr* work with LUFactorization instead of GEMatrix.
- Matrix inverse.
- Condition number.
- Pure tr* methods.
- Improved TRMatrix printing.
- Support for chained matrix multiplication in mm.
- Support for inverting matrices through trf/tri.
- CUDA/cuBLAS based engine (requires CUDA toolkit).
- Additional methods from BLAS supported by matrices.
- Added aux namespace for auxiliary functions.
- Sorting of vectors, GE, and TR host matrices.
- Bulk alter! method added.
- view-vctr and view-ge support stride multiplier.
- set-all accepts NaN
- CL factories implement FactoryProvider.
- Linear algebra functions (LAPACK).
- Support for TR matrices.
- Pretty-printing
- GE and TR support some more BLAS-1 functions.
- Cheat Sheet in the docs.
- Updated JOCLBlast dependency to 0.10.0.
- Updated Fluokitten dependency to 0.6.0.
- Internal api and implementations made more straightforward.
- Naming scheme changed from single to float for single-precision structures.
- sv and sge from the native namespace renamed to fv and fge.
- core constructor functions changed from create to vctr, ge, tr, etc.
- Core constructors no longer accept raw buffers.
- Removed the amd-gcn engine. Use clblast engine instead.
- The native part is now compiled for Linux, Mac OS X, and Windows.
- one-argument pow function added.
- native-factory method added to FactoryProvider protocol.
- factories and data accessors implement the compatible method.
- Updated JOCLBlast dependency to 0.9.0.
- Attempting (create -m -n) now throws IllegalArgumentException
- Fixed a sum bug in native implementation when stride is not 1.
- scal implementation for matrices.
- Fixed a Buffer.limit bug in subvector and submatrix.
- Fixes #15
- native function in core
- one-argument fold now uses sum instead of looping.
- Updated JOCLBlast dependency to 0.8.0 (also fixes #15)
- Updated ClojureCL dependency to 0.6.4
- Updated ClojureCL dependency to 0.6.3
- Completely new OpenCL engine for GPU matrix computing - **supports AMD, Nvidia, and Intel, on Linux, Windows, and OSX**
- Support Fluokitten's Monoid and Magma in vectors and matrices
- transfer method in core that always transfers data to host memory
- opencl methods renamed
- default OpenCL engine changed to clblast
- old amd-gcn engine deprecated
- Streamlined dependencies: no longer need 2 dependencies in project files. The dependency on uncomplicate/neanderthal is enough
- Comes with Mac OS X build out of the box. No need even for external ATLAS.
- release and with-release moved from ClojureCL to uncomplicate/commons
- Support for Fluokitten's fmap!, fmap, fold, foldmap, op...
- Streamlined factory-based constructors in core.
- OpenCL vectors and matrices now support equality comparisons, offsets, strides,
subvectors, and submatrices. Matrices now can be swapped and copied.
- OpenCL read! and write! replaced with generic transfer! multimethod that supports
a much wider range of memory types to move data to and from.
- A large number of internal implementation changes that should not affect end users
(other than by removing bugs).
- Several important bugfixes (see git commit history).
- Support for pluggable BLAS engines
- GPU computing engine based on OpenCL (kernels optimized for AMD for now)
- Reorganized namespaces - almost the complete public API is now in the core namespace
- Changed the order of parameters in axpy!, mv! and mm! (and their variants)
- Implemented BLAS support for floats
- Implemented fmap!, freduce, and fold functions for all existing types of matrices and vectors
No API changes were required for these features.