Diff and patch for Clojure sequences
This is a fork of a fork clj-diff as used by lambdaisland/deep-diff2. We
switched to the fork released by user rymndhng because it added ClojureScript
support (ported the code to cljc
).
So this repo and the lambdaisland/clj-diff artifact on Clojars basically contain the same code as org.clojars.rymndhng/clj-diff 1.1.1-SNAPSHOT.
However, that has been a SNAPSHOT release since 2015, we don't want to continue to depend on a SNAPSHOT release which may still change. It's also not pleasant for downstream users. Moreover, the commit that that jar references does not exist on Github, so this repo pulls out the code from the jar and commits it for reference.
Namespaces have been prefixed with lambdaisland.
so this artifact can co-exist
with the original clj-diff (or the earlier fork) on the same classpath. We've
also taken some liberties to modernize namespace forms, and to organize this
repo in a similar way to other lambdaisland projects (deps.edn, kaocha, common
release tooling, etc.).
We have also started applying bug fixes in this fork, see the CHANGELOG for details.
deps.edn
lambdaisland/clj-diff {:mvn/version "1.4.78"}
project.clj
[lambdaisland/clj-diff "1.4.78"]
Below is the original README
Provides diff
and patch
functions for Clojure
sequences where (diff a b) -> x
and (patch a x) ->
b
. Also provides edit-distance
and
levenshtein-distance
functions for calculating the
difference between two sequences.
user=> (use 'clj-diff.core)
user=> (diff "John went to the movies." "John was in the movies.")
{:+ [[5 \a \s \space \i]], :- [6 8 9 10 11]}
user=> (patch "John went to the movies." *1)
"John was in the movies."
user=> (edit-distance "John went to the movies." "John was in the movies."))
9
user=> (levenshtein-distance "John went to the movies." "John was in the movies.")
8
There is already a Java library which does this well. Why create a Clojure version? So that we can do this:
user=> (def a [{:a 1} {:a 2} {:a 3} {:a 4} {:a 5} {:a 6} {:a 7}])
user=> (def b [{:a 2} {:a 3} {:a 4} {:a 5} {:a 6} {:a 7} {:a 1}])
user=> (diff a b)
{:+ [[6 {:a 1}]], :- [0]}
user=> (patch a *1)
({:a 2} {:a 3} {:a 4} {:a 5} {:a 6} {:a 7} {:a 1})
user=> (edit-distance a b)
2
The current diff algorithm comes from the paper An O (NP) Sequence Comparison Algorithm by Sun Wu, Udi Manber, Gene Myers and Webb Miller. It is fast and memory efficient. It also makes use of the pre-diff optimizations mentioned in Neil Fraser's Diff Strategies. The worst-case running time of the algorithm is dependent only on the length of the longest sequence (N) and the number of deletions (P). This is much better than the Myers algorithm which is an O (ND) algorithm where N is the sum of the length of the two sequences and D is the edit distance.
I have created a separate project for comparing the performance of different diff algorithms. The main performance goal of this project is to create a Clojure diff that can outperform Fraser's Java implementation. One of the most interesting results is show below.
This chart shows the most interesting range of comparison between the three algorithms. The current algorithm being used by clj-diff is labeled "Miller". For larger sequences, the Miller algorithm outperforms Fraser. To summarize all of the performance test results; the Miller algorithm is never more than 50 milliseconds slower than the Fraser algorithm but the Fraser algorithm can be multiple seconds slower, and will run out of memory more quickly, for large sequences. For more information about performance see the clj-diff-performance project.
In order to understand the above two titles, you will need to know what N, D and P stand for. Let A and B be two sequences with lengths Q and M where Q >= M. Let D be the length of the minimum edit script for A and B. In the title of the first paper N = Q + M. In the second paper N = Q and P = (1/2)D - (1/2)(Q - M). Therefore, the O (NP) version is faster which is visualized in the charts below.
clj-diff is part of a growing collection of quality Clojure libraries created and maintained by the fine folks at Gaiwan.
Pay it forward by becoming a backer on our Open Collective, so that we may continue to enjoy a thriving Clojure ecosystem.
You can find an overview of our projects at lambdaisland/open-source.
Everyone has a right to submit patches to clj-diff, and thus become a contributor.
Contributors MUST
*
**
Contributors SHOULD
If you submit a pull request that adheres to these rules, then it will almost certainly be merged immediately. However some things may require more consideration. If you add new dependencies, or significantly increase the API surface, then we need to decide if these changes are in line with the project's goals. In this case you can start by writing a pitch, and collecting feedback on it.
*
This goes for features too, a feature needs to solve a problem. State the problem it solves, then supply a minimal solution.
**
As long as this project has not seen a public release (i.e. is not on Clojars)
we may still consider making breaking changes, if there is consensus that the
changes are justified.
Copyright © 2010-2021 Brenton Ashworth, Arne Brasseur, and contributors
Available under the terms of the Eclipse Public License 1.0, see LICENSE.txt
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close