Current version: v1.1.0
Sometimes in protocols and formats such as IRC and bad HTML, pieces of text with different encoding are mixed in the same byte stream. This library provides charsets to the JVM platform that can understand both the UTF-8 and the ISO 8859-1 (or windows-1252) encodings and choose the one that makes most sense.
This library integrates with the Java charset system. Wherever you would specify a charset, shuch as "UTF-8", you can also choose among one of the following charsets that this library provides:
Note that these charsets can only be used for decoding. An example, using the first one to decode data from an InputStream:
new InputStreamReader(in, "X-UTF-8_with_ISO-8859-1_fallback")
For an example program, see examples/Mixed2Utf8.java
.
You can either download it manually or let Maven (or a Maven based build tool) fetch it for you.
https://github.com/raek/utf8-with-fallback/archives/master
[se.raek/utf8-with-fallback "1.1.0"]
<dependency>
<groupId>se.raek</groupId>
<artifactId>utf8-with-fallback</artifactId>
<version>1.1.0</version>
</dependency>
You might also need to add the Clojars repo:
<repository>
<id>clojars.org</id>
<url>http://clojars.org/repo</url>
</repository>
Copyright (C) 2010 Rasmus Svensson
This library is distributed under the GNU Lesser General Public License v3.0 (http://www.gnu.org/licenses/lgpl-3.0-standalone.html) and the Eclipse Public License v1.0 (http://www.eclipse.org/org/documents/epl-v10.html).
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close