A library for reading XML documents into nice Clojure data structures with the help of XSD.
This library is not used anywhere, and definitely not in anything resembling production, so I recommend you don't use it there either.
XML is a somewhat human-readable serialization format. Clojure has libraries like clojure.data.xml (used here too) that can read and write XML.
However, an XML file alone is not enough to describe the full meaning of the file. Say we have an element like
<answer>42</answer>
Is that 42 a number, or maybe a string? Is 'answer' a single value, or a list of one?
This information can be guessed, hardcoded or read from an out-of-band (meaning separate) schema. One standard for representing the schemas is XML Schema.
An XML schema might have a line such as
<element name="answer" type="xs:int" minOccurs="0" maxOccurs="1" />
... which tells us that the previous snippet holds a single value of type integer, and that the answer is optional, so you might have to get along without one.
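The difference the schema makes can be sketched in a few lines. This is only an illustration, not how this library works internally: it uses clojure.xml (bundled with Clojure) instead of clojure.data.xml so that it runs without extra dependencies, and `answer-decl`, `coerce-text` and `interpret` are hypothetical names invented here to stand in for the facts carried by the `xs:element` line.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.string :as str])
(import '(java.io ByteArrayInputStream))

(defn parse-xml-string
  "Parses an XML string with clojure.xml, which ships with Clojure."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Without a schema, the element text is just a string inside a collection.
(def raw (parse-xml-string "<answer>42</answer>"))
(:tag raw)      ; the keyword :answer
(:content raw)  ; a collection holding the string "42"

;; Hypothetical, hand-written summary of what the xs:element line declares.
(def answer-decl {:type :int :min-occurs 0 :max-occurs 1})

(defn coerce-text
  "Converts element text according to the declared type (assumed keywords)."
  [{:keys [type]} text]
  (case type
    :int    (Long/parseLong (str/trim text))
    :string text))

(defn interpret
  "Applies a declaration to the raw text values found for one element name."
  [{:keys [max-occurs] :as decl} texts]
  (let [values (map #(coerce-text decl %) texts)]
    (if (= 1 max-occurs)
      (first values)   ; single value; nil when the element is absent
      (vec values))))  ; repeating element; always a collection

(interpret answer-decl (:content raw))  ; the number 42
(interpret answer-decl [])              ; nil, the answer was optional
```

With the declaration in hand, the caller no longer has to guess whether to expect a string, a number, or a collection.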
The aim of this library is to combine the schema and the data files to produce Clojure data structures that are as simple as possible and require as little additional processing as possible to extract whatever information needs to be extracted.
The key things are
Obvious prior art can be found in the XML serialization libraries of your favourite object-oriented enterprise languages.
Let's say you have a schema like this:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.org/test-schema-1"
           xmlns="http://example.org/test-schema-1"
           targetNamespace="http://example.org/test-schema-1"
           elementFormDefault="qualified">
  <xs:element name="top" type="ex:topType" />
  <xs:complexType name="topType">
    <xs:sequence>
      <xs:element name="optional-element" type="subType" minOccurs="0" maxOccurs="1" />
      <xs:element name="mandatory-element" type="xs:string" minOccurs="1" maxOccurs="1" />
      <xs:element name="repeating-element" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="some-attribute" type="xs:string" form="qualified" />
    <xs:attribute name="numeric-attribute" type="xs:integer" />
  </xs:complexType>
  <xs:complexType name="subType">
    <xs:sequence>
      <xs:element name="some-string" minOccurs="1" maxOccurs="1" type="xs:string" />
      <xs:element name="some-number" minOccurs="1" maxOccurs="1" type="xs:integer" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
And a document like this:
<?xml version="1.0" encoding="utf-8"?>
<ex:top xmlns:ex="http://example.org/test-schema-1"
        ex:some-attribute="jau" numeric-attribute="987">
  <ex:optional-element>
    <ex:some-string>jabada</ex:some-string>
    <ex:some-number>1</ex:some-number>
  </ex:optional-element>
  <ex:mandatory-element>
    asdf
  </ex:mandatory-element>
  <ex:repeating-element>yippee</ex:repeating-element>
  <ex:repeating-element>!!!</ex:repeating-element>
</ex:top>
You can get a nice Clojure representation of the document like this:
(require '[com.vincit.clj-xsd.core :as cxs])
(require '[clojure.java.io :as io])

; read schema out-of-band
(def schema
  (with-open [schema (io/input-stream "test_resources/schema1.xsd")]
    (cxs/read-schema schema)))

; read a single data file
(with-open [file (io/input-stream "test_resources/doc1.xml")]
  (cxs/parse schema file))

=> {:top {:some-attribute "jau"
          :numeric-attribute 987 ; it's a number!
          :optional-element {:some-string "jabada"
                             :some-number 1}
          :mandatory-element "asdf"
          :repeating-element ("yippee" "!!!")}} ; it's a list!

; ... and the keys would have been namespaced if we provided a
; namespace mapping from XML namespaces (strings) to clojure
; namespace symbols on our call to cxs/parse
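Because the result is plain Clojure data, ordinary core functions are enough to pull values out of it. The sketch below does not call the library at all: `result` is simply the map shown above, typed in by hand, so that the snippet runs on its own.

```clojure
(require '[clojure.string :as str])

;; Copied by hand from the parse result shown above.
(def result
  {:top {:some-attribute    "jau"
         :numeric-attribute 987
         :optional-element  {:some-string "jabada"
                             :some-number 1}
         :mandatory-element "asdf"
         :repeating-element '("yippee" "!!!")}})

;; Nested values are reachable with get-in; no string parsing is needed.
(get-in result [:top :optional-element :some-number])  ; 1
(inc (get-in result [:top :numeric-attribute]))        ; 988

;; Repeating elements arrive as a collection, so sequence functions apply.
(count (get-in result [:top :repeating-element]))      ; 2

;; Destructuring works as with any other map.
(def shout
  (let [{:keys [mandatory-element repeating-element]} (:top result)]
    (str/join " " (cons mandatory-element repeating-element))))
;; shout is "asdf yippee !!!"
```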
For parsing the XML files (schemas and data files alike), clojure.data.xml is used.
The schema is read into an internal representation that is somewhat simplified from the original. In theory you can write that representation yourself, either because you don't have an XML Schema available but would like to guess the structure of your document, or because the schema is too hard for this little library to understand. The internal format is not stable.
You can attach custom parsers for simple and complex types. See com.vincit.clj-xsd.parser.custom.* for examples. There is also a generic order-preserving parser which can read sequences with nested groups without losing the element order.
You can also attach post-processing functions which are invoked every time an element of a specific type is processed.
Copyright © 2018 Vincit
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.