Important note: you don't need the SQL data-importer if you use corona.index + your favorite Clojure SQL library.
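As a rough illustration of that DIH-free route, one could fetch rows with a SQL library and push them through corona's indexing API. Everything below is a hypothetical sketch: `index/add!` and `index/commit!` are assumed names to check against corona's actual API, and `next.jdbc` is just one possible SQL library choice.

```clojure
;; Hypothetical sketch only -- index/add! and index/commit! are assumed
;; function names; consult the corona.index API for the real ones.
(require '[next.jdbc :as jdbc]
         '[corona.index :as index])

(def ds
  (jdbc/get-datasource
   {:dbtype "postgresql" :dbname "my-db-name"
    :user "sa" :password "secret"}))

;; Fetch rows from Postgres and index them into Solr directly,
;; bypassing the DataImportHandler entirely.
(let [rows (jdbc/execute! ds ["select id, name from products"])]
  (index/add! {:core :my-core-name} rows)
  (index/commit! {:core :my-core-name}))
```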
Download the PostgreSQL JDBC driver into Solr's DataImportHandler lib folder:

```shell
mkdir $SOLR_HOME/contrib/dataimporthandler/lib
wget -P $SOLR_HOME/contrib/dataimporthandler/lib https://jdbc.postgresql.org/download/postgresql-42.2.5.jar
```
See https://jdbc.postgresql.org/download.html for more information.
Then copy the :dih/db core template into your project:

```clojure
(cmd/exec! :start)
(cmd/copy-core-template!
 :dih/db
 "$YOUR_CLOJURE_PROJECT/resources/solr/<my-core-name>")
(cmd/exec! :stop)
```
In resources/solr//conf/solrconfig.xml, change the requestHandler named dataimport to:
```xml
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">resources/solr//db-data-config.xml</str>
  </lst>
</requestHandler>
```
Add the following tags under the <config> tag so that Solr picks up the custom JARs:
```xml
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
```
The file db-data-config.xml is located in the same folder as solrconfig.xml and defines the data we want to import/index from our datasource.
Change the dataSource tag:
```xml
<dataSource
  driver="org.postgresql.Driver"
  url="jdbc:postgresql://localhost:5432/my-db-name"
  user="sa"
  password="secret"/>
```
Change the document tag:
```xml
<document>
  <entity
    name="product"
    pk="id"
    query="select id,name from products"
    deltaQuery="SELECT id FROM products WHERE updated_at > '${dataimporter.last_index_time}'::timestamp at time zone 'utc'"
    deltaImportQuery="SELECT id,name from products WHERE id='${dataimporter.delta.id}'">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
  </entity>
</document>
```
This assumes our database is named my-db-name and has a table products with columns id, name and updated_at. The updated_at column, of a datetime type, stores the date of the row's last modification; it is used during incremental (delta) imports to track rows modified since the last import into Solr.
- `pk`: the primary key of the DB entity.
- `query`: gives the data needed to populate the fields of the Solr document in a full-import.
- `deltaQuery`: gives the primary keys of the current entity which have changed since the last index time.
- `deltaImportQuery`: gives the data needed to populate the fields when running a delta-import.

The full-import command uses the `query` attribute; the delta-import command uses the `deltaQuery` and `deltaImportQuery` attributes.
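To make the split between these queries concrete, here is a pure-Clojure sketch (plain data, no Solr or SQL involved) of what DIH effectively computes with them:

```clojure
;; Plain-data model of the products table.
(def products
  [{:id 1 :name "apple"  :updated-at 10}
   {:id 2 :name "banana" :updated-at 25}])

;; `query`: full-import selects every row's indexed columns.
(defn full-import [rows]
  (mapv #(select-keys % [:id :name]) rows))

;; `deltaQuery`: ids of rows changed since the last index time.
(defn changed-ids [rows last-index-time]
  (set (map :id (filter #(> (:updated-at %) last-index-time) rows))))

;; `deltaImportQuery`: re-fetch the full docs for just those ids.
(defn delta-import [rows last-index-time]
  (let [ids (changed-ids rows last-index-time)]
    (mapv #(select-keys % [:id :name])
          (filter (comp ids :id) rows))))

(full-import products)      ;; => [{:id 1 :name "apple"} {:id 2 :name "banana"}]
(delta-import products 20)  ;; => [{:id 2 :name "banana"}]
```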
Notes:
- The `dataimporter.last_index_time` variable is automatically stored in $SOLR_HOME/my-core-name/conf/dataimport.properties.
- To have Solr drop documents whose rows were deleted from the DB, add a `deletedPkQuery` attribute to your entity as follows:
  `deletedPkQuery="SELECT id FROM deleted_item WHERE deleted_at > '${dataimporter.last_index_time}'"`
  In XML attribute values you may need to escape the `>` symbol with `&gt;`.
- Alternatively, you can also do delta imports solely via `query`. See https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
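For reference, dataimport.properties is a plain Java properties file maintained by Solr; after an import it looks roughly like this (the timestamps here are made up for illustration):

```properties
#Thu Jan 10 12:00:00 UTC 2019
last_index_time=2019-01-10 12\:00\:00
product.last_index_time=2019-01-10 12\:00\:00
```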
Make sure your Solr schema defines the imported fields:

```xml
<schema name="example" version="1.5">
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
  ...
</schema>
```
Note:
- Clean up conf files: your project's resources/solr/conf folder can contain fewer files. You may want to remove all dictionary-type files, but then you also need to remove the references to them in the managed-schema file.
- corona.schema: you can use corona.schema to dynamically add fields to the schema.
```clojure
(require '[corona.client :as client]
         ;; assuming the cmd/... helpers used below come from corona.cmd
         '[corona.cmd :as cmd]
         '[corona.data-import :as data-import])

(def client-config
  {:type :http
   :host "127.0.0.1"
   :port 8983
   :path "/solr"
   :core "my-core-name"})

(def client (client/create-client client-config))
```
```clojure
(defn import-reset!
  []
  (cmd/exec! :stop)
  (println "SOLR: Starting Solr...")
  (cmd/exec! :start)
  (println "SOLR: Deleting :my-core-name core...")
  (cmd/delete-core! :my-core-name)
  (println "SOLR: Creating back :my-core-name core...")
  (cmd/create-core! :my-core-name "resources/solr/my-core-name/conf")
  (println "SOLR: Reloading import conf...")
  (data-import/reload-config! client-config)
  (println "SOLR: Ready to import documents :-)"))

(import-reset!)
```
You can now perform a full-import:

```clojure
(data-import/full-import! client-config {:clean true})

;; Run several times until completed:
(data-import/status client-config)
```
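Instead of re-running the status call by hand, you can poll it in a loop. The sketch below assumes the status response is a map with a :status key that DIH sets to "idle" once the import has finished; adapt `import-done?` to the actual response shape returned by `data-import/status`.

```clojure
;; Assumed response shape: {:status "busy"|"idle" ...} -- check the real
;; data-import/status return value and adapt import-done? accordingly.
(defn import-done? [status-resp]
  (= "idle" (:status status-resp)))

(defn wait-for-import!
  "Polls status-fn until the import reports done, or throws after
  max-tries attempts. status-fn is a zero-arg fn returning a status map."
  [status-fn & {:keys [interval-ms max-tries]
                :or {interval-ms 1000 max-tries 120}}]
  (loop [n 0]
    (let [resp (status-fn)]
      (cond
        (import-done? resp) resp
        (>= n max-tries)    (throw (ex-info "Import timed out" {:last resp}))
        :else               (do (Thread/sleep interval-ms)
                                (recur (inc n)))))))

;; Usage: (wait-for-import! #(data-import/status client-config))
```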
A full import loads all the data every time, while an incremental (delta) import only adds the data that changed since the last indexing. By default, a full import starts by removing the existing index (parameter clean=true).
Then add an entry to your table and perform a delta-import:
```clojure
(data-import/delta-import! client-config {:clean false :commit true})
(data-import/status client-config)
```
Finally, copy the id of the last entry you added and run a Solr query:
```clojure
(client/query client {:q "id:12345" :fl ["id"]})
;; change 12345 to your last id
```
Notes:
delta-import: use clean=false when running the delta-import command. debug=true: debug mode limits the number of rows to 10 by default and also forces indexing to be synchronous with the request.
Admin console: after successfully adding a collection (or core) to Solr, you can also select it and run dataimport commands from the Solr Admin UI (screenshots: http://imgur.com/a/raVE4).
Error logs: You can find error and info logs in $SOLR_HOME/server/logs/solr.log
Use SolrJ for less XML. This can be a starting point: https://wiki.apache.org/solr/SolJava
Can you improve this documentation? These fine people already did: Leon Talbot & Sanel Zukan.