SQL like capabilities for Apache Cassandra.
This library is born out of frustration of expecting SQL (kind of) queries to work against Cassandra, but they don't. The situation becomes worse when you have to update some data.
Recently, I was working on a data migration task, and due to some logical errors, I had to rollback the changes made to the data, but, sadly, this was not an RDBMS!
Hence, this library, which helps in executing those queries against a Cassandra database which are not possible using
plain CQL. Seemingly simple queries, e.g. a query with a like
clause or, a query where we have to update multiple
rows (which possibly, could've been achieved using a nested SQL query), are made possible using this library.
This library is written in Clojure, so the syntax is far from that of SQL, but it is minimal enough to be usable.
There are just a few namespaces in this library:
default-db-map
atom which stores Cassandra connection parametersBefore we can start executing any queries, we need to connect to a Cassandra instance. This can be done using:
(core/connect-to-default-db "localhost" "username" "password" "keyspace")
Once we are done, we can disconnect from Cassandra using:
(core/disconnect-from-default-db)
The tests in this library sqlforcql.schema
create the following two tables (with the data as shown):
with nickname as the partitioning key (hereafter, referred to as the PK column)
nickname(*) | city | country | first_name | last_name | zip |
---|---|---|---|---|---|
fedex | Bern | Switzerland | Roger | Federer | 3001 |
naseer | Ajmer | India | Naseeruddin | Shah | 305001 |
sonu | Dubai | UAE | Sonu | Nigam | 00000 |
rafa | Madrid | Spain | Rafael | Nadal | 28001 |
king | Abu Dhabi | UAE | Shahrukh | Khan | 00000 |
chintu | Jodhpur | India | Rishi | Kapoor | 305001 |
with city and country as the partitioning keys, so country becomes the clustering column (hereafter, referred to as the CK column)
city(*) | country(*) | first_name | last_name | nickname | zip |
---|---|---|---|---|---|
Jodhpur | India | Rishi | Kapoor | chintu | 305001 |
Bern | Switzerland | Roger | Federer | fedex | 3001 |
Abu Dhabi | UAE | Shahrukh | Khan | king | 00000 |
Ajmer | India | Naseeruddin | Shah | naseer | 305001 |
Dubai | UAE | Sonu | Nigam | sonu | 00000 |
Madrid | Spain | Rafael | Nadal | rafa | 28001 |
select * from players; - (cql/get-all "players")
select count(*) from players; - (cql/get-count "players")
select * from players where city = 'Ajmer' allow filtering; (cql/get-by-non-pk-col "players" {:city "Ajmer"})
select * from players where nickname = 'fedex'; (cql/get-by-pk-col "players" {:nickname "fedex"})
(cql/get-by-pk-col "players_by_city" {:city "Jodhpur" :country "India"})
Suppose we wanted to execute a SQLish query with a like clause:
(cql/get-by-non-pk-col-like "players" {:city "Dhabi"})
Or, suppose we wanted to update multiple rows based on a criteria involving some non-PK column:
(cql/update-by-non-pk-col "players" :nickname {:city "X"} {:city "Y"})
A query similar to the above update query against a table having both PK and CK columns, we have to use:
(cql/update-by-non-pk-col-with-clustering-col "players_by_city" [:city :country] {:zip 305001} {:zip 411038})
Typically, when we are using Cassandra, it is very common to have a base table, and a supporting query table similar to
the players
& players_by_city
tables in our example above. In older Cassandra versions, there was support for
materialized views where we did not have to worry about keeping the number of rows in the base table and its
materialized view, in sync. However, current Cassandra versions have dropped support for materialized views, so it's now
our responsibility to keep the rows in the base table, and the supporting query tables in sync (i.e. the data in these
related tables should be the same). This is where the analyze
and the generate
namespaces help.
(analyze/get-counts ["players" "players_by_city"])
(analyze/get-diff "players" "players_by_city")
NOTE: If the rows are the same, then this above function returns: {:no-difference #{}}
(generate/get-insert-statements table-name table-rows)
analyze/get-diff
can be fed to the above function if the rows aren't the same:(generate/get-insert-statements (analyze/get-diff "players" "players_by_city"))
Copyright © 2020 FIXME
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.
Can you improve this documentation? These fine people already did:
Manoj Waikar & mmwaikarEdit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close